yacy_search_server

Commit Graph

Author	SHA1	Message	Date
luccioman	4b649b0a11	Fixed NPE case and API URL link on Solr HTML output for webgraph core.	8 years ago
luccioman	af28a07780	Updated API calls recording/replay with recent changes. - enabled HTTP POST calls with Digest HTTP authentication - made API calls compatible with API newly restricted to HTTP POST only with transaction token validation - ensured backward compatibility with older entries recorded as HTTP GET	8 years ago
reger	81670c3484	One more use of SwitchboardConstants.SERVER_PORT constant, apply standard servlet design pattern initialization of solrselectservlet	8 years ago
luccioman	cde237b687	Enforced access controls on some administrative actions. - ensure use of the HTTP POST method: HTTP GET should only be used for information retrieval, not to perform operations with server-side effects (see HTTP standard https://tools.ietf.org/html/rfc7231#section-4.2.1) - a transaction token is now required for these administrative form submissions to ensure the request cannot be embedded in an external site and performed silently or by mistake by the user's browser	8 years ago
luccioman	df5970df6d	Extended Apache HTTP Digest Auth. for use of the YaCy encoded password. When programmatically requesting the local peer with the Apache HTTP client, authentication credentials must be passed as clear-text values. This extension to the Apache org.apache.http.impl.auth.DigestScheme permits use of the YaCy encoded password stored in the adminAccountBase64MD5 configuration property.	8 years ago
reger	f05976c017	Display the local search word statistic in alphabetic order	8 years ago
reger	3dd23c178b	Introduce the option to configure a shutdown port. A port value of -1 will disable this option. If set to a value greater than 0, YaCy listens on this port on the local loopback address (127.0.0.1) for a shutdown or restart signal. E.g. connecting to http://localhost:8005/shutdown will stop the YaCy server, and http://localhost:8005/restart will restart it. This option allows stopping YaCy locally, independently of the web frontend (which might be configured for password-protected remote access).	8 years ago
reger	a2afb4bae0	add switchboardconstants for server ports config keys	8 years ago
reger	56d0a87a83	remove double occurrence of geo:lat in rss tokens	8 years ago
reger	b4fa1141b8	implement RequestHeader getRequestURI, getRequestURL for legacy request	8 years ago
reger	209a7374bd	remove unused import pdfParser	8 years ago
reger	de1c1c16db	Improve pdf text extraction resource handling. For short PDFs (<= 3 pages) use the already extracted content; only for long PDFs (> 3 pages) reassign content and close the internal writer (to free buffers directly)	8 years ago
reger	9b6d1abd9e	eliminate some compiler unchecked and deprecation warnings in nav plugins by explicit type declaration and replacing date.getYear with Calendar.get	8 years ago
reger	18c7563dbe	Extend DCEntry.getLanguage to convert to ISO639-1 codes for more languages by using icu.ULocale for languages not already covered (ICU normalizes to ISO639-1 2-char codes). Add test class. Use DublinCore vocabulary declarations in DCEntry and SurrogateReader for easier usage debugging. Init SurrogateReader.inputSource on first use.	8 years ago
reger	ce87025462	further avoid setting connection info properties as header values, following the comment "use of properties as header values is discouraged", in cases where (proxy)HTTPClient overwrites values with the supplied url. Use the defined request.referer procedure in the response class.	8 years ago
reger	cd4d891ea4	use pre-defined "Connection" header key, replace deprecated	8 years ago
luccioman	0173b0bc32	Added an advanced settings page for referrer policy settings. Feedback will be welcome, notably on the descriptive content of this page.	8 years ago
reger	81963a89fe	fix proxyservlet response url to respect http scheme if a relative Location header is returned.	8 years ago
luccioman	cdcd923375	Privacy enhancement: added settings to control referrer policy. The HTTP "Referer" header sent by the browser when using YaCy can now be controlled either with the referrer meta tag as a global policy, or only for search result links by adding the attribute rel="noreferrer". To improve privacy with the fewest possible regressions, the default is set as a meta tag with value "origin-when-cross-origin": internal YaCy links behavior is not affected, but when visiting external websites the referrer url is not empty but stripped of query parameters and path. Older browsers, Safari, MS IE and Edge do not support the referrer meta tag, so the standard but less flexible noreferrer link type can also be enabled as an alternative. User-friendly settings page to be implemented.	8 years ago
reger	86534a56f7	fixed ReindexSolrBusyThread new and unexpected repeat of the same query with a low number of found documents, by adding an additional end condition to remove a processed query with number of found docs <= process-chunk-size. Noticed on query h4_txt:[* TO *]: found 21, processed 21, commit was called, but on the next cycle the same query again found 21 docs (while h4_txt was removed from schema and committed inputdocuments).	8 years ago
reger	275c0cddd1	Adjust DefaultServlet test case to recent change, deprecate unused CONNECTION_PROP_PROTOCOL (also as it might be misleading with getProtocol vs getScheme)	8 years ago
reger	41e2ee0eca	Fix call parameter for ConnectionInfo in MonitorHandler (expected scheme e.g. http, was protocol version). Deprecate obsolete custom X-...-Scheme header constant. Use existing FORMAT_ANSIC Dateformatter in HeaderFramework. Correct htmlParserTest (delete one unintended println)	8 years ago
luccioman	ac766327d3	Switched a few more Solr fields from strictly mandatory to optional	8 years ago
reger	f254fcfc67	fix htmlParser <script> text extraction on code containing an expression recognized as a tag, like 1<a, as reported in https://github.com/yacy/yacy_search_server/issues/109. Script content is ignored by default, but the text is filtered for html tags. Modified scraper to skip tag filtering while within a <script> section (until a closing </script> tag is detected). Possible side effect: a missing </script> end tag will truncate trailing content text.	8 years ago
luccioman	2f191e0e1c	Improved MultiProtocolURL non ASCII characters support. Following @sinkuu's Pull Request #108, added JUnit tests, updated some JavaDoc and also improved URL tokenization to support non ASCII characters.	8 years ago
luccioman	18e8b3a220	Merge branch 'escape' of https://github.com/sinkuu/yacy_search_server	8 years ago
reger	7419989de3	Correct dublincore title property text to lowercase in htmlresponsewriter, remove unused (carry over) local variable. Do the same for other responsewriter.	8 years ago
Burkhard	4fdc11cae8	Update SearchEvent.java. Fix NPE on disabled local SolrIndex, occurring on search moving to the 2nd result page. The debug-only setting to disable the local SolrIndex (System Admin -> Debug Settings) should long term probably be removed from production code.	8 years ago
luccioman	cdc7f3e431	Switched some Solr fields from mandatory to optional. These fields are enabled by default but are undoubtedly not strictly mandatory with the current code base. As reported by @reger24, the split between essential mandatory and optional fields is still to be improved to reflect current YaCy needs.	8 years ago
luccioman	3475d8c1a9	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	8 years ago
luccioman	c68a8be2d9	Refactored and enforced Solr mandatory fields for proper operation - Added a new method to check activation of mandatory fields on Collection Configuration commit, consistently with checks previously performed in Switchboard startup and with mandatory fields in the default schema. - Reorganized default schema and CollectionConfiguration enumeration: moved fields that are no longer mandatory into a specific section, and moved fields enabled at startup to the mandatory section. - Marked mandatory fields as required and with a stronger font in the IndexSchema_p.html page	8 years ago
reger	334c70c37a	correct fromDate init value on missing param in api/timeline_p servlet. Revert test modification from last commit in AccessTracker.main	8 years ago
reger	cc770512d5	add hint of query syntax in AccessTracker log (qs=normal querystring, sq=solr-querystring) to allow filtering simple text queries for processing, remove toString for counter parameter. Use more predefined constants in solrservlet	8 years ago
luccioman	e5858bc8c8	Fixed a NullPointerException case possible on Index Export. As reported by Palulukas in YaCy forum (http://forum.yacy-websuche.de/viewtopic.php?f=18&t=5944&sid=dcef5b899ab4aa9b40e3a3d158c13aed#p33454) the Index Export operation can fail, notably when the Solr index contains one or more documents with an empty (despite required) "load_date_dt" field. This fixes the export failure when the situation occurs, but more should be done to harden verifications on minimum required fields.	8 years ago
reger	7e53860fc7	fix NPE in HTMLResponseWriter on missing document title	8 years ago
reger	5e8879beb7	Reduce self generated content for text_t (visible text index field) to avoid repeating the tokenized url as description, continuation of `7e09bff4a1` `1409cabe8b`. Add some javadoc, and drop the unneeded removal of omitted fields in postprocessing.	8 years ago
luccioman	6e89d125f2	Added robots.txt support for heuristics federated search. As noticed by @reger24, abusive use of OpenSearch systems should be prevented, especially when parsing and reusing HTML results. The robots.txt file is now checked before requesting an external OpenSearch system, to respect host exclusions and any crawl-delay value. The check is also performed when trying to add a new OpenSearch URL template through the /ConfigHeuristics_p.html admin page.	8 years ago
sinkuu	a46b232bf1	Use java.net.URLDecoder	8 years ago
luccioman	bf16de29c1	Added support for HTML OpenSearch results. Many OpenSearch systems do not provide results as standard RSS/Atom feeds but only as HTML. This modification adds some support for custom OpenSearch HTML results through the use of mapping files (as already done for federated Solr search) relying on CSS-like selectors to retrieve information from HTML content. An example mapping file is provided to map results from the www.npmjs.com OpenSearch URL.	8 years ago
luccioman	54405577aa	Replaced absolute redirection locations by relative ones when possible. This makes integration of YaCy behind a reverse proxy subfolder easier.	8 years ago
luccioman	1857651988	Added a new Debug/Analysis advanced settings subsection. As discussed in PR #93 with @JeremyRand and @reger24 this new advanced settings page includes: - a new setting to control remote Solr responses encoding - some existing debug settings which could not be set through the admin user interface	8 years ago
luccioman	526f2d6a8b	Fixed NPE case occurring when local solr index is disabled in search.	8 years ago
luccioman	def55ec166	Improved termination of timed out remote solr requests to peers. On timeout, closing remote Solr requests is cleaner than simply using Thread.interrupt(), which is not effective in most cases. Closing does not ask for a commit on the remote solr, but releases http connection resources and is more likely to end those threads, which could otherwise wait indefinitely. Other related improvements included: - no longer marking remote peer as not available when remote search is interrupted before timeout by the cleanup job. - added a short fine log level trace of failing remote solr requests	8 years ago
luccioman	08de58b6d3	Named a Thread without name for easier monitoring	8 years ago
luccioman	9a5a124bf2	Distinguished solr connectors thread names for easier monitoring.	8 years ago
reger	1f497ccad5	Add consistency check for related index fields upon load and save of index schema. To assemble the original link url for out-/inboundlinks, icons and pictures, _protocol_sxt and _urlstub_sxt are needed (due to the data-reduced storage method used). Auto-enable _protocol_sxt if _urlstub_sxt is enabled, to be able to correctly assemble the original link url.	8 years ago
luccioman	68afe900d0	Added user-friendly controls over disk usage configuration settings. As mentioned in issue #103, control settings over YaCy disk usage already existed but lacked a user-friendly way to set them. I added them to the Performance_p.html administration page with a little refactoring of the "Resource Observer" fieldset for improved accessibility and compliance with HTML standards. Also added the possibility to enable/disable the autoregulation function from this page.	8 years ago
reger	95d2a28599	adjust the Field-Reindex Thread to verify and update the document id in case hash (ID) doesn't match document url (sku field).	8 years ago
luccioman	fc01b69eca	Fixed local image search pagination regression. As reported by @tglman on issue #90, when searching images on the local index only, pages next to the first were always empty. This was a regression from commit `c25e48e969`.	8 years ago
Michael Peter Christen	02d0b3172c	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	8 years ago
Michael Peter Christen	d4f45cf05e	added dc.date.modified and dc.date.created to date parser	8 years ago
reger	f9180fabc4	ensure that RWI Index.Segment IODispatcher is not blocking on shutdown waiting on a semaphore permit. See description: http://mantis.tokeek.de/view.php?id=723	8 years ago
reger	e61ee180a7	Group all proxy settings on System Administration by adding settings of UrlProxyAccss page (moved from deleted AugmentedBrowsing_p), adjust submenu (remove Augmented Browsing) and translation files.	8 years ago
luccioman	39e081ef38	Fixed display of crawler pending URLs counts in HostBrowser.html page. As described in mantis 722 (http://mantis.tokeek.de/view.php?id=722) Also updated some Javadoc.	8 years ago
reger	df80c57842	add ukr and pol to DCEntry.getLanguage ISO639-2 3-char language code conversion to deliver uk, pl 2-char code and use if else to return on match	8 years ago
luccioman	e048e74072	Added an optional parameter to webstructure.xml api. This new "documentStructure" parameter can be set to false to only get hosts accumulated references on a resource and thus prevent scraping the specified URL and getting citations references. Also set WebStructureGraph constants as final and updated the Javadoc with example api call URLs.	8 years ago
reger	581b00cc20	remove obsolete lastmodified calculation in WebgraphConfig	8 years ago
luccioman	5c8958bcea	Updated Javadoc and Junit tests for the WebStructureGraph class.	8 years ago
luccioman	d9766ca981	Fixed WatchWebStructure_p.html render to include https URLs. As described in mantis 721 (http://mantis.tokeek.de/view.php?id=721) WatchWebStructure_p.html failed to include in its structure view https and other protocols and ports than default http.	8 years ago
luccioman	ed3dd5e31a	Fixed webstructure.xml API used with a domain name 'about' parameter. As described in mantis 720 (http://mantis.tokeek.de/view.php?id=720), when requesting this API with a domain name instead of a complete URL only HTTP references on default port were listed.	8 years ago
luccioman	0da1e6ba16	Factored code re-implementing DigestURL.hosthash() method. This ensures consistent implementation of the url host hash generation and easier usage finding in source code. Also added a unit test for this function.	8 years ago
luccioman	86adfef30f	Added automated unit tests and performance tests for WebStructureGraph class. Fixed references count when multiple links target the same domain name in one document.	8 years ago
luccioman	9cea7cbb10	Detailed some Javadoc related to /api/webstructure.xml usage.	8 years ago
luccioman	6a4d51d8f9	Cleaned up some Javadoc warnings.	8 years ago
luccioman	86dc198698	Fixed some JavaDocs broken links.	8 years ago
reger	16beb551ea	fix DC.Elements namespace in DublinCore vocabulary class, delete redundant (unused) DCElements.	8 years ago
luccioman	339f005ced	Blacklist import and update performance improvements. Measurement sample : import from blacklist local file containing about 15000 entries - before refactoring : several minutes - after refactoring : a few seconds!	8 years ago
luccioman	e3892b0957	Added some JavaDoc.	8 years ago
reger	4c9be29a55	fix concurrency issue with htmlParser using not current scraper data resulting in incorrect data for some html index metadata. Details see http://mantis.tokeek.de/view.php?id=717	8 years ago
reger	eedee6eabb	fix exception on URIMetadataNode instantiation with corrected id hash on host_id_s. Use Solr setField instead of addField to prevent java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String at net.yacy.kelondro.data.meta.URIMetadataNode.hosthash(URIMetadataNode.java:247) at net.yacy.search.query.SearchEvent.addNodes(SearchEvent.java:966) at net.yacy.peers.Protocol.solrQuery(Protocol.java:1242) at net.yacy.peers.RemoteSearch$2.run(RemoteSearch.java:349)	8 years ago
luccioman	c1401d821e	Adjusted crawl depth control for FTP crawl start URLs.	8 years ago
reger	68d4dc5cc5	Complete harmonization of RequestHeader getCookie with std ServletRequest to use javax.servlet.http.Cookie parameters. Deprecate now obsolete getHeaderCookies. Adjust setting of MaxAge to spec if >= 0, otherwise keep default.	8 years ago
reger	a1e5f7dbca	fix of fulltext.remove() by id of webgraph document: webgraph has document hash in source_id_s	8 years ago
luccioman	1df558a6c6	Fixed YaCy proper shutdown triggered by SIGTERM signal. The main shutdown hook thread was not properly waiting for the main thread termination which consequently could not properly close resources and threads. After terminating a running YaCy peer this way (Ctrl+C in console, or kill <pid> for example), you could see the still existing DATA/yacy.running file. Tested with : - Debian Jessie openjdk 7 and 8 : regular shutdown, Ctrl+C, kill command, system restart while yacy is running - Windows 10 Oracle JDK 7 and 8 : non regression on regular shutdown	8 years ago
reger	b522d540b9	Include itemprop latitude/longitude (see schema.org) in attribute parsing for lat/lon. Harmonize number parsing for lat/lon to parseDouble. Fix endDate_dts value assignment.	8 years ago
reger	083df255e4	fix html tag attribute parsing containing attribute w/o value e.g. itemscope or autofocus (in such case the next key was not properly recognized).	8 years ago
reger	cb95b7339a	include html5 <time> tag in content scraper, add "datetime" property of <time> tag to scrapers startdate list. Datetime is parsed as iso8601 (xml) date, html5 allows partial as well as duration (not handled by this)	8 years ago
reger	7bf2bcf504	fix and prevent exception on missing required cookie name: skip cookie creation if name is empty.	8 years ago
luccioman	3ca695390c	FTP crawl start URLs : applied crawl profile depth control Applied rules : - when the FTP URL denotes a file resource, stack it as any start URL : eventually embedded links can be followed applying the usual depth rules - when the FTP URL denotes a directory, list files under this directory and stack them for crawl, and repeat the process on sub folders until crawl depth is reached	8 years ago
luccioman	128c8ef8d4	Fixed title rendering having non ASCII chars in QuickCrawlLink_p.html.	8 years ago
reger	8eb6fba59c	activate filetype navigator plugin and restrict config (append) of navs to those not already active. DHT results are now included in the count; this might overshoot on redundant DHT and Solr results, while the previous Solr facet based count was always low.	8 years ago
luccioman	c25e48e969	Enabled displaying results after 14th page for local search queries. Fixes issue #90 for local queries only: Stealth mode, Portal mode or Intranet mode. For P2p mode, the issue would probably be difficult to solve with reasonable performance. This still needs investigation. Also switched some InterruptedException catch log messages to warn level as this is normal behavior when shutting down a peer. Fixed yacysearch buttons navbar behavior to deal correctly with total results count or offset over 1000. Also improved the buttons navbar to be able to navigate beyond the 10th page for local queries.	8 years ago
luccioman	a3886c6adb	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	8 years ago
luccioman	feaa87005e	Improved indentation for easier debugging steps.	8 years ago
reger	bab4804d11	add FileTypeNavigator plugin	8 years ago
reger	d35c47090c	remove obsolete put of HttpServletRequest attributes to YaCy servlet parameters on SSI (server side includes). Query parameters are already merged by dispatcher.include, making copy of parameter (RequestDispatcher.INCLUDE_QUERY_STRING) obsolete. All other parameter are not used as YaCy servlet arguments.	8 years ago
reger	0959038624	correct DefaultServlet resource pathinContext calculation: exclude servletPath option, as resources are always relative to htroot or htdocs, and the change reflects this. Theoretically, this and the recent adjustments regarding relative urls allow the instance to be configured in a path other than root (/)	8 years ago
reger	c50e23c495	reduce creation of empty legacy RequestHeader() in situation where null is acceptable (less for garbage collection).	8 years ago
reger	87f6631a2a	adjust Cache getHeader to prev. changes/commit	8 years ago
reger	6be7339b1d	remove the overhead of unused reverseMappingCache of HeaderFramework / RequestHeader	8 years ago
reger	c702eb6786	del dead menu link to /repository (directory not created in current distribution -> old)	8 years ago
reger	baa5d9b9e3	adjust DomainHandler working on resolved .yacy domain (remove obsolete check for path on hostname)	8 years ago
luccioman	1ba705c23d	Use loaderDispatcher instead of HTTPClient to download releases. The default redirection strategy when using directly HTTPClient is incorrect when redirection is cross host (the original Host header is still sent when requesting the redirected location). YaCy LoaderDispatcher handles redirections properly, thus release archive files using redirected URLs (such as the URLs on a GitHub Release page) are successfully downloaded.	8 years ago
luccioman	467650c042	Hardened system update checks. When a downloaded archive release is corrupted, empty, or can not be opened for any reason, the update script must not be launched because it erases the existing lib/*.jar libraries.	8 years ago
luccioman	b5711b8fe1	Added some Javadocs.	8 years ago
reger	0d2964cf2b	expanded error message on rejected crawl url due to failed dns lookup. Closes http://mantis.tokeek.de/view.php?id=678	8 years ago
luccioman	00e81fcc15	Check HTTP status when downloading a release, and report any error.	8 years ago
reger	0758c868c9	add HostNavigator plugin	8 years ago
reger	60160877f5	bundle initialization of search navigation plugins in separate handler class to allow to use navigator map in config servlets (without need to create a search event)	8 years ago
reger	3151cda3a5	catch ip-format exception on wrong server access setting ip filter as reported in http://mantis.tokeek.de/view.php?id=713 to prevent abort of initialization. This jetty/whitelist ipaccesshandler accepts currently only ipv4	8 years ago
reger	b32bcdf344	list entries in outgoing cookie monitor one per line for easier readability. For this adjust outgoingCookies entry to use Cookie[] instead of String[]	8 years ago
reger	3f32262654	enable getCookies for HeaderFramework reusing Jetty CookieCutter	8 years ago
reger	4186ee6fc0	add other custom response header entries set by servlets to the response to the client (not cookies only). This is used by some servlets to mainly set "Access-Control-Allow-Origin" header. Added a contains check to be sure no header set by Defaultservlet is overwritten.	8 years ago
luccioman	d27adc2b92	Fixed language detector initialization and NullPointerException cases. NullPointerException occurred when using an Identificator instance which encountered an error in its constructor. This error could be caused by a missing "langdetect" folder in the current folder of the main process, or by simultaneous first calls to the constructor, initializing concurrently the DetectorFactory.langlist. Fixes mantis 714 (http://mantis.tokeek.de/view.php?id=714)	8 years ago
luccioman	a1f922b34a	Reduced locations vocabulary memory footprint. Reduced this vocabulary memory usage : - by using only one map term2entries instead of two maps having the same key set - by generating the location object links on the fly using the GeoLocation data instead of storing many duplicates of string prefix "http://www.openstreetmap.org/?lat=" Measurements with VisualVM and GeoNames 0 enabled (cities with a population > 1000) : - AutotaggingLibrary retained size : - initial : 309 718 763 bytes - after refactoring : 159 224 641 bytes	8 years ago
reger	9c06e752e4	allow request.setAttribute w/o "not implemented" exception by default. Skip unused CONNECTION_PROP_ARGS check in getQueryString	8 years ago
reger	59ab42e7d6	add UserDB lastaccess update calls on login	8 years ago
luccioman	bf8a6d9848	Reduced GeoNames locations memory footprint. Using String instead of StringBuilder instances in GeonamesLocation allows to reuse the same immutable objects in the Tagging class. Measurements with VisualVM and GeoNames 0 enabled (cities with a population > 1000) : - OverArchingLocation retained size : - initial : 164 666 830 bytes - after refactoring : 97 736 804 bytes - AutotaggingLibrary retained size : - initial : 354 713 633 bytes - after refactoring : 309 718 763 bytes	8 years ago
luccioman	3f561c1635	Fixed a NullPointerException case. Could occur when a search request was performed just after peer startup, before the Switchboard Thread "LibraryProvider.initialize" had completed, thus requesting a ProbabilisticClassifier not completely initialized (and having a null contexts property).	8 years ago
luccioman	6bc2bf1aa4	Small memory footprint reduction for GeonamesLocation. Reusing the same geonameid Integer instance between `id2loc` and `name2ids` maps reduces (a little) memory footprint. Measured OverarchingLocation class retained memory with VisualVM on openJDK 8 : - initial : 183 439 490 bytes - after refactoring : 164 666 830 bytes	8 years ago
luccioman	7f846ef674	Small complementary memory footprint improvement for synonyms. Memory footprint measured with VisualVM and all synonyms enabled : - before : 195 015 914 bytes - after : 192 548 826 bytes	8 years ago
luccioman	568e3dde6a	Improved synonyms memory footprint. The idea is to avoid unnecessary String objects duplication for the same words. Particularly efficient with the large moby thesaurus. Memory footprint measurements with VisualVM : - openthesaurus_de_yacy : - initial : 19 443 796 bytes - after refactoring : 18 012 606 bytes - mobythesaurus_en_yacy : - initial : 343 453 904 bytes - after refactoring : 173 843 780 bytes - thesaurus_ru_yacy : - initial : 3 800 706 bytes - after refactoring : 3 466 612 bytes - de + en + ru : - initial : 366 603 450 bytes - after refactoring : 195 015 914 bytes	8 years ago
reger	60b3adfb43	fix ext2mime to return given default on input=null	8 years ago
reger	f7e9f9be5f	move Digest auth checks from DefaultServlet to adminAuthenticated, eliminating the need to modify http header on Servlet container handled Digest authentication, to simulate Basic auth for YaCy servlets.	8 years ago
luccioman	cca3417b87	Fixed image and favicon viewing for unauthenticated local requests. As reported by @reger24, image and favicon viewing was broken with unauthenticated requests on peers configured to require authentication even from localhost. So I unified viewing rights check in a single new function on ImageViewer class.	8 years ago
reger	02092de3d8	remove login cookie generation for static admin in User servlet. cookieAuth is never successful for static admin, making the creation and handling of login cookies for static admin obsolete.	8 years ago
luccioman	fc575fc760	Fixed a NullPointerException case.	8 years ago
reger	9a8691129f	fix typing error from commit `60ba5c117c`	8 years ago
reger	f9328f07e2	completing the usage of CONNECTION_PROP_CLIENT_HTTPSERVLETREQUEST in HTTPDProxyHandlers logging facility.	8 years ago
reger	8e3e3ed191	update the older ResponseHeader patch to handle cookies, to work directly with javax.servlet.http.Cookie (rename headerProps to cookieStore as is only used for this). (Re)implement set-cookie in DefaultServlet to make cookieAuthentication work as designed.	8 years ago
reger	866d3a1960	make RequestHeader login succeed (without throwing exception by default). Correct getAuthType to return Auth Scheme only after authentication	8 years ago
reger	44a6a4e795	fix authentication by hit in userdb (wrong parameter)	8 years ago
luccioman	aa9ddf3c23	Added control over Robots.txt active threads maximum number. When starting a crawl from a file containing thousands of links, the configuration setting "crawler.MaxActiveThreads" effectively prevents saturating the system with too many outgoing HTTP connection threads launched by the crawler. But robots.txt was not affected by this setting and kept increasing the number of concurrently loading threads until most of the connections timed out. To improve performance control, added a pool of threads for Robots.txt, consistently used in its ensureExist() and massCrawlCheck() methods. The Robots.txt threads pool max size can now be configured in the /PerformanceQueues_p.html page, or with the new "robots.txt.MaxActiveThreads" setting, initialized with the same default value as the crawler.	8 years ago
luccioman	3092a8ced5	Fixed thread name consistency for improved monitoring. Some tasks were modifying the current thread name without restoring it once finished as it is effectively done elsewhere.	8 years ago
luccioman	eec5779889	Added a name prefix to pooled threads for easier monitoring. Using JVM monitoring tools, it is then easier to identify tasks running inside thread pool with a custom prefix rather than the generic one : "pool-".	8 years ago
reger	59130777a6	add high scored items first to YearNavigator (to make sure to be included in sorted view)	8 years ago
luccioman	0ba5a838f7	Added charset meta to Solr HTML writers. Non-ASCII characters are thus correctly rendered in browsers. This is a fix to mantis 706 : http://mantis.tokeek.de/view.php?id=706	8 years ago
reger	08a0acc35d	make a YearNavigator available, usable as SearchEvent.navigator plugin. It can take any Date field of the index and displays a list of year strings in reverse order by the year (not the score/count). To allow defining the index field to use, the fieldname (and title) can be appended to the navi's name "year", e.g. year:load_date_dt:LoadDate. It also works with the dates_in_content_dts field (from the graphical date navigator). Here the query parameters from: to: are used on selection as query modifier (for other dates no query parameter is currently available, so selection won't work to filter search results). Not included in the UI Searchpage layout config so far (to experiment with it, a manual change to the conf is needed).	8 years ago
reger	7742579ca4	make a LanguageNavigator available, usable for the SearchEvent.navigator plugin (not activated yet).	8 years ago
reger	0d3bef659b	implement RequestHeader.setCharacterEncoding for legacy header, make sure .getProtocol returns an http version. Move the patch for Set-Cookie to ResponseHeader (applies only here)	8 years ago
Michael Peter Christen	5320209963	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	8 years ago
Michael Peter Christen	83f5e3d715	added+disabled a federate search experiment	8 years ago
reger	4eeb448eb3	use DigestURL in UrlProxyServlet as parameter to pass requested url to handler. UrlProxyServlet splits url in parts to pass it on as parameter and HeaderFramework constructs a url from param parts. This is obsolete if already created url is used (makes HeaderFramework.getRequestURL obsolete = removed)	8 years ago
reger	bad8f87998	remove old/obsolete clear text "adminAccount" credential entry from init and setConfig (.,empty) from servlets/code	8 years ago
reger	811cf637f8	fix Jetty9YaCySecurityHandler, length check of Basic credential, add comment to SwitchboardConstants.AdminAccount const	8 years ago
reger	fdcf33f08f	fix Domain.stripToHostName for some IPv6 cases; add unit test for it	8 years ago
reger	ac6e198bd1	add unit test for Domains.stripToPort, simplify ipv6 check	8 years ago
reger	f27531f5ec	fix Domains.stripToPort, make it IPv6-safe	8 years ago
reger	67744a8038	fix HeaderFramework.getRequestURL on host with port considering ipv6 host	8 years ago
reger	66cc0dd173	refactor: move GSA specific date formatter to GSAservlet; adjust return type to String for HeaderFrameWork.getSingle	8 years ago
reger	d525967999	refactor: move convertHeaderFromJetty to ProxHandler (only used with active proxy not needed for standard servlets)	8 years ago
reger	60ba5c117c	fix legacy getHeaderCookies to work with cookies from original HttpServletRequest, by moving to RequestHeader.	8 years ago
reger	30f8d1e2d7	let RequestHeader.logout succeed w/o throwing exception by default	8 years ago
reger	28afd3a2f8	fix UserDB.proxyAuth from header string (take care of prefix "Basic" in header entry)	8 years ago
luccioman	0806de8fdc	Ensure file input streams are closed in both normal and error cases.	8 years ago
luccioman	a0dfbaca6a	FileUtils : added some JavaDocs and unit test cases	8 years ago
reger	59448461d3	make use of userInRole for quick login verification	8 years ago
reger	2a4d826d9e	adjust servlet RequestHeader.getLocale init jvm defaultLocale matching UI language	8 years ago
reger	9db68acb4f	remove obsolete X_YACY... header declarations not in use (no writes, only remove and try to read). Obsolete parameter setupHttpClient	8 years ago
reger	8e9aece786	more use of RequestHeader constant referer, authorization in Jetty9YaCySecurityHandler	8 years ago
reger	d631fbc019	make more use of the new ServletRequest interface methods getScheme, getServerPort (in QuickCrawlLink_p & YaCyDefaultServlet)	8 years ago
reger	395f2e8946	Make ServletRequest implement the standardized HttpServletRequest interface, to make all readily available information from the original ServletRequest available to YaCy servlets (without converting data to internal structures). The implementation of the common interface allows easier integration of YaCy servlets with the servlet standard (e.g. shared login service with the servlet container etc.)	8 years ago
luccioman	74fec066f4	Converted more URLs to pure relative ones. Easier YaCy peer configuration behind a reverse proxy subfolder : no need for the reverse proxy to rewrite HTML links or URLs in css files. Tested on Debian Jessie with an apache2 reverse proxy. See related mantis issues http://mantis.tokeek.de/view.php?id=106 and http://mantis.tokeek.de/view.php?id=701	8 years ago
luccioman	0f0393e5e3	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	8 years ago
luccioman	7296e3884f	Switched even more URLs to pure relative ones. Thus a YaCy peer can run behind a reverse proxy subfolder without need for the reverse proxy to rewrite HTML links (a CPU costly operation). Tested on Debian Jessie with an apache2 reverse proxy. See related mantis issues http://mantis.tokeek.de/view.php?id=106 and http://mantis.tokeek.de/view.php?id=701	8 years ago
reger	49eae79c01	fix Tables.hasIndex check for tablename = key; apply same functionality to hasHeap (to not create new table on call hasHeap)	8 years ago
luccioman	84b81c1af0	Switched more URLs to relative ones when possible. This permits an easier and more flexible reverse proxy configuration. Some related mantis issues : http://mantis.tokeek.de/view.php?id=106 and http://mantis.tokeek.de/view.php?id=701	8 years ago
luccioman	731684105a	Improved absolute URLs rendering in OpenSearch desc and RSS feeds. When the peer is behind a reverse proxy providing SSL/TLS encryption, the rendered absolute URLs should start with https when the user browser requested https : added limited support to the X-Forwarded-Proto HTTP header notably provided on Heroku platform. Also added some unit tests.	8 years ago
reger	669f60223e	upd Column.toString to output encoder "{bytes}" used for String and binary Column types	8 years ago
reger	c9e81d2fa0	fix Column parsing from celldefinition string, without cellwidth def. (outofbound exception)	8 years ago
reger	e0816ef2e5	use human readable date format in CrawlStacker error message "double in: local index, oldDate = "	8 years ago
luccioman	54d879a9b3	Generate HTML relative (to each peer) links from hosted WikiCode. When WikiCode inserted in a peer hosted Blog, Wiki, Messages or Profile contains relative links (images or any content, hosted in DATA/HTDOCS), it is more reliable to keep these links relative, especially when the peer is behind any kind of reverse Proxy.	8 years ago
luccioman	2da5f339f8	Fixed /News.html and /Wiki.html pages in Search Portal mode (issue #87 ). Also fixes the rendering of these pages when the peer is not online. Re-factored code in common with /opensearchdescription.xml and ConfigPortal.html.	8 years ago
reger	8fe28a83f2	harmonize used lastmodified date for rwi and fulltext in storeDocument	8 years ago
reger	3d1d297308	refactor namespace navigator as part of the navigatorplugin map; this allows the navigator to include counts of all matches (rwi+fulltext). Also fixes unresolved_pattern in the navigator's title (of the counter). The use of the inurl: query modifier as filter has not been changed, keeping it as a soft (unsharp) filter facet. Upd StringNavigator to prevent empty strings from multivalued solr fields; removed date value conversion (better handled elsewhere, not needed here).	8 years ago
reger	67f660523b	Make navigators' underlying index field name accessible in interface; use interface in declaration and extend facet check to include navigator field.	8 years ago
reger	5eb3ee4e20	Add search navigator interface to allow for additional navigators (plugins). Prepared the first basic navigators (for authors and collections) for the list of SearchEvent.navigatorPlugins and adjusted the servlet to use these. - this allows configuring the display order of these navigators (by ordering the config string) - eventually allows for additional and/or custom navigators using any available index field without the need to change servlets - the Collection navigation has been adjusted to exclude the internal, default robot_* and dht collections from display - rwi results are now also checked for navigator by the refactored navi's. So far no config options were added to customize or add navigators (may come later once the route of the upcoming modularization/plugin system is defined).	8 years ago
reger	fd3f58fcaa	improve query modifier parsing of "collection:" and possible collision with "on:" in case multiple collection modifier were entered (by mistake) http://mantis.tokeek.de/view.php?id=702	8 years ago
reger	af39a76bf6	Reduce number of default max. search navigator lines (from 10000) to 100 + make it configurable	8 years ago
reger	20a1b29ed3	add simple test case for ReferenceContainer helpful for debugging calculated ranking parameter	8 years ago
reger	3c7220bc7b	Refactor rwi reference word position and word distance calculation used for rwi ranking. Main changes: - introduce a posintext() to access the stored value. This also reduces memory allocation of the position array for WordReferenceRow (index access) - use the positions() array for joined references on multi-word queries if needed (otherwise allow positions() to be null) - adjust assignments and the min() max() and distance() calculation accordingly	8 years ago
luccioman	f0639d810c	Customized name for Threads still using the default "Thread-n" pattern. This makes threads monitoring easier to read.	8 years ago
luccioman	db3b9db9c2	Crawl from local file : faster task end when manually terminating crawl.	8 years ago
reger	4c67ed3f8d	catch rwi ranking div by zero exception during rwi search result processing. Word distance calculation is affected by concurrent update (normalization) of min/max ranking parameter for word positions. On update of min/max the exception is raised in distance calc and is now caught. This concurrent update and change of ranking results is needed for speed but should be further checked for optimization	8 years ago
luccioman	47af33a04c	Advanced Crawl from local file : better processing of large files. Applied strategy : when there is no restriction on domains or sub-path(s), stack anchor links once discovered by the content scraper instead of waiting for the complete parsing of the file. This makes it possible to handle a crawling start file with thousands of links in a reasonable amount of time. Performance limitation : even if the crawl starts faster with a large file, the content of the parsed file is still fully loaded in memory.	8 years ago
luccioman	ee92082a3b	Updated javadocs : warning about closing stream responsibility.	8 years ago
luccioman	6f49ece22f	Fixed redirected URLs processing as crawl start point. See mantis 699 (http://mantis.tokeek.de/view.php?id=699) for details.	8 years ago
reger	68217465fe	div by null in word distance calculation (again, description in http://mantis.tokeek.de/view.php?id=698) as root cause was not seen, added just workaround reducing in favour over a try catch (for easier followup).	8 years ago
luccioman	7263d17436	Removed mentions of deprecated LURL-db. Thanks to LA_FORGE for asking about it on the YaCy forum ( http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5895 )	8 years ago
reger	8b74a6bf57	fix min/max calculation of WordReferenceVars.distance() Issue was the calculation in AbstractReference with positions.clear() call, this made distance result always 0 (distance needs min 2 positions) and created concurrency issues. + unit test of changes	8 years ago
luccioman	da362628fb	Added fine log level for too long blacklist matching processing.	8 years ago
reger	aaae7c6462	adjust ConcurrentScoreMap internal value map to interface and use parameter Long -> Integer (saves some bytes)	8 years ago
reger	31d2a5645e	remove obsolete query variable leftover from `8fb370d9f8 (diff-1d4259005ebfddc11083387857a86175)`; harmonize ranking shift parameter to 0xFF; correct addresult weight parameter to long	8 years ago
luccioman	a588ed7628	Applied image headers customization to the new ViewFavicon servlet.	8 years ago
luccioman	7717a3d43d	Fixed license headers on files created to improve favicon management.	8 years ago
luccioman	6e1959f469	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git Conflicts: htroot/yacysearchitem.java source/net/yacy/cora/federate/solr/responsewriter/YJsonResponseWriter.java source/net/yacy/search/schema/CollectionConfiguration.java source/net/yacy/server/serverObjects.java	8 years ago
reger	685d8e86bf	Avoid frequent data type casting (float/long) for rwi score refactor to using long in URIMetadataNode too (and related call parameters) As remote rwi score's are not used (since v1.83) skip reading float-score , but keep in toString() for communication with older versions.	8 years ago
luccioman	3ccd89e274	Fixed MultiProtocolURL.resolveBackpath to handle remaining '..' segments	8 years ago
luccioman	4b699c469a	Blacklist refactoring : extracted a function for easier unit testing	8 years ago
luccioman	54cfcc3f56	CrawlCheck_p.html : also display info about disallowed URLs.	8 years ago
luccioman	8b341e9818	Robots : properly handle URLs including non ASCII characters This fixes GitHub issue 80 ( https://github.com/yacy/yacy_search_server/issues/80 ) reported by Lord-Protector.	8 years ago
reger	e68b00678e	prevent negative score on URIMetadataNode in the special case where no solr score is supplied. + assert before use & test case	8 years ago
luccioman	242707f9b4	Fixed loadFromCache with strategy IFFRESH. This fixes mantis 695 ( http://mantis.tokeek.de/view.php?id=695 ) : crawl start with 'Link-List of URL' option on websites using cookies.	8 years ago
reger	b752bcfecb	adjust date in text detection to ignore some program version strings like "3.1.2.0102" see http://mantis.tokeek.de/view.php?id=650 + expand test case	8 years ago
reger	b017e97421	optimize condenser language detection a little. langdetect probabilities take letter case into account, add words from description and anchors etc. as is. + add it to javadoc	8 years ago
reger	ae3717d087	adjust Tokenizer sentence count to ignore repeated punctuation (like !!!! ) + remove unused sentenceword map (we use only the count) + upd test case for sentence count	8 years ago
reger	474f0476c6	adjust Tokenizer sentence count on trailing text after last recognized sentence + upd test case for rwi multi-word-query (leaving results known to fail untested)	8 years ago
reger	3861ac9293	upd maven dependency-check plugin to reflect changes of https://nvd.nist.gov + upd unknown ant script with current lib/jsch version	8 years ago
reger	681a61dafb	adjust rwi index result word position handling used for rwi ranking - correct WordReferenceVars.toRowEntry posintext parameter to set expected min posintext (the difference is on multi-word queries, while positions are ordered by search word order). - modified posofphrase/posinphrase join operation - to set min posofphrase - and keep posinphrase if not same posofphrase (was set to 0, no differentiation during ranking) + fix compiler msg (missing type declaration)	8 years ago
reger	14f7577231	add support for older Word versions (Word6/Word95) to docParser	8 years ago
reger	1a79c64495	generalize DateDetection with holiday date rules readily available in icu to make sure current dates are recognized (was fixed to 2014 - 2016) + adjust holiday date parser from pattern.match to pattern.find to deal with leading and trailing text + moved relative date recognition (morgen, tomorrow) to parseline (used by query parser only), as not working and problematic for indexing + add test case for parseline (used by query parser)	8 years ago
reger	6f68f08354	correct DateDetection Silvester date add Thanksgiving	8 years ago
reger	32a2e3a22a	have RSSFeed.getChannel return empty message on missing channel element, a) required b) prevent NPE in rss servlets + add test	8 years ago
luccioman	8d57b5b970	Added some javadocs.	8 years ago
luccioman	60df09fff9	Fixed some HTML validation errors : Illegal character in query Now encode space characters in URLs query part.	8 years ago
reger	862f28eaa6	display number of documents/rss-items for label "docs" in load_rss_p servlet (as replacement for the rarely used "docs" rss-tag for a url to the rss-specification)	8 years ago
luccioman	dcdea2d02f	Fixed shutdown for crawler.MaxActiveThreads value greater than 200 Shutdown was hanging in CrawlQueues.close() at this.workerQueue.put(POISON_REQUEST) when config value crawler.MaxActiveThreads was greater than 200. Revealed by "Collision" Threads dumps in mantis 689 (http://mantis.tokeek.de/view.php?id=689#c1312) Fixed consistency between this.worker.length and this.workerQueue capacity, and made the process more reliable using non-blocking offer() function.	8 years ago
luccioman	d286ba2c3e	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	8 years ago
luccioman	b8f6458152	Prevent yacy main thread from hanging on browser opening process. First fix for mantis 689 (http://mantis.tokeek.de/view.php?id=689). On Debian Linux, with a headless jre and no open browser, browser.openBrowserClassic() was called and waited forever for the browser process to end (p.waitFor()). YaCy shutdown was therefore not working until the browser was closed. Also modified browser opening command for Unix platform to open the default browser (with xdg-open util) instead of Firefox. xdg-open also has the advantage of being asynchronous (not blocking).	8 years ago
reger	70e1eb30a5	prevent StringIndexOutOfBounds in getLocalFile() + tighten patching of DOS path w/o protocol to drive "LETTER":	8 years ago
luccioman	1bb0b135ac	Avoid duplication of various MS Windows file URLs flavors Fix for mantis 692 (http://mantis.tokeek.de/view.php?id=692)	8 years ago
luccioman	b9a8476f02	Removed unused import	8 years ago
reger	e73c1eea8c	remove unused rootpattern, leftover from commit `9a5ab4e2c1 (diff-d2b184283abed53ae260fc9eabdaef40)`	8 years ago
reger	6f8c3ccea4	improve url hash computation for file path with mixed java & windows file.separator to compute equal hashes (by normalizing path for computation) + expand test case to check mixed java / windows file url notation, e.g. file:///c:/test/file.html vs. file:///c:\test/file.html - relates partially to http://mantis.tokeek.de/view.php?id=692	8 years ago
reger	efcb6a1e74	fix supported mime XML -> xml for rssParser (mime normalized to lower case for comparison) + add mime text/xml as in use for rss in the wild	8 years ago
luccioman	b3b75b0498	Accessibility : add a customizable alternative text to YaCy logo. Applied W3C recommendations : https://www.w3.org/TR/html51/semantics-embedded-content.html#a-link-or-button-containing-nothing-but-an-image and https://www.w3.org/TR/html51/semantics-embedded-content.html#logos-insignia-flags-or-emblems	8 years ago
luccioman	f2bc1b268d	Updated URL fragment validation rules according to current standards See RFC 3986 (https://tools.ietf.org/html/rfc3986) or URL living standard (https://url.spec.whatwg.org/)	8 years ago
luccioman	b1b8e69da8	Fixed NullPointerException cases	8 years ago
luccioman	3ee4f56c39	Improved ErrorCache behavior when switching networks. Even after a network switch, ErrorCache was still holding a reference to the previous Solr cores, thus becoming useless until the next YaCy restart. Initial error cache filling with recent errors from the index was also missing after the switch.	8 years ago
luccioman	7d5ba2afa4	Added some JavaDoc and moved crawlStacker close at the right place.	8 years ago
luccioman	8edbcd8ad4	Log possible Solr instance close errors. We do not want to block on this kind of error, but it should not fail silently as it may have later consequences.	8 years ago
reger	330768c8a2	fix for solr write.lock after mode change http://mantis.tokeek.de/view.php?id=686 The embedded core holds a lock on the index and must be closed. An earlier commit comment states that the core should be closed with the solr instance instead of on close of the connector. Adjusted InstanceMirror.close() to take care of closing the embedded instance to release the lock. In 2 routines of fulltext this was already explicitly implemented (disconnectLocalSolr). Now this disconnect is part of InstanceMirror.close().	8 years ago
reger	585d2a6441	test case: for NewsPool to check the id modificator (for unique id) and observe the distribution order .. hands on. + add test/DATA to .gitignore	8 years ago
luccioman	de5c873e38	Removed unused JavaScript file docs.min.js. This file is used by the Bootstrap documentation website (http://getbootstrap.com/) but is not part of the Bootstrap distribution and does not need to be included in a Bootstrap based application.	8 years ago
Michael Peter Christen	df51e4ef07	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	8 years ago
Michael Peter Christen	e063aaf97f	enable fuzzy search, solr style (append a ~ to get a fuzzyness on the word)	8 years ago
reger	ff6589fc0f	test case: simulating multi word query for local rwi index Purpose of the test case is to be able to (controlled) analyse the rwi ranking for multi word searches (with focus on posintext and word-distance ranking)	8 years ago
reger	e990297d2e	avoid NPE on hello message with missing "yourip" key http://mantis.tokeek.de/view.php?id=684	8 years ago
reger	e51ab8c7aa	hack to generate a unique message-id for messages created in the same second by optionally add a 1 second offset counter to the current time (which is used as the unique id part)	8 years ago
Michael Peter Christen	b82300358a	removed version number check because it no longer works if version numbers are expressed differently than we expect. That could prevent YaCy from running on systems which are appropriate, simply because we do not understand the version string.	8 years ago
Michael Peter Christen	2107674999	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	8 years ago
Michael Peter Christen	0d28f563f4	fix for java version "9-ea"	8 years ago
reger	3b694b3935	add some javadoc to rwi wordreference distance, position to remember facts for http://mantis.tokeek.de/view.php?id=683 Init missing word position to 0 like in other non text body words	8 years ago
reger	a4465c97d6	as requested, disable/remove old swf parser http://forum.yacy-websuche.de/viewtopic.php?f=8&t=5861#p33098	8 years ago
reger	7f63fc50f3	prepare an IndexSegment test case for RWI index testing + prevent NPE in Segment.clear() on missing embedded solr instance.	8 years ago
reger	96467c5467	remove not needed counter in Tokenizer (completing last changes) including a small change, word posintext counting. We remember/store 1st posintext. Previously following words got a handle (posintext) excluding found. Now it just counts and assigns true posintext as handle (posintext)	8 years ago
luccioman	d66b0f7b7b	Fixed french messages encoding in YaCy tray. Also added the missing french translations.	8 years ago
reger	7efb66ee10	adjust the WordReference.join wordsintext calc to take the max (instead of sum) The reference is for the same url (add same for title and phrases). + del redundant join() procedure	8 years ago
luccioman	0a9ff14d96	Fixed NullPointerException case and added Javadoc	8 years ago
luccioman	06d4f93d03	Merged master into postprocessing branch	8 years ago
Michael Peter Christen	b73d2db914	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	8 years ago
Michael Peter Christen	25a3c7a6d0	catch exception and write end of object	8 years ago
reger	272cdd496a	reactivate sentence counter in WordTokenizer for phrasepos ranking, by counting punctuation (delivered as 1 char word) again.	8 years ago
Michael Peter Christen	5e165a8150	removed unused imports	8 years ago
Michael Peter Christen	c716648c78	enhanced json encoding of strings	8 years ago
Michael Peter Christen	6139bd85a8	fix for broken facet names	8 years ago
Michael Peter Christen	5060f9fee9	fix for too long snippets	8 years ago
Michael Peter Christen	8681cee3f3	fix for bad comma	8 years ago
Michael Peter Christen	db6d8fc197	fix for bad json	8 years ago
Michael Peter Christen	8f4a341735	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	8 years ago
Michael Peter Christen	9934f546bb	added default fl to solr query, removed large texts retrieval and changed snippet to description tag if no other description is available	8 years ago
reger	120bf7e6e2	implemented RWI WordReference to return the word position value (was always left empty). This is needed and enables existing word position ranking for RWI. The concurrency issue that came up in word position min/max calculation was eliminated by an iterator.hasNext check before next() access.	8 years ago
reger	e310ec5f70	fix posInText ranking calculation to score 0 on no position info + fix Word posInText calc in Tokenizer to start with 1 + test case	8 years ago
luccioman	74f9927ddc	Merge remote-tracking branch 'origin/master' into dist_macOS	8 years ago
reger	51c077f493	adjust the getTopics() and getTopicNavigator() to current usage - move the maxcount limit restriction completely to getTopicNavigator (as it is not used in getTopics) - let search servlet use getTopics by default (w/o RWI connected check, as of now, Topics are available w/o any additional index interaction)	8 years ago
reger	39dd244693	fix ConcurrentScoreMap.set() calculation of totalCount() + test case	8 years ago
reger	ebf818ad95	log an error on aborted news publish (due to duplicate news.id) + change printed err msg to log entry in PeerAction.processPeerArrival	8 years ago
reger	cc2d9dd3f1	reactivate the use of included-in-topwords boost in postRanking + changed the postRanking to add one score only if word appears more than once. + getTopics() unused code block rem'd (save performance)-> routine needs rework !	8 years ago
luccioman	39ea28adfd	Merged master to dist_macOS branch.	8 years ago
luccioman	8255e91c99	Fixed serverClassLoader.findClass method. htroot is supposed to be a subfolder of appPath and not of dataPath, as assumed in other places where htroot is loaded. This issue was not visible when dataPath and appPath are equal.	8 years ago
reger	6801673a07	apply postranking media search boost only on media queries	8 years ago
luccioman	1dc4306058	Fixed indentation for better readability.	8 years ago
luccioman	8c49a755da	Postprocessing refactoring Added Javadocs to refactored methods. Added log warnings instead of silently failing some errors. Only fill collection1hosts when required ( shallComputeCR true).	8 years ago
luccioman	42f45760ed	Refactored postprocessing For easier understanding and performances profiling.	8 years ago
reger	4386e84b55	correct NewsPool retention calculation (was still clearing everything after one day)	8 years ago
reger	5e72d37f0a	TransNews_p: add ad-hoc translation of target file on positive vote (addition to local translation) + errmsg on language=default	8 years ago
reger	9462a32244	Added news service for easy, community driven UI translation support. New or modified translation (via /Translator_p.html) can be shared/distributed via the YaCy internal news service. Remote peers can see and vote on the translation via the new http://localhost:8090/TransNews_p.html servlet. A positive vote will add the received translation to the local translation list and post a voting message to the news service. (at this point no processing of received votes is implemented) + fixed the msg service retention time check (NewsPool.automaticProcessP)	8 years ago
reger	f8d6543a23	Rename class CreateTranslationMaster to TranslationManager and add additional routines and the capability to handle translation maps internally (to reduce complexity of handling translation maps for calling servlets)	8 years ago
reger	19b4509d54	speed-up reading of xliff language file, by using xmlparser (stax) instead of jaxb, making xliff-core-1.2-1.1.jar obsolete	8 years ago
Michael Peter Christen	e1fac86f53	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	8 years ago
Michael Peter Christen	a9316ceff6	force browser-caching of favicons from search results	8 years ago
Orbiter	503312ca43	Merge pull request #61 from luccioman/heroku_experiments Deploy YaCy on Heroku	8 years ago
reger	33bf35d90f	missing file for previous commit "Introduction of additional language setting browser"	8 years ago
reger	16e8ed3f01	Introduce additional language setting "browser/Browser Language" for UI internationalization. If language is set to "browser" the client/user browser language is used to choose from available translations. Simply: one user's browser speaks English -> YaCy responds in English, another user's browser speaks French -> YaCy responds in French. ! To make a translation/language available you have to activate the language once ! (or manually use the utility class TranslateAll). In ConfigBasic.html available translations are marked green when language=Browser is set. The client language is determined by the HTTP header Accept-Language (checked in DefaultServlet)	8 years ago
reger	3b47a07dd1	change unused servletProperties entry CONNECTION_PROP_CLIENT_REQUEST_HEADER to use directly HttpServletRequest. This is used to get the http protocol version in HTTPDProxyHandler.fulfillRequestFromWeb() for error response to client. - adjust YaCyProxyServlet and UrlProxyServlet accordingly - use more http_version constants in headerframework and httpdeamon - equalize servlets (3) use of HeaderFramework.CONNECTION_PROP_HOST to HeaderFramework.HOST	8 years ago
reger	036c1dc6ef	fix CookieTest_p formatting (output of <br> as text), change to dataoutput only by servlet, leave formatting to html. + removed link to obsolete env/grafics gif	8 years ago
Michael Peter Christen	bf6709d196	fixed missing browser activation in linux	8 years ago
Michael Peter Christen	d8504418b6	enhanced browser-caching of static content	8 years ago
Michael Peter Christen	079112358c	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	8 years ago
Michael Peter Christen	efeb592661	don't do solr optimization, this creates high IO load. We should leave this task to solr to do on its own instead of forcing it.	8 years ago
luccioman	46b8836548	Copy image resources contained in donation iframe. Handle eventual images loading errors.	8 years ago
reger	4c7a77662a	eliminate dependency on file extension in storeDocument and use supported mime type instead, to also support handling of urls w/o corresponding file extension. For this, refactor use of document.getParserObject() to always return a Parser (for clean logic) and define/move the scraperObject as local var of AbstractParser. Adjust related calls to getParserObject (where actually a scraperObject is wanted). Additionally skip appending url token to parsed text for dht metadata entries (by default returned as result by rwi index).	8 years ago
reger	ebde21079a	refactor xlsParser to include Excel file attribute (like author) in parser result doc. Similar to ppt and doc parser, completing a TODO in xlsParser.	8 years ago
luccioman	744c9a2615	Opensearch desc : handle https protocol url with default port (443) This completes modifications made for mantis 669 (http://mantis.tokeek.de/view.php?id=669)	8 years ago
luccioman	b9c28893ee	Merged master to 'heroku' branch.	8 years ago
Michael Peter Christen	103a8348b3	fix for NPE and small performance enhancement	8 years ago
reger	2910fe35c1	add missing scheduler calc of next exec_date (call of calculateAPIScheduler) - after last_exec_date is altered, next_exec_date should be recalculated - makes the recalculation of next_exec in advance (without api call surely made) in Switchboard.schedulerJob() obsolete. Slightly modify next_exec calc. on missed event to now+schedule_time (instead of a fixed 10 min)	8 years ago
reger	70d47ae38a	keep scheduler selection by repeat entry from `07311020d4` to allow exec schedule on actual exec event. Iterate on exec date (of advantage after interruption/shutdown) to schedule older or missed events first.	8 years ago
reger	7c3f932e5d	revert due to conflict with double count recording by scheduler / servlet by the commit under normal operation (no shutdown)	8 years ago
reger	07311020d4	postpone apicall exec date init until actual call fix for http://mantis.tokeek.de/view.php?id=677 The difference is on scheduling a large number of rss feeds and loading is not finished before shutdown of YaCy. The change makes sure not already loaded RSS will be loaded by the scheduler on next startup.	8 years ago
reger	5e335b32da	fix Blacklist.contains() matching path pattern to string similar to `5e9e871192` + add proof testcase	8 years ago
reger	5e9e871192	fix Blacklist.remove by using pattern.toString to find pattern to remove, parameter String path did never equal Pattern. + delete unused removeAll, as it does not persist changes after restart	8 years ago
reger	1843ea7e69	on Blacklist.add pattern to source file also update internal entry maps as in Blacklist.add(blacklistType) to make entry effective w/o restart fix for http://mantis.tokeek.de/view.php?id=676	8 years ago
reger	bf6ce33da3	Correct use of _htDocsPath config in YaCyDefaultServlet to use servlet config variable + add some javadoc and remove a not useful static declaration	8 years ago
luccioman	480027ec98	Merge remote-tracking branch 'origin/master' into heroku_experiments	8 years ago
reger	fcad2d0744	add uses of config constant INDEX_RECEIVE_ALLOW	8 years ago
reger	226f81cfcf	declare poison pill url MultiProtocolURL() as protected to make sure not used from outside. After double checking use of poison url revert path init from commit `f8632ad292`	8 years ago
reger	f8632ad292	prevent string index out of bounds in MultiProtocolURL.getPaths as path may be an empty string + init path to "" also in init for poison url (to guarantee success for all existing uses of path w/o check for null)	8 years ago
reger	35a7d57260	update lucenematchversion to current (5.2.0 -> 5.5.0) there should be no need for reindex by the update	8 years ago
reger	9b07bbf955	deprecate newurl(), not used and already replaced, instead of making it handle all the supported protocols	8 years ago
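
Commit dcdea2d02f above describes a shutdown hang caused by a blocking put() of a poison pill into a worker queue whose capacity did not match the number of workers. The following is a minimal Java sketch of the general poison-pill pattern with the two remedies that message names (queue capacity equal to the worker count, non-blocking offer()). It is not YaCy's CrawlQueues code; the class name, the Object sentinel standing in for POISON_REQUEST and the worker count of 250 are illustrative assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PoisonPillShutdownSketch {
    // Hypothetical sentinel standing in for a POISON_REQUEST object.
    private static final Object POISON = new Object();

    /** Starts workerCount workers, stops them with one poison pill each, returns pills delivered. */
    public static int runAndShutdown(int workerCount) {
        // Capacity equals the worker count, so one pill per worker always fits.
        BlockingQueue<Object> queue = new ArrayBlockingQueue<>(workerCount);
        Thread[] workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Thread(() -> {
                try {
                    while (true) {
                        Object job = queue.take();
                        if (job == POISON) {
                            return; // sentinel received: this worker exits
                        }
                        // a real worker would process the job here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "sketch-worker-" + i);
            workers[i].start();
        }
        // offer() never blocks: it returns false instead of waiting when the queue is full.
        int delivered = 0;
        for (int i = 0; i < workerCount; i++) {
            if (queue.offer(POISON)) {
                delivered++;
            }
        }
        for (Thread worker : workers) {
            try {
                worker.join(5000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        // e.g. a crawler.MaxActiveThreads value above 200
        System.out.println("pills delivered: " + runAndShutdown(250));
    }
}
```

Because offer() returns false rather than waiting, a shutdown routine built this way cannot hang on a full queue, and sizing the queue to the worker count keeps every pill deliverable even before any worker has consumed one.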

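Commit eec5779889 above gives pooled threads a custom name prefix so that JVM monitoring tools no longer show only the generic "pool-" names. One common way to do this in plain Java is a ThreadFactory that delegates to Executors.defaultThreadFactory() and then renames each thread, as in the following sketch. It relies only on the standard java.util.concurrent API; the class name and the "crawler" prefix are hypothetical and are not taken from YaCy.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class PrefixedThreadFactorySketch implements ThreadFactory {
    private final ThreadFactory delegate = Executors.defaultThreadFactory();
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger(1);

    public PrefixedThreadFactorySketch(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        // The default factory sets daemon status and priority; we only replace its "pool-N-thread-M" name.
        Thread thread = delegate.newThread(task);
        thread.setName(prefix + "-" + counter.getAndIncrement());
        return thread;
    }

    /** Runs one task in a single-thread pool and returns the name of the thread that executed it. */
    public static String nameSeenByTask(String prefix) {
        ExecutorService pool = Executors.newFixedThreadPool(1, new PrefixedThreadFactorySketch(prefix));
        try {
            return pool.submit(() -> Thread.currentThread().getName()).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(nameSeenByTask("crawler")); // prints "crawler-1"
    }
}
```

Delegating to the default factory keeps its other thread settings unchanged, so the only visible difference in a thread dump is the name.
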

8604 Commits (7496df93c38ee63c032bf6791c65623faf4e76f8)