yacy_search_server

Commit Graph

Author	SHA1	Message	Date
reger	4c9be29a55	fix concurrency issue with htmlParser using not current scraper data resulting in incorrect data for some html index metadata. Details see http://mantis.tokeek.de/view.php?id=717	8 years ago
reger	eedee6eabb	fix exception on URIMetadataNode instantiation with corrected id hash on host_id_s. Use Solr setField instead of addField to prevent java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String at net.yacy.kelondro.data.meta.URIMetadataNode.hosthash(URIMetadataNode.java:247) at net.yacy.search.query.SearchEvent.addNodes(SearchEvent.java:966) at net.yacy.peers.Protocol.solrQuery(Protocol.java:1242) at net.yacy.peers.RemoteSearch$2.run(RemoteSearch.java:349)	8 years ago
luccioman	c1401d821e	Adjusted crawl depth control for FTP crawl start URLs.	8 years ago
reger	68d4dc5cc5	Complete harmonization RequestHeader getCookie with std ServletRequest to use javax.servlet.http.Cookie parameters. Deprecate now obsolete getHeaderCookies. Adjust setting of MaxAge to spec if >= 0 otherwise keep default.	8 years ago
reger	a1e5f7dbca	fix of fulltext.remove() by id of webgraph document webgraph has document hash in source_id_s	8 years ago
luccioman	1df558a6c6	Fixed YaCy proper shutdown triggered by SIGTERM signal. The main shutdown hook thread was not properly waiting for the main thread termination which consequently could not properly close resources and threads. After terminating a running YaCy peer this way (Ctrl+C in console, or kill <pid> for example), you could see the still existing DATA/yacy.running file. Tested with : - Debian Jessie openjdk 7 and 8 : regular shutdown, Ctrl+C, kill command, system restart while yacy is running - Windows 10 Oracle JDK 7 and 8 : non regression on regular shutdown	8 years ago
reger	b522d540b9	Include itemprop latitude/longitude (see schema.org) in attribute parsing for lat/lon. Harmonize number parsing for lat/lon to parseDouble. Fix endDate_dts value assignment.	8 years ago
reger	083df255e4	fix html tag attribute parsing containing attribute w/o value e.g. itemscope or autofocus (in such case the next key was not properly recognized).	8 years ago
reger	cb95b7339a	include html5 <time> tag in content scraper, add "datetime" property of <time> tag to scrapers startdate list. Datetime is parsed as iso8601 (xml) date, html5 allows partial as well as duration (not handled by this)	8 years ago
reger	7bf2bcf504	fix and prevent exception on missing required cookie name skip cookie creation if name is empty.	8 years ago
luccioman	3ca695390c	FTP crawl start URLs : applied crawl profile depth control Applied rules : - when the FTP URL denotes a file resource, stack it as any start URL : eventually embedded links can be followed applying the usual depth rules - when the FTP URL denotes a directory, list files under this directory and stack them for crawl, and repeat the process on sub folders until crawl depth is reached	8 years ago
luccioman	128c8ef8d4	Fixed title rendering having non ASCII chars in QuickCrawlLink_p.html.	8 years ago
reger	8eb6fba59c	activate filetype navigator plugin and restrict config (append) of navs to not already actives. Dht results are now included in count this might over shoot on redundant dht and solr, while the previous solr facet based was always low.	8 years ago
luccioman	c25e48e969	Enabled displaying results after 14th page for local search queries. Fixes issue #90 for local queries only: Stealth mode, Portal mode or Intranet mode. For P2p mode, the issue would probably be difficult to solve with reasonable performance. This still needs investigation. Also switched some InterruptedException catch log messages to warn level as this is normal behavior when shutting down a peer. Fixed yacysearch buttons navbar behavior to deal correctly with total results count or offset over 1000. Also improved the buttons navbar to be able to navigate over 10th page for local queries.	8 years ago
luccioman	a3886c6adb	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	8 years ago
luccioman	feaa87005e	Improved indentation for easier debugging steps.	8 years ago
reger	bab4804d11	add FileTypeNavigator plugin	8 years ago
reger	d35c47090c	remove obsolete put of HttpServletRequest attributes to YaCy servlet parameters on SSI (server side includes). Query parameters are already merged by dispatcher.include, making copy of parameter (RequestDispatcher.INCLUDE_QUERY_STRING) obsolete. All other parameter are not used as YaCy servlet arguments.	8 years ago
reger	0959038624	correct DefaultServlet resource pathinContext calculation: exclude servletPath option, as resources are always relative to htroot or htdocs; the change reflects this. Theoretically, this and the recent adjustments for relative urls allow the instance to be configured under a path other than root (/)	8 years ago
reger	c50e23c495	reduce creation of empty legacy RequestHeader() in situation where null is acceptable (less for garbage collection).	8 years ago
reger	87f6631a2a	adjust Cache getHeader to prev. changes/commit	8 years ago
reger	6be7339b1d	remove the overhead of unused reverseMappingCache of HeaderFramework / RequestHeader	8 years ago
reger	c702eb6786	del dead menu link to /repository (directory not created in current distribution -> old)	8 years ago
reger	baa5d9b9e3	adjust DomainHandler working on resolved .yacy domain (remove obsolete check for path on hostname)	8 years ago
luccioman	1ba705c23d	Use loaderDispatcher instead of HTTPClient to download releases. The default redirection strategy when using directly HTTPClient is incorrect when redirection is cross host (the original Host header is still sent when requesting the redirected location). YaCy LoaderDispatcher handles redirections properly, thus release archive files using redirected URLs (such as the URLs on a GitHub Release page) are successfully downloaded.	8 years ago
luccioman	467650c042	Hardened system update checks. When a downloaded archive release is corrupted, empty, or can not be opened for any reason, the update script must not be launched because it erases the existing lib/*.jar libraries.	8 years ago
luccioman	b5711b8fe1	Added some Javadocs.	8 years ago
reger	0d2964cf2b	expanded error message on rejected crawl url due to failed dns lookup; closes http://mantis.tokeek.de/view.php?id=678	8 years ago
luccioman	00e81fcc15	Check HTTP status when downloading a release, and report eventual error.	8 years ago
reger	0758c868c9	add HostNavigator plugin	8 years ago
reger	60160877f5	bundle initialization of search navigation plugins in separate handler class to allow to use navigator map in config servlets (without need to create a search event)	8 years ago
reger	3151cda3a5	catch ip-format exception on wrong server access setting ip filter as reported in http://mantis.tokeek.de/view.php?id=713 to prevent abort of initialization. This jetty/whitelist ipaccesshandler accepts currently only ipv4	8 years ago
reger	b32bcdf344	list entries in outgoing cookie monitor one per line for easier readability. For this adjust outgoingCookies entry to use Cookie[] instead of String[]	8 years ago
reger	3f32262654	enable getCookies for HeaderFramework reusing Jetty CookieCutter	8 years ago
reger	4186ee6fc0	add other custom response header entries set by servlets to the response to the client (not cookies only). This is used by some servlets to mainly set "Access-Control-Allow-Origin" header. Added a contains check to be sure no header set by Defaultservlet is overwritten.	8 years ago
luccioman	d27adc2b92	Fixed language detector initialization and NullPointerException cases. NullPointerException occurred when using an Identificator instance which encountered an error in its constructor. This error could be caused by a missing "langdetect" folder in the current folder of the main process, or by simultaneous first calls to the constructor, initializing concurrently the DetectorFactory.langlist. Fixes the mantis 714 (http://mantis.tokeek.de/view.php?id=714)	8 years ago
luccioman	a1f922b34a	Reduced locations vocabulary memory footprint. Reduced this vocabulary memory usage : - by using only one map term2entries instead of two maps having the same key set - by generating the location object links on the fly using the GeoLocation data instead of storing many duplicates of string prefix "http://www.openstreetmap.org/?lat=" Measurements with VisualVM and GeoNames 0 enabled (cities with a population > 1000) : - AutotaggingLibrary retained size : - initial : 309 718 763 bytes - after refactoring : 159 224 641 bytes	8 years ago
reger	9c06e752e4	allow request.setAttribute w/o "not implemented" exception by default skip unused CONNECTION_PROP_ARGS check in getQueryString	8 years ago
reger	59ab42e7d6	add UserDB lastaccess update calls on login	8 years ago
luccioman	bf8a6d9848	Reduced GeoNames locations memory footprint. Using String instead of StringBuilder instances in GeonamesLocation allows to reuse the same immutable objects in the Tagging class. Measurements with VisualVM and GeoNames 0 enabled (cities with a population > 1000) : - OverArchingLocation retained size : - initial : 164 666 830 bytes - after refactoring : 97 736 804 bytes - AutotaggingLibrary retained size : - initial : 354 713 633 bytes - after refactoring : 309 718 763 bytes	8 years ago
luccioman	3f561c1635	Fixed a NullPointerException case. Could occur when a search request was performed just after peer startup, and the Switchboard Thread "LibraryProvider.initialize" had completed, thus requesting a ProbabilisticClassifier not completely initialized (and having a null contexts property).	8 years ago
luccioman	6bc2bf1aa4	Small memory footprint reduction for GeonamesLocation. Reusing the same geonameid Integer instance between `id2loc` and `name2ids` maps reduces (a little) memory footprint. Measured OverarchingLocation class retained memory with VisualVM on openJDK 8 : - initial : 183 439 490 bytes - after refactoring : 164 666 830 bytes	8 years ago
luccioman	7f846ef674	Small complementary memory footprint improvement for synonyms. Memory footprint measured with VisualVM and all synonyms enabled : - before : 195 015 914 bytes - after : 192 548 826 bytes	8 years ago
luccioman	568e3dde6a	Improved synonyms memory footprint. The idea is to avoid unnecessary String objects duplication for the same words. Particularly efficient with the large moby thesaurus. Memory footprint measurements with VisualVM : - openthesaurus_de_yacy : - initial : 19 443 796 bytes - after refactoring : 18 012 606 bytes - mobythesaurus_en_yacy : - initial : 343 453 904 bytes - after refactoring : 173 843 780 bytes - thesaurus_ru_yacy : - initial : 3 800 706 bytes - after refactoring : 3 466 612 bytes - de + en + ru : - initial : 366 603 450 bytes - after refactoring : 195 015 914 bytes	8 years ago
reger	60b3adfb43	fix ext2mime to return given default on input=null	8 years ago
reger	f7e9f9be5f	move Digest auth checks from DefaultServlet to adminAuthenticated, eliminating the need to modify http header on Servlet container handled Digest authentication, to simulate Basic auth for YaCy servlets.	8 years ago
luccioman	cca3417b87	Fixed image and favicon viewing for unauthenticated local requests. As reported by @reger24, image and favicon viewing was broken with unauthenticated requests on peers configured to require authentication even from localhost. So I unified viewing rights check in a single new function on ImageViewer class.	8 years ago
reger	02092de3d8	remove login cookie generation for static admin in User servlet. cookieAuth is never successful for static admin, leaving the creation and handling of login cookies for static admin obsolete.	8 years ago
luccioman	fc575fc760	Fixed a NullPointerException case.	8 years ago
reger	9a8691129f	fix typing error from commit `60ba5c117c`	8 years ago


8286 Commits (52d05d14c6c834b46ac6bf8d3729b04ab4f12eaa)