yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	9183d21f25	renamed new index class to old name git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2342 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	c4e922885a	replaced indexURLEntry by new class that uses a kelondroRow.Entry object to store the index entry. This is another step to move to the new database structure. A side effect of this change is that index storage uses much less RAM space, which affects the index RAM cache. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2341 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	58df8b7bbf	a large collection of different changes * mainly for the transition to the new indexing database structure * a bugfix for an endless loop inside kelondroTree iteration * a bugfix for bulk read inside a kelondroTree iteration; the bug caused that some elements had been iterated twice * very strong speed enhancement for url/domain extraction git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2320 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hydrox	8ba8e2b7d9	*) added cache for blacklisted URL hashes received by DHT. DHT does not request URLs listed in this cache. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2251 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hermens	53cbcc6d6e	Implement emergency break in index receive when the limit of the ramCache is exceeded by more than cacheLimit See: http://www.yacy-forum.de/viewtopic.php?p=22911#22911 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2248 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	b20496e42b	*) make DHT DoS check configurable (requested by KoH) - check can be disabled via property indexDistribution.dhtReceiptLimitEnabled - upper bound can be configured via indexDistribution.dhtReceiptLimit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2234 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	5041d330ce	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2150 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	a930be4ba3	refactoring of index management: generalized the index entry git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2121 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	7dd57a3828	added a busy-time estimation at DHT/RWI-Receive to be done: usage of this value on client-side git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2116 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	82b2bc6932	patch for index-transfer DoS problem see http://www.yacy-forum.de/viewtopic.php?p=21627#21627 note that this function will make the index-transfer functionality void git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2114 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	f188611fc6	apply blacklist on rwis during dht receive very experimental! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1865 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	d31a4e0b4f	some small enhancements with cache flushing parameters and data structures git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1767 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	7eb10675b3	re-organization of index management this was done to be prepared for new storage algorithms git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1635 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	8194fde340	*) trying to continue transferRWI processing even if this error occurs: |> Caused by: de.anomic.kelondro.kelondroException: kelondroTree.searchproc: nullpointernull in db '.../urlHash.db' - if URL existence cannot be determined, we request it from the remote peer git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@997 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	7256bea45f	*) Bugfix for nameLookup parameter handling *) Bugfix for Received xx Words [xxxxxxx .. null] Bug git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@953 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	40777556c5	*) Connection Tracking - adding automatic refresh - accepts new parameter nameLookup which can be used to deactivate yacy-peer name lookup (because we have problems with this on large seed-dbs) *) ViewFile New page that can be used to view - original content - plain text content - parsed content - parsed sentences of a webpage specified by its url hash Mainly for debugging purposes at the moment *) Robots.txt Bugfix for if-modified-since usage TODO: synchronization of downloads to avoid loading the same robots-file multiple times in parallel by different threads *) Shutdown Better abortion of transferRWI and transferURL sessions on server shutdown *) Status Page Adding icon to start/stop crawling via status page git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@950 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	839db8869c	added high/low priority for index adding git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@899 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	a1777788a5	small change git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@879 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	a2fa75e688	*) Asynchronous queuing of crawl job URLs (stackCrawl) various checks like the blacklist check or the robots.txt disallow check are now done by a separate thread to unburden the indexer thread(s) TODO: maybe we have to introduce a threadpool here if it turns out that this single thread is a bottleneck because of the time consuming robots.txt downloads *) improved index transfer The index selection and transmission is done in parallel now to improve index transfer performance. TODO: maybe we could speed up performance by using multiple transmission threads in parallel instead of only a single one. *) gzip encoded post requests it is now configurable if a gzip encoded post request should be sent on index transfer/distribution *) storage Peer (very experimental and not optimized yet) Now it's possible to send the result of the yacy indexer thread to a remote peer instead of storing the indexed words locally. This could be done by setting the property "storagePeerHash" in the yacy config file - Please note that if the index transfer fails, the index is stored locally. - TODO: currently this index transfer is done by the indexer thread. To speed up the indexer a) this transmission should be done in parallel and b) multiple chunks should be bundled and transferred together *) general performance improvements - better memory cleanup after http request processing has finished - replacing some string concatenations with stringBuffers - replacing BufferedInputStreams with serverByteBuffer - replacing vectors with arraylists wherever possible - replacing hashtables with hashmaps wherever possible This was done because function calls to vector or hashtable functions take 3 times longer than calls to functions of arraylists or hashmaps. TODO: we should take a look at the class serverObject which is inherited from hashmap Do we really need a synchronization for this class? TODO: replace arraylists with linkedLists if random access to the list elements is not needed *) Robots Parser supports if-modified-since downloads now If the downloaded robots.txt file is older than 7 days the robots parser tries to download the robots.txt with the if-modified-since header to avoid unnecessary downloads if the file was not changed. Additionally the ETag header is used to detect changes. *) Crawler: better handling of unsupported mimeTypes + FileExtension *) Bugfix: plasmaWordIndexEntity was not closed correctly in - query.java - plasmaswitchboard.java *) function minimizeUrlDB added to yacy.java this function tests the current urlHashDB for unused urls ATTENTION: please don't use this function at the moment because it causes the wordIndexDB to flush all words into the word directory! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@853 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	a9c466ef21	cleaned, finals, StringBuffer, Properties git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@849 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	0c3a20d44f	more + changed log for better understanding of outOfMemory bug and others git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@846 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	c42a543bc3	*) Adding peername to logmessage when receiving URLs git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@781 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	c47bb1182d	bugfix for assortment initialization error git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@547 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
orbiter	25f632dbd9	more DHT bugfixes and better logging of DHT effects git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@542 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
orbiter	cd10370992	several bugfixes and dht selection / logging improvement git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@531 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
rramthun	b99205e445	Translation, spelling... git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@448 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
orbiter	a1ffc27041	preparations for image/movie/music indexing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@280 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
orbiter	e26ac60c3e	modified assortment data structures git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@148 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
rramthun	85c2f3be8a	Fixed spelling mistakes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@110 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
orbiter	b4030e5023	implemented serverSwitchActions - action-hooks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@105 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
orbiter	1d7fed87dc	redesign of index caching - removed indexCache.db git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@86 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
theli	e7f7aa0bb9	*) Import statements reorganized Now it's easier to determine which class really uses which other class *) Reorganizing Import Statements git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@83 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
orbiter	b9203bdb50	bug fixes and code cleaning git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@22 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
orbiter	248077d3f0	initial load with yacy 0.36 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago

34 Commits (4ff742e42de448da5449b36776ca91bf74635da7)