165 Commits (54bea96e6797f01f28b597ca8ca7cd5c54f8997a)

Author SHA1 Message Date
Michael Peter Christen 5b94a257ce no timeout for large reference collections 11 years ago
Michael Peter Christen 8ad41a882c fixed several problems with postprocessing: 11 years ago
Michael Peter Christen 53948da7d0 tried to make last_modified recognition smarter 11 years ago
Michael Peter Christen 9a5ab4e2c1 removed clickdepth_i field and related postprocessing. This information 11 years ago
Michael Peter Christen da86f150ab - added a new Crawler Balancer: HostBalancer and HostQueues: 11 years ago
orbiter c250fac9f4 linkstructure refactoring to get more options for clickdepth analysis 11 years ago
Michael Peter Christen bd886054cb new structure and enhancements for link graph computation: 11 years ago
Michael Peter Christen 63c9fcf3e0 free configuration of postprocessing clickdepth maximum depth and time 11 years ago
Michael Peter Christen 51800007c4 - added concurrency to postprocessing of webgraph document 11 years ago
Michael Peter Christen fdaeac374a - enhanced postprocessing speed and memory footprint (by using HashMaps 11 years ago
Michael Peter Christen 7640834b37 removed double concurrency to put Solr documents into the index. The 11 years ago
Michael Peter Christen 0f6b72f24b do not use luke requests for remote solr servers if the result is 11 years ago
orbiter f6e441dd77 refactoring 11 years ago
Michael Peter Christen 69391e5d9e changed strategy to test existence of documents in Solr: using the 11 years ago
Michael Peter Christen bf97e38b83 removed clearURLIndex, which is a stub remaining from the old metadata 11 years ago
Michael Peter Christen 0cabcbbe83 more efficient wordcount 11 years ago
Michael Peter Christen 3d474a843e added memory protection for postprocessing 11 years ago
orbiter c40ba51ca6 added new suggest method which replaces more-than-one suggestions: 11 years ago
reger 9b24dae2b7 add language navigation filter clause to rwi results 11 years ago
Michael Peter Christen 1ea17bd9f3 - removed old metadata database and all migration code 11 years ago
reger 280c4a3ac1 exclude terms with " for didYouMean suggestion 11 years ago
orbiter 2ead4e44d9 introduced a new storage path ARCHIVE inside of DATA which will be used 11 years ago
Michael Peter Christen 78eac85161 better calibration of caches and queue maximum sizes 11 years ago
Michael Peter Christen 0db8e34625 enhanced webgraph processing 11 years ago
Michael Peter Christen c3dcbdc8d5 try to recover from an OOM during citation index reading and fail-over 11 years ago
Michael Peter Christen 9cf9727685 fix for wrong counter 11 years ago
Michael Peter Christen fceac8cffd more monitoring for postprocessing 11 years ago
Michael Peter Christen 9d5895f643 enhanced and fixed postprocessing 11 years ago
Michael Peter Christen 81bb50118e found and fixed a huge memory leak in solr caching (inside Solr). The 12 years ago
Michael Peter Christen e1c1e57877 less overhead calling exist() with only one hash 12 years ago
Michael Peter Christen 434e13b46d in host browser also show the properties of failed documents including 12 years ago
Michael Peter Christen 74d0256e93 enhanced postprocessing: fixed bugs, enable proper postprocessing also 12 years ago
Michael Peter Christen d328cc4a83 fix for didyoumean, added also more asian alphabets 12 years ago
Michael Peter Christen 101a6e6e14 Patch the citation index for links with canonical tags. 12 years ago
Michael Peter Christen 4f83d5f18c added the new field harvestkey_s to the collection index and the 12 years ago
Michael Peter Christen 96ed0c980e - added hosthash to all documents (also fail documents which is needed 12 years ago
Michael Peter Christen 5e31bad711 - the webgraph shall store all links which appear on a web page and not 12 years ago
Michael Peter Christen 85456f46b2 added two new fields, exact_signature_copycount_i and 12 years ago
Michael Peter Christen a88a62f7aa added a feature to set a collection for a crawl result based on a 12 years ago
Michael Peter Christen 765943a4b7 Redesign of crawler identification and robots steering. A non-p2p user 12 years ago
Michael Peter Christen 47b1c81d08 - refactoring 12 years ago
Michael Peter Christen cf12835f20 replaced the single-text description solr field with a multi-value 12 years ago
Michael Peter Christen c3b2301b2f fix for http://bugs.yacy.net/view.php?id=268 12 years ago
orbiter 056b42f5aa - added information about segment count to status_p.xml 12 years ago
orbiter 6fb2811e68 fixes for problems with remote solr and non-activated webgraph index 12 years ago
orbiter c124037f19 removed forced non-soft commits to prevent index fragmentation 12 years ago
Roland Haeder 841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler 12 years ago
Michael Peter Christen bcc623a843 refactoring of load_delay: this is a matter of client identification 12 years ago
Michael Peter Christen 5878c1d599 - refactoring of log to ConcurrentLog: 12 years ago
Michael Peter Christen 203921006a redesign of citation index storage 12 years ago