174 Commits (fe917deb2d7668d59496053ff08f90d78a2b888b)

Author SHA1 Message Date
Michael Peter Christen 6d3d4c4ea6 changed the concurrent enumeration of query results in such a way that
11 years ago
Michael Peter Christen 81f9b34da7 increased ability to search for all images on a single server within
11 years ago
Michael Peter Christen a7dd89c4de changed method to write the citation index: do not catch up references
11 years ago
Michael Peter Christen 05d58e4df0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen 98f45c9032 fix for image alt attachment to AnchorURLs in html parser.
11 years ago
orbiter 22ce4fb4dd better error handling for remote solr queries and exists-checks
11 years ago
orbiter 738989aab7 reverted commit f94c91315b because the
11 years ago
Michael Peter Christen f94c91315b if the webgraph is used, then use it also for reference computation to
11 years ago
Michael Peter Christen 2de159719b added an option to set 'obey nofollow' for links with rel="nofollow"
11 years ago
Michael Peter Christen 5b94a257ce no timeout for large reference collections
11 years ago
Michael Peter Christen 8ad41a882c fixed several problems with postprocessing:
11 years ago
Michael Peter Christen 53948da7d0 tried to make last_modified recognition smarter
11 years ago
Michael Peter Christen 9a5ab4e2c1 removed clickdepth_i field and related postprocessing. This information
11 years ago
Michael Peter Christen da86f150ab - added a new Crawler Balancer: HostBalancer and HostQueues:
11 years ago
orbiter c250fac9f4 linkstructure refactoring to get more options for clickdepth analysis
11 years ago
Michael Peter Christen bd886054cb new structure and enhancements for link graph computation:
11 years ago
Michael Peter Christen 63c9fcf3e0 free configuration of postprocessing clickdepth maximum depth and time
11 years ago
Michael Peter Christen 51800007c4 - added concurrency to postprocessing of webgraph document
11 years ago
Michael Peter Christen fdaeac374a - enhanced postprocessing speed and memory footprint (by using HashMaps
11 years ago
Michael Peter Christen 7640834b37 removed double concurrency to put Solr documents into the index. The
11 years ago
Michael Peter Christen 0f6b72f24b do not use luke requests for remote solr servers if the result is
11 years ago
orbiter f6e441dd77 refactoring
11 years ago
Michael Peter Christen 69391e5d9e changed strategy to test existence of documents in Solr: using the
11 years ago
Michael Peter Christen bf97e38b83 removed clearURLIndex, which is a stub remaining from the old metadata
11 years ago
Michael Peter Christen 0cabcbbe83 more efficient wordcount
11 years ago
Michael Peter Christen 3d474a843e added memory protection for postprocessing
11 years ago
orbiter c40ba51ca6 added new suggest method which replaces more-than-one suggestions:
11 years ago
reger 9b24dae2b7 add language navigation filter clause to rwi results
11 years ago
Michael Peter Christen 1ea17bd9f3 - removed old metadata database and all migration code
11 years ago
reger 280c4a3ac1 exclude terms with " for didYouMean suggestion
11 years ago
orbiter 2ead4e44d9 introduced a new storage path ARCHIVE inside of DATA which will be used
11 years ago
Michael Peter Christen 78eac85161 better calibration of caches and queue maximum sizes
11 years ago
Michael Peter Christen 0db8e34625 enhanced webgraph processing
11 years ago
Michael Peter Christen c3dcbdc8d5 try to recover from an OOM during citation index reading and fail-over
11 years ago
Michael Peter Christen 9cf9727685 fix for wrong counter
11 years ago
Michael Peter Christen fceac8cffd more monitoring for postprocessing
11 years ago
Michael Peter Christen 9d5895f643 enhanced and fixed postprocessing
11 years ago
Michael Peter Christen 81bb50118e found and fixed a huge memory leak in solr caching (inside Solr). The
11 years ago
Michael Peter Christen e1c1e57877 less overhead calling exist() with only one hash
11 years ago
Michael Peter Christen 434e13b46d in host browser also show the properties of failed documents including
11 years ago
Michael Peter Christen 74d0256e93 enhanced postprocessing: fixed bugs, enable proper postprocessing also
11 years ago
Michael Peter Christen d328cc4a83 fix for didyoumean, added also more asian alphabets
11 years ago
Michael Peter Christen 101a6e6e14 Patch the citation index for links with canonical tags.
11 years ago
Michael Peter Christen 4f83d5f18c added the new field harvestkey_s to the collection index and the
12 years ago
Michael Peter Christen 96ed0c980e - added hosthash to all documents (also fail documents which is needed
12 years ago
Michael Peter Christen 5e31bad711 - the webgraph shall store all links which appear on a web page and not
12 years ago
Michael Peter Christen 85456f46b2 added two new fields, exact_signature_copycount_i and
12 years ago
Michael Peter Christen a88a62f7aa added a feature to set a collection for a crawl result based on a
12 years ago
Michael Peter Christen 765943a4b7 Redesign of crawler identification and robots steering. A non-p2p user
12 years ago
Michael Peter Christen 47b1c81d08 - refactoring
12 years ago