Commit Graph

153 Commits (18a56446cee9fa4a4b3c3fba6a74a23ee260fec8)

| Author | SHA1 | Message | Date |
|---|---|---|---|
| orbiter | f6e441dd77 | refactoring | 11 years ago |
| Michael Peter Christen | 69391e5d9e | changed strategy to test existence of documents in Solr: using the | 11 years ago |
| Michael Peter Christen | bf97e38b83 | removed clearURLIndex, which is a stub remaining from the old metadata | 11 years ago |
| Michael Peter Christen | 0cabcbbe83 | more efficient wordcount | 11 years ago |
| Michael Peter Christen | 3d474a843e | added memory protection for postprocessing | 11 years ago |
| orbiter | c40ba51ca6 | added new suggest method which replaces more-than-one suggestions: | 11 years ago |
| reger | 9b24dae2b7 | add language navigation filter clause to rwi results | 11 years ago |
| Michael Peter Christen | 1ea17bd9f3 | - removed old metadata database and all migration code | 11 years ago |
| reger | 280c4a3ac1 | exclude terms with " for didYouMean suggestion | 11 years ago |
| orbiter | 2ead4e44d9 | introduced a new storage path ARCHIVE inside of DATA which will be used | 11 years ago |
| Michael Peter Christen | 78eac85161 | better calibration of caches and queue maximum sizes | 11 years ago |
| Michael Peter Christen | 0db8e34625 | enhanced webgraph processing | 11 years ago |
| Michael Peter Christen | c3dcbdc8d5 | try to recover from an OOM during citation index reading and fail-over | 11 years ago |
| Michael Peter Christen | 9cf9727685 | fix for wrong counter | 11 years ago |
| Michael Peter Christen | fceac8cffd | more monitoring for postprocessing | 11 years ago |
| Michael Peter Christen | 9d5895f643 | enhanced and fixed postprocessing | 11 years ago |
| Michael Peter Christen | 81bb50118e | found and fixed a huge memory leak in solr caching (inside Solr). The | 11 years ago |
| Michael Peter Christen | e1c1e57877 | less overhead calling exist() with only one hash | 11 years ago |
| Michael Peter Christen | 434e13b46d | in host browser also show the properties of failed documents including | 11 years ago |
| Michael Peter Christen | 74d0256e93 | enhanced postprocessing: fixed bugs, enable proper postprocessing also | 11 years ago |
| Michael Peter Christen | d328cc4a83 | fix for didyoumean, added also more asian alphabets | 11 years ago |
| Michael Peter Christen | 101a6e6e14 | Patch the citation index for links with canonical tags. | 11 years ago |
| Michael Peter Christen | 4f83d5f18c | added the new field harvestkey_s to the collection index and the | 12 years ago |
| Michael Peter Christen | 96ed0c980e | - added hosthash to all documents (also fail documents which is needed | 12 years ago |
| Michael Peter Christen | 5e31bad711 | - the webgraph shall store all links which appear on a web page and not | 12 years ago |
| Michael Peter Christen | 85456f46b2 | added two new fields, exact_signature_copycount_i and | 12 years ago |
| Michael Peter Christen | a88a62f7aa | added a feature to set a collection for a crawl result based on a | 12 years ago |
| Michael Peter Christen | 765943a4b7 | Redesign of crawler identification and robots steering. A non-p2p user | 12 years ago |
| Michael Peter Christen | 47b1c81d08 | - refactoring | 12 years ago |
| Michael Peter Christen | cf12835f20 | replaced the single-text description solr field with a multi-value | 12 years ago |
| Michael Peter Christen | c3b2301b2f | fix for http://bugs.yacy.net/view.php?id=268 | 12 years ago |
| orbiter | 056b42f5aa | - added information about segment count to status_p.xml | 12 years ago |
| orbiter | 6fb2811e68 | fixes for problems with remote solr and non-activated webgraph index | 12 years ago |
| orbiter | c124037f19 | removed forced non-soft commits to prevent index fragmentation | 12 years ago |
| Roland Haeder | 841a28ae76 | Added 'final' for all exception blocks as this helps the Java compiler | 12 years ago |
| Michael Peter Christen | bcc623a843 | refactoring of load_delay: this is a matter of client identification | 12 years ago |
| Michael Peter Christen | 5878c1d599 | - refactoring of log to ConcurrentLog: | 12 years ago |
| Michael Peter Christen | 203921006a | redesign of citation index storage | 12 years ago |
| Michael Peter Christen | 570511f3c8 | removed fields references_internal_id_sxt and | 12 years ago |
| Michael Peter Christen | ffc570f95f | removed forced soft commit since this may be the cause for a performance | 12 years ago |
| Michael Peter Christen | f7e77a21bf | Added a citation reference computation for intra-domain link structures. | 12 years ago |
| Michael Peter Christen | a1644ca0fd | new workflow processor in Segment to enqueue indexing documents to solr | 12 years ago |
| Michael Peter Christen | 0c1a018bbd | removed 'later' tactic because it used too much RAM, reduced number of | 12 years ago |
| orbiter | da621e827e | prevent NPE in case RWI is disabled | 12 years ago |
| Michael Peter Christen | 2b563debbf | javadoc of new multiple-exist test | 12 years ago |
| Michael Peter Christen | 8f2d3ce2f9 | reduced locking situation in crawler: shifted synchronized location and | 12 years ago |
| Michael Peter Christen | 8dbc80da70 | redesign of index.exist-test: this shall now not be done using a single | 12 years ago |
| Michael Peter Christen | e26bdd4a52 | fixes to deletion methods (removed unnecessary concurrency and added | 12 years ago |
| Michael Peter Christen | f7f3e28c5e | prevent that the size of the index is computed too many times. | 12 years ago |
| Michael Peter Christen | 3841854c97 | abstraction of catchall term | 12 years ago |