Commit Graph

415 Commits (bf8a6d984855f3aaca35b866a1e27d988e933f21)

Author SHA1 Message Date
Michael Peter Christen 3d474a843e added memory protection for postprocessing
11 years ago
Michael Peter Christen 9228214f9b enrichment of PerformanceMemory display of SolrInfoMBean table
11 years ago
Michael Peter Christen e8bdf16ea7 added statistic information for solr resources in PerformanceMemory
11 years ago
Michael Peter Christen 456e52e0d5 enhanced strategy to clear solr caches
11 years ago
orbiter c40ba51ca6 added new suggest method which replaces more-than-one suggestions:
11 years ago
reger 9b24dae2b7 add language navigation filter clause to rwi results
11 years ago
Michael Peter Christen c84bcc878a first try to add a generic solr servlet as luke request servlet
11 years ago
Michael Peter Christen 1ea17bd9f3 - removed old metadata database and all migration code
11 years ago
Michael Peter Christen f8ce7040ab remote search peer selection schema change:
11 years ago
reger 280c4a3ac1 exclude terms with " for didYouMean suggestion
11 years ago
reger 6932aa4d7a use configured admin-username for api calls
11 years ago
orbiter 2ead4e44d9 introduced a new storage path ARCHIVE inside of DATA which will be used
11 years ago
orbiter 3cb6c7861f fixed shutdown authentication problem
11 years ago
Michael Peter Christen ee17bd0b69 added option to attach remote solr servers in read-only mode
11 years ago
Michael Peter Christen 2f16770681 migrated to solr 4.6.0
11 years ago
Michael Peter Christen 2702d9e56b - added a SolrQueryResponse2SolrDocumentList method which is able to
11 years ago
Michael Peter Christen 552ef9f18e fix for bad ErrorCache.exists test (bug from latest commit)
11 years ago
Michael Peter Christen 303f5694ba avoid usage of existsByQuery. If a document can be loaded by the ID
11 years ago
Michael Peter Christen 78eac85161 better calibration of caches and queue maximum sizes
11 years ago
Michael Peter Christen 0db8e34625 enhanced webgraph processing
11 years ago
Michael Peter Christen c3dcbdc8d5 try to recover from an OOM during citation index reading and fail-over
11 years ago
Michael Peter Christen 9932c441c8 fixed a problem with Date fields parsing Solr results if a remote Solr
11 years ago
orbiter 037cd0a57c using the BinaryResponseWriter which is supported within the YaCy solr
11 years ago
Michael Peter Christen 9cf9727685 fix for wrong counter
11 years ago
Michael Peter Christen fceac8cffd more monitoring for postprocessing
11 years ago
Michael Peter Christen 9d5895f643 enhanced and fixed postprocessing
11 years ago
Michael Peter Christen 1a4a69c226 set more logger to 'final static'
11 years ago
Michael Peter Christen acc1f8a749 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen 81bb50118e found and fixed a huge memory leak in solr caching (inside Solr). The
11 years ago
sixcooler 987f410011 URL-export:add query and fix for cast-class-exception
11 years ago
Michael Peter Christen e1c1e57877 less overhead calling exist() with only one hash
11 years ago
Michael Peter Christen 5a02d650ee avoid cloning
11 years ago
Michael Peter Christen cc39667399 Speed enhancements and less CPU usage during Solr searches when using
11 years ago
Michael Peter Christen 434e13b46d in host browser also show the properties of failed documents including
11 years ago
Michael Peter Christen 030d0776ff Enhanced crawl start for very, very large crawl lists (i.e. > 5000)
11 years ago
Michael Peter Christen 74d0256e93 enhanced postprocessing: fixed bugs, enable proper postprocessing also
11 years ago
Michael Peter Christen d328cc4a83 fix for didyoumean, added also more asian alphabets
11 years ago
Michael Peter Christen 21aa6a0321 migration to Solr 4.5.0
11 years ago
Michael Peter Christen 101a6e6e14 Patch the citation index for links with canonical tags.
11 years ago
Michael Peter Christen 4f83d5f18c added the new field harvestkey_s to the collection index and the
12 years ago
Michael Peter Christen 96ed0c980e - added hosthash to all documents (also fail documents which is needed
12 years ago
orbiter 828603e4f1 fix for 100%CPU problem in error cache cleaning process
12 years ago
orbiter f3be1930cb CPU problem when pushing to the error cache; wrong class,
12 years ago
Michael Peter Christen e40671ddb7 better and consistent deletions for error urls
12 years ago
Michael Peter Christen 2602be8d1e - removed ZURL data structure; removed also the ZURL data file
12 years ago
Michael Peter Christen 61c5e40687 - replaced the properties object in AnchorURL with distinct variables
12 years ago
Michael Peter Christen 5e31bad711 - the webgraph shall store all links which appear on a web page and not
12 years ago
Michael Peter Christen 85456f46b2 added two new fields, exact_signature_copycount_i and
12 years ago
Michael Peter Christen 1a3e42eca4 index migration to lucene 4.4
12 years ago
Michael Peter Christen a88a62f7aa added a feature to set a collection for a crawl result based on a
12 years ago
Michael Peter Christen 765943a4b7 Redesign of crawler identification and robots steering. A non-p2p user
12 years ago
Michael Peter Christen 47b1c81d08 - refactoring
12 years ago
reger 02fe8b43ba Field Re-Indexing: display list of fields in reindex queue
12 years ago
Michael Peter Christen 58fe986cca Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
Michael Peter Christen cf12835f20 replaced the single-text description solr field with a multi-value
12 years ago
reger f2d99053ed Field Re-Indexing: prevent endless error loop in ReindexSolrBusyThread on Solr exception (by skipping query causing the exception)
12 years ago
Michael Peter Christen c3b2301b2f fix for http://bugs.yacy.net/view.php?id=268
12 years ago
orbiter 056b42f5aa - added information about segment count to status_p.xml
12 years ago
orbiter 6fb2811e68 fixes for problems with remote solr and non-activated webgraph index
12 years ago
orbiter c124037f19 removed forced non-soft commits to prevent index fragmentation
12 years ago
Roland Haeder be0ff6018f Removed trailing spaces + some more final
12 years ago
Roland Haeder 841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler
12 years ago
Michael Peter Christen 1fd006cc56 fixes using the embedded connector
12 years ago
orbiter 5533fc8e01 fix for bug 260
12 years ago
Michael Peter Christen bcc623a843 refactoring of load_delay: this is a matter of client identification
12 years ago
Michael Peter Christen 5878c1d599 - refactoring of log to ConcurrentLog:
12 years ago
Michael Peter Christen 203921006a redesign of citation index storage
12 years ago
sixcooler e5abccdfe4 added optimize-option
12 years ago
Michael Peter Christen 570511f3c8 removed fields references_internal_id_sxt and
12 years ago
Michael Peter Christen ffc570f95f removed forced soft commit since this may be the cause for a performance
12 years ago
Michael Peter Christen f7e77a21bf Added a citation reference computation for intra-domain link structures.
12 years ago
Michael Peter Christen 9fc0c4df98 fix for bad exists 'enhancement'; see bug:
12 years ago
reger 8a7fcb391d enable use of solrcore.properties for property substitution of solrconfig.xml
12 years ago
Michael Peter Christen 164603b946 cleanup
12 years ago
Michael Peter Christen a1644ca0fd new workflow processor in Segment to enqueue indexing documents to solr
12 years ago
Michael Peter Christen 0c1a018bbd removed 'later' tactic because it used too much RAM, reduced number of
12 years ago
Michael Peter Christen 709e9b8ce7 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
Michael Peter Christen 281959a2d7 added option to re-boot the embedded solr during run-time. Added also
12 years ago
orbiter da621e827e prevent NPE in case RWI is disabled
12 years ago
Michael Peter Christen 2b563debbf javadoc of new multiple-exist test
12 years ago
Michael Peter Christen 8f2d3ce2f9 reduced locking situation in crawler: shifted synchronized location and
12 years ago
Michael Peter Christen b68fbe7d21 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
Michael Peter Christen 8dbc80da70 redesign of index.exist-test: this shall now not be done using a single
12 years ago
reger 7f63d3747d more generic field selection for reindex option of documents with disabled fields
12 years ago
reger 79401cb938 added reindex option for documents with disabled or obsolete fields to Solr Schema Editor page (IndexSchema_p.html)
12 years ago
Michael Peter Christen b24d1d18e4 removed synchronization and concurrency in Fulltext class, concurrent
12 years ago
Michael Peter Christen ad050ec88d - upgraded httpclient, httpcore and httpmime
12 years ago
Michael Peter Christen e26bdd4a52 fixes to deletion methods (removed unnecessary concurrency and added
12 years ago
Michael Peter Christen f7f3e28c5e prevent that the size of the index is computed too many times.
12 years ago
Michael Peter Christen cca19d94d4 re-declared some fields to be of type string rather than text which
12 years ago
Michael Peter Christen 3841854c97 abstraction of catchall term
12 years ago
orbiter 7de5b9cfa0 fix for http://bugs.yacy.net/view.php?id=233
12 years ago
Michael Peter Christen f36a7da5f6 - re-introduced existById in solr connector.
12 years ago
Michael Peter Christen 3502b4c697 refactoring (renaming) of yacy-solr api
12 years ago
Michael Peter Christen 3a0fcfbeda Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
orbiter e1bfe9d07a - reduction of the concurrently running processes to make YaCy more
12 years ago
Michael Peter Christen c091000165 added collection attribute also to the rss feed reader
12 years ago
Michael Peter Christen d937c55204 extended limitation of dom export size from 100000 to 100000000
12 years ago
Michael Peter Christen 566d6c980c checking of document signature for a double-document check now refers
12 years ago
Michael Peter Christen 7ab5093321 added new solr title_exact_signature_l and
12 years ago
Michael Peter Christen f24ac518e6 redesign of exists()-query (can now be called with query) and the
12 years ago
Michael Peter Christen 27d6222880 added new field host_extent_i which, after a crawl and postprocessing,
12 years ago
reger 518b20147c skip postprocessing during document.store if no citation index connected (prevent null pointer exception)
12 years ago
Michael Peter Christen ada3f27de7 added three new fields for a better ranking: references_internal_i,
12 years ago
Michael Peter Christen edc0b33f6d - showing references count and clickdepth in host browser
12 years ago
orbiter 940c6849ee enhanced did-you-mean (a bit): can now remember previously searched
12 years ago
Michael Peter Christen 6300730d7f refactoring of clickdepth computation as preparation for clickdepth
12 years ago
orbiter 47114910d5 fix for possible memory leaks
12 years ago
Michael Peter Christen 25300913fa fixes to search debugging after testing with the different search
12 years ago
Michael Peter Christen c2fde018b5 concurrent snippet fetching from solr results which do not have snippets
12 years ago
Michael Peter Christen 2b6c79d347 in method exists() also use the new caching-stacks for
12 years ago
Michael Peter Christen 0d7b4bc891 better protection against OOM during search flush and fixed missing
12 years ago
Michael Peter Christen 221ed7d764 - enhanced concurrency during search without IO blocking
12 years ago
Michael Peter Christen 3b1d9dc884 made index storage from DHT search result concurrently. This prevents
12 years ago
orbiter f13c0b2abd fix for search
12 years ago
orbiter 0f7ea7ad9f - enhanced solr.add procedure for mass adds
12 years ago
orbiter d74472f562 corrected result counter
12 years ago
Michael Peter Christen c95a84103a complete redesign of search process:
12 years ago
Michael Peter Christen 089dee1770 - generalized SchemaConfiguration into super-class Configuration and
12 years ago
Michael Peter Christen c16de49f64 fix for webgraph delete query
12 years ago
Michael Peter Christen 56d5946a59 - added flags in IndexFederated_p.html to switch on or off the webgraph
12 years ago
Michael Peter Christen 788288eb9e added the generation of 50 (!!) new solr fields in the core 'webgraph'.
12 years ago
Michael Peter Christen 91a0401d59 introduced a second core named 'webgraph'. This core will hold the link
12 years ago
Michael Peter Christen b6de1f42dc Full redesign of solr connection architecture. This was done to support
12 years ago
Michael Peter Christen 4111606654 removed the commitWithin attribute because that is not the way how the
12 years ago
Michael Peter Christen 7806680ab8 fixed a problem with re-feeding of already indexed documents with
12 years ago
Michael Peter Christen 4323621a76 update to Solr 4.1.0
12 years ago
Michael Peter Christen 7dfcc92b71 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
Michael Peter Christen 0b6566a389 optimizations when starting large crawl requests with many start urls in
12 years ago
orbiter a2160054d7 ability to create vocabularies also without any objectspace: this
12 years ago
orbiter ecc10a752c fixes to index enumeration for vocabulary production
12 years ago
Michael Peter Christen 0fe7b6fd3b migrated the index export methods from the old metadata to solr. Now
12 years ago
Michael Peter Christen 1768c82010 removed field selection because that created documents with that field
12 years ago
Michael Peter Christen 4735bd47f4 - changed solr commit call and added an optimize option. Since Solr
12 years ago
reger 3897bb4409 added (manual) urldb migration (link on: Index Administration -> Federated Solr Index)
12 years ago
reger 3b6e08b49f prevent checking of urldb if empty
12 years ago
Michael Peter Christen 38d3feae65 added separate delete commands for the local+remote solr index, the old
12 years ago
Michael Peter Christen 6f0baaa309 added the clickdepth post-processing: some links may have 'shortcuts' to
12 years ago
Michael Peter Christen 0f5b6f38c1 enhanced root-url detection
12 years ago
Michael Peter Christen 5c0c56cfe1 Preparations to produce a click depth attribute in the search index.
12 years ago
reger 4987caf1c9 - apply fix for localhost handling (from yacy2solr) also to metadata2solr
12 years ago
Michael Peter Christen 2a4c064c89 using the publisher information for the author field if no author is
12 years ago
Michael Peter Christen eac9650b31 added another solr field clickdepth_i which reflects the number of
12 years ago
Michael Peter Christen 1052263af3 - added a new solr field references_i which stores the number of
12 years ago
Michael Peter Christen 34f8786508 removed dependency of vocabulary navigation from Jena and its
12 years ago
Michael Peter Christen fb0fa9a102 - fixed 'delete from subpath' during crawl start which deleted nothing;
12 years ago
orbiter a4a780b871 - fix for bad url conversion in bookmarks when using smb urls
12 years ago
Michael Peter Christen 72f165d58b added a Boost class which stores solr query boost values. The class can
12 years ago
Michael Peter Christen 8fc3679c66 using more pre-compile pattern for split methods
12 years ago
Michael Peter Christen b7004043ea - added a field cache for solr queries which call only for a single
12 years ago