378 Commits (1f9389396a87bd8d328b4e6917651ba8bb279b61)

Author SHA1 Message Date
Michael Peter Christen 932faafffe reactivated on-demand snapshot loading
10 years ago
Michael Peter Christen 66b5a56976 Added and integrated new date detection class which can identify date
10 years ago
Michael Peter Christen 6a1865f507 refactoring date -> lastModified
10 years ago
Michael Peter Christen 8df8ffbb6d enhanced the snapshot functionality:
10 years ago
reger 5f0bb1214f modified FieldReIndex to reindex queries with low number of documents first
10 years ago
Michael Peter Christen 70f03f7c8e do not cache search requests to Solr if the result is used for
10 years ago
Michael Peter Christen 6a2a669db4 added loading of the synonyms file from addon/synonyms into the
10 years ago
Michael Peter Christen 0a879c98e7 added new 'firstSeen' database table and necessary data structures which
10 years ago
sixcooler 9c6e3a6b1c fix assertion failure in version-string for Solr-4.10.2 by changing
10 years ago
sixcooler 725b206fb4 update to solr-/lucene-4.10.2
10 years ago
orbiter 0fcd8097a3 removed unused options from BusyThreads
10 years ago
Michael Peter Christen 77662e08e1 concurrently initialize the error cache; extended also the cache by
10 years ago
orbiter a922b122a3 added a hack to forward solr search results from an external attached
11 years ago
Michael Peter Christen 6d3d4c4ea6 changed the concurrent enumeration of query results in such a way that
11 years ago
Michael Peter Christen 81f9b34da7 increased ability to search for all images on a single server within
11 years ago
Michael Peter Christen a7dd89c4de changed method to write the citation index: do not catch up references
11 years ago
Michael Peter Christen 001e05bb80 do not store failure of loading of robots.txt into the index as a fail
11 years ago
Michael Peter Christen 05d58e4df0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen 98f45c9032 fix for image alt attachment to AnchorURLs in html parser.
11 years ago
orbiter 22ce4fb4dd better error handling for remote solr queries and exists-checks
11 years ago
orbiter 738989aab7 reverted commit f94c91315b because the
11 years ago
Michael Peter Christen f94c91315b if the webgraph is used, then use it also for reference computation to
11 years ago
Michael Peter Christen 4eec1a7452 refactoring (change Metadata name of load time data structure to avoid
11 years ago
Michael Peter Christen 2de159719b added an option to set 'obey nofollow' for links with rel="nofollow"
11 years ago
Michael Peter Christen d07cdd8c3b added SolrCloud access mode and configuration
11 years ago
Michael Peter Christen b5fc2b63ea removed exist() retrieval functions from error cache and replaced it
11 years ago
Michael Peter Christen b5d78ba156 reduced number of solr queries during crawling
11 years ago
Michael Peter Christen fd87fa1613 removed more unnecessary exist-checks in ErrorCache
11 years ago
Michael Peter Christen f2b476e08b don't do a double check to solr for failed documents if they are not
11 years ago
Michael Peter Christen 09dcdb9b19 update to solr 4.9.0
11 years ago
Michael Peter Christen 5b94a257ce no timeout for large reference collections
11 years ago
Michael Peter Christen 8ad41a882c fixed several problems with postprocessing:
11 years ago
Michael Peter Christen f0db501630 better handling of ranking parameters and new default values for date
11 years ago
Michael Peter Christen 53948da7d0 tried to make last_modified recognition smarter
11 years ago
Michael Peter Christen 6634b5b737 debug code for index distribution testing
11 years ago
orbiter 97983ba89f fixed generics warnings for generic array instantiation that appeared
11 years ago
Michael Peter Christen 10cf8215bd added crawl depth for failed documents
11 years ago
Michael Peter Christen 9a5ab4e2c1 removed clickdepth_i field and related postprocessing. This information
11 years ago
Michael Peter Christen da86f150ab - added a new Crawler Balancer: HostBalancer and HostQueues:
11 years ago
orbiter 95780eed32 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
Michael Peter Christen 6bd8c6f195 fix for wrong status codes of error pages
11 years ago
orbiter c250fac9f4 linkstructure refactoring to get more options for clickdepth analysis
11 years ago
Michael Peter Christen bd886054cb new structure and enhancements for link graph computation:
11 years ago
Michael Peter Christen ebd44a7080 replaced solr 4.6.1 with solr 4.7.1 and added index migration to
11 years ago
Michael Peter Christen 926d28dd3f fixed a bug which prevented crawl starts after a network switch
11 years ago
reger 227c42bc96 eliminate obsolete URIMetaDataRow class
11 years ago
Michael Peter Christen 63c9fcf3e0 free configuration of postprocessing clickdepth maximum depth and time
11 years ago
Michael Peter Christen 51800007c4 - added concurrency to postprocessing of webgraph document
11 years ago
Michael Peter Christen fdaeac374a - enhanced postprocessing speed and memory footprint (by using HashMaps
11 years ago
Michael Peter Christen 7c1b968378 another fix for the shutdown exceptions
11 years ago