431 Commits (27ab733685930a8c967c3f28197fc34f98123010)

Author SHA1 Message Date
Michael Peter Christen 7db2888336 fixed font size and print page generation in pdf snapshots
10 years ago
Michael Peter Christen 3b51636ecb fix for mediawiki import
10 years ago
Michael Peter Christen 3e6c3e2237 documents pushed over the api/push_p.html interface will have their
10 years ago
Michael Peter Christen 932faafffe reactivated on-demand snapshot loading
10 years ago
Michael Peter Christen 66b5a56976 Added and integrated new date detection class which can identify date
10 years ago
Michael Peter Christen 6a1865f507 refactoring date -> lastModified
10 years ago
Michael Peter Christen 8df8ffbb6d enhanced the snapshot functionality:
10 years ago
reger 5f0bb1214f modified FieldReIndex to reindex queries with low number of documents first
10 years ago
Michael Peter Christen 70f03f7c8e do not cache search requests to Solr if the result is used for
10 years ago
Michael Peter Christen 6a2a669db4 added loading of the synonyms file from addon/synonyms into the
10 years ago
Michael Peter Christen 0a879c98e7 added new 'firstSeen' database table and necessary data structures which
10 years ago
sixcooler 9c6e3a6b1c fix assertion-failure in version-string for Solr-4.10.2 by changing
10 years ago
sixcooler 725b206fb4 update to solr-/lucene-4.10.2
10 years ago
orbiter 0fcd8097a3 removed unused options from BusyThreads
10 years ago
Michael Peter Christen 77662e08e1 concurrently initialize the error cache; extended also the cache by
10 years ago
orbiter a922b122a3 added a hack to forward solr search results from an external attached
10 years ago
Michael Peter Christen 6d3d4c4ea6 changed the concurrent enumeration of query results in such a way that
10 years ago
Michael Peter Christen 81f9b34da7 increased ability to search for all images on a single server within
10 years ago
Michael Peter Christen a7dd89c4de changed method to write the citation index: do not catch up references
10 years ago
Michael Peter Christen 001e05bb80 do not store failure of loading of robots.txt into the index as a fail
10 years ago
Michael Peter Christen 05d58e4df0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
Michael Peter Christen 98f45c9032 fix for image alt attachment to AnchorURLs in html parser.
10 years ago
orbiter 22ce4fb4dd better error handling for remote solr queries and exists-checks
10 years ago
orbiter 738989aab7 reverted commit f94c91315b because the
10 years ago
Michael Peter Christen f94c91315b if the webgraph is used, then use it also for reference computation to
10 years ago
Michael Peter Christen 4eec1a7452 refactoring (change Metadata name of load time data structure to avoid
10 years ago
Michael Peter Christen 2de159719b added an option to set 'obey nofollow' for links with rel="nofollow"
10 years ago
Michael Peter Christen d07cdd8c3b added SolrCloud access mode and configuration
10 years ago
Michael Peter Christen b5fc2b63ea removed exist() retrieval functions from error cache and replaced it
11 years ago
Michael Peter Christen b5d78ba156 reduced number of solr queries during crawling
11 years ago
Michael Peter Christen fd87fa1613 removed more unnecessary exist-checks in ErrorCache
11 years ago
Michael Peter Christen f2b476e08b don't do a double check to solr for failed documents if they are not
11 years ago
Michael Peter Christen 09dcdb9b19 update to solr 4.9.0
11 years ago
Michael Peter Christen 5b94a257ce no timeout for large reference collections
11 years ago
Michael Peter Christen 8ad41a882c fixed several problems with postprocessing:
11 years ago
Michael Peter Christen f0db501630 better handling of ranking parameters and new default values for date
11 years ago
Michael Peter Christen 53948da7d0 tried to make last_modified recognition smarter
11 years ago
Michael Peter Christen 6634b5b737 debug code for index distribution testing
11 years ago
orbiter 97983ba89f fixed generics warnings for generic array instantiation that appeared
11 years ago
Michael Peter Christen 10cf8215bd added crawl depth for failed documents
11 years ago
Michael Peter Christen 9a5ab4e2c1 removed clickdepth_i field and related postprocessing. This information
11 years ago
Michael Peter Christen da86f150ab - added a new Crawler Balancer: HostBalancer and HostQueues:
11 years ago
orbiter 95780eed32 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
Michael Peter Christen 6bd8c6f195 fix for wrong status codes of error pages
11 years ago
orbiter c250fac9f4 linkstructure refactoring to get more options for clickdepth analysis
11 years ago
Michael Peter Christen bd886054cb new structure and enhancements for link graph computation:
11 years ago
Michael Peter Christen ebd44a7080 replaced solr 4.6.1 with solr 4.7.1 and added index migration to
11 years ago
Michael Peter Christen 926d28dd3f fixed a bug which prevented crawl starts after a network switch
11 years ago
reger 227c42bc96 eliminate obsolete URIMetaDataRow class
11 years ago
Michael Peter Christen 63c9fcf3e0 free configuration of postprocessing clickdepth maximum depth and time
11 years ago
Michael Peter Christen 51800007c4 - added concurrency to postprocessing of webgraph document
11 years ago
Michael Peter Christen fdaeac374a - enhanced postprocessing speed and memory footprint (by using HashMaps
11 years ago
Michael Peter Christen 7c1b968378 another fix for the shutdown exceptions
11 years ago
Michael Peter Christen 7640834b37 removed double concurrency to put Solr documents into the index. The
11 years ago
Michael Peter Christen 0f6b72f24b do not use luke requests for remote solr servers if the result is
11 years ago
orbiter ced1a96f9c fixed error cache
11 years ago
orbiter cfb647db6e - introduced a miss cache in ConcurrentUpdateSolrConnector
11 years ago
orbiter a87d8e4a8e changed caching of ConcurrentUpdateSolrConnector: it caches now also the
11 years ago
orbiter f6e441dd77 refactoring
11 years ago
orbiter 76c53faeb2 removed unused code (HostStat)
11 years ago
Michael Peter Christen 254a7ac66c fixed cleaning of index
11 years ago
Michael Peter Christen 69391e5d9e changed strategy to test existence of documents in Solr: using the
11 years ago
Michael Peter Christen 9eb668e951 enhanced the resource observer
11 years ago
Michael Peter Christen bf97e38b83 removed clearURLIndex, which is a stub remaining from the old metadata
11 years ago
Michael Peter Christen 195e5868d3 catch solr close exceptions
11 years ago
Michael Peter Christen 0cabcbbe83 more efficient wordcount
11 years ago
Michael Peter Christen 3d474a843e added memory protection for postprocessing
11 years ago
Michael Peter Christen 9228214f9b enrichment of PerformanceMemory display of SolrInfoMBean table
11 years ago
Michael Peter Christen e8bdf16ea7 added statistic information for solr resources in PerformanceMemory
11 years ago
Michael Peter Christen 456e52e0d5 enhanced strategy to clear solr caches
11 years ago
orbiter c40ba51ca6 added new suggest method which replaces more-than-one suggestions:
11 years ago
reger 9b24dae2b7 add language navigation filter clause to rwi results
11 years ago
Michael Peter Christen c84bcc878a first try to add a generic solr servlet as luke request servlet
11 years ago
Michael Peter Christen 1ea17bd9f3 - removed old metadata database and all migration code
11 years ago
Michael Peter Christen f8ce7040ab remote search peer selection schema change:
11 years ago
reger 280c4a3ac1 exclude terms with " for didYouMean suggestion
11 years ago
reger 6932aa4d7a use configured admin-username for api calls
11 years ago
orbiter 2ead4e44d9 introduced a new storage path ARCHIVE inside of DATA which will be used
11 years ago
orbiter 3cb6c7861f fixed shutdown authentication problem
11 years ago
Michael Peter Christen ee17bd0b69 added option to attach remote solr servers in read-only mode
11 years ago
Michael Peter Christen 2f16770681 migrated to solr 4.6.0
11 years ago
Michael Peter Christen 2702d9e56b - added a SolrQueryResponse2SolrDocumentList method which is able to
11 years ago
Michael Peter Christen 552ef9f18e fix for bad ErrorCache.exists test (bug from latest commit)
11 years ago
Michael Peter Christen 303f5694ba avoid usage of existsByQuery. If a document can be loaded by the ID
11 years ago
Michael Peter Christen 78eac85161 better calibration of caches and queue maximum sizes
11 years ago
Michael Peter Christen 0db8e34625 enhanced webgraph processing
11 years ago
Michael Peter Christen c3dcbdc8d5 try to recover from an OOM during citation index reading and fail-over
11 years ago
Michael Peter Christen 9932c441c8 fixed a problem with Date fields parsing Solr results if a remote Solr
11 years ago
orbiter 037cd0a57c using the BinaryResponseWriter which is supported within the YaCy solr
11 years ago
Michael Peter Christen 9cf9727685 fix for wrong counter
11 years ago
Michael Peter Christen fceac8cffd more monitoring for postprocessing
11 years ago
Michael Peter Christen 9d5895f643 enhanced and fixed postprocessing
11 years ago
Michael Peter Christen 1a4a69c226 set more logger to 'final static'
11 years ago
Michael Peter Christen acc1f8a749 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen 81bb50118e found and fixed a huge memory leak in solr caching (inside Solr). The
11 years ago
sixcooler 987f410011 URL-export:add query and fix for cast-class-exception
11 years ago
Michael Peter Christen e1c1e57877 less overhead calling exist() with only one hash
11 years ago
Michael Peter Christen 5a02d650ee avoid cloning
11 years ago
Michael Peter Christen cc39667399 Speed enhancements and less CPU usage during Solr searches when using
11 years ago
Michael Peter Christen 434e13b46d in host browser also show the properties of failed documents including
11 years ago