Commit Graph

960 Commits (8d1b66accca613e0b79f075cda336c95be6bcf57)

Author | SHA1 | Message | Date
orbiter | d68438c3d9 | make sure that the postprocessing background thread never dies by any | 11 years ago
reger | e88537522d | allow single quote " ' " in query | 11 years ago
orbiter | 487021fb0a | snippet computation update | 11 years ago
orbiter | 927aaa95a6 | concurrency bugfix | 11 years ago
reger | 7584352e7b | use more predefined Solr query parameter constants | 11 years ago
reger | f9db5dd6c5 | reduce doublecontent check document (prevent out of memory) | 11 years ago
reger | a8508417d1 | catch NPE during crawl (OAI import) | 11 years ago
Michael Peter Christen | 6344718f8b | reducing the concurrent query stack size and reduced concurrency of | 11 years ago
Michael Peter Christen | c465b791af | typo | 11 years ago
Michael Peter Christen | 191ec8c82a | added concurrency to postprocess rewrite process | 11 years ago
Michael Peter Christen | a1e8bdd5e9 | log ppm instead of docs/second | 11 years ago
Michael Peter Christen | cc0ded7abd | set process type of web graph according to fields as defined in the | 11 years ago
Michael Peter Christen | 12fb9d7cd1 | log postprocessing constraints in case that postprocessing is not | 11 years ago
Michael Peter Christen | 338f574bdc | no sorting if http/www unique fields are not demanded (makes query | 11 years ago
Michael Peter Christen | 0ceeceb35e | more logic on Solr queries; usage of the query terms in posprocessing, | 11 years ago
orbiter | 4099296b45 | added new classes which shall reduce call overhead to Solr (stub) | 11 years ago
orbiter | 3491ab4c38 | removed unused images from webgraph edge computation | 11 years ago
orbiter | 2371d6b8db | target linktexts must be string to enable search facets on these fields | 11 years ago
Michael Peter Christen | 001e05bb80 | do not store failure of loading of robots.txt into the index as a fail | 11 years ago
Michael Peter Christen | 05d58e4df0 | Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git | 11 years ago
Michael Peter Christen | 98f45c9032 | fix for image alt attachment to AnchorURLs in html parser. | 11 years ago
orbiter | 22ce4fb4dd | better error handling for remote solr queries and exists-checks | 11 years ago
orbiter | 738989aab7 | reverted commit f94c91315b because the | 11 years ago
Michael Peter Christen | c115f3869c | enhanced snippet computation and test method in ViewFile | 11 years ago
orbiter | 1027f3d04a | fix for the usage of ready-prepared solr queries, some queries are | 11 years ago
Michael Peter Christen | f94c91315b | if the webgraph is used, then use it also for reference computation to | 11 years ago
Michael Peter Christen | 6e1dc444c3 | added a snippet test function in ViewFile: you can now search for a | 11 years ago
Michael Peter Christen | b44626e55b | fixed target_alt_t in webgraph | 11 years ago
Michael Peter Christen | 504327b15c | fix for condition for writing the webgraph | 11 years ago
Michael Peter Christen | 542c20a597 | changed handling of crawl profile field crawlingIfOlder: this should be | 11 years ago
Michael Peter Christen | 4eec1a7452 | refactoring (change Metadata name of load time data structure to avoid | 11 years ago
reger | f96cfdc84d | prevent array out of bound exception on getRankingProfile(x) | 11 years ago
reger | a2cb366b25 | Combine /heuristic search modifier with opensearch configured targets | 11 years ago
Michael Peter Christen | 2de159719b | added an option to set 'obey nofollow' for links with rel="nofollow" | 11 years ago
Michael Peter Christen | bf1b6b93e7 | do not write CR values to webgraph if no CR values are computed | 11 years ago
Michael Peter Christen | d07cdd8c3b | added SolrCloud access mode and configuration | 11 years ago
Michael Peter Christen | 8514bffc22 | enhanced postprocessing status report | 11 years ago
Michael Peter Christen | b5fc2b63ea | removed exist() retrieval functions from error cache and replaced it | 11 years ago
Michael Peter Christen | 62c72360ee | cleanup of checkAcceptanceInitially in CrawlStacker, should avoid | 11 years ago
Michael Peter Christen | b5d78ba156 | reduced number of solr queries during crawling | 11 years ago
Michael Peter Christen | fd87fa1613 | removed more unnecessary exist-checks in ErrorCache | 11 years ago
Michael Peter Christen | f2b476e08b | don't do a double check to solr for failed documents if they are not | 11 years ago
orbiter | dab9a0786a | Merge branch 'master' of git@gitorious.org:yacy/rc1.git | 11 years ago
orbiter | 51bf5c85b0 | Renamed the transmission cloud to buffer in dispatcher since the name | 11 years ago
Michael Peter Christen | fb3dd56b02 | fix for processing of noindex flag in http header | 11 years ago
Michael Peter Christen | b0d941626f | fixed bugs in canonical, robots and title/description unique calculation | 11 years ago
reger | d9472d043a | cleanup older unused classes | 11 years ago
reger | 665e12f88e | move startup time from old serverCore to switchboard (most used here) | 11 years ago
reger | 336425912a | remove unused localSearchThread from SearchEvent | 11 years ago
Michael Peter Christen | 1092e798a5 | fixed double content postprocessing | 11 years ago