Commit Graph

344 Commits (0d29b972ccee9f5bffbb72dd3b0954ac57958443)

Author SHA1 Message Date
orbiter 9d5d86cd03 Added filter query options to the ranking servlet /RankingSolr_p.html.
11 years ago
Michael Peter Christen 74c249288a added a push api to make it possible to upload files directly without
11 years ago
Michael Peter Christen ba6ffddefc refactoring
11 years ago
Michael Peter Christen 922979aae1 added option to prefer http over https in unique-protocol ranking
11 years ago
Michael Peter Christen b3b174e2b8 fixed webgraph postprocessing and status display in Crawler_p servlet
11 years ago
Michael Peter Christen f23c4142e0 added option to configure a custom user agent within allip networks
11 years ago
sixcooler 830057d788 lower Segment-size (hope to get Segments of 10GB)
11 years ago
reger e31493e139 "Use remote proxy for yacy" has no function, remove option and related config item
11 years ago
Michael Peter Christen a1ac4c3b76 automatically clear graphics cache
11 years ago
reger 1432a817dd respect "index media" switched off in CrawlStartExpert.html
11 years ago
Michael Peter Christen e84e07399a Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
reger 8a7c68e4c7 content of surrogates/out never accessed (remove)
11 years ago
Michael Peter Christen 229f2248b8 added configuration option for maximum load and minimum ram for
11 years ago
orbiter 8e5ce7cd51 fixed a situation where finished crawls had not been detected.
11 years ago
Michael Peter Christen 5746aae3db add canonical links to the same crawldepth, not the next crawldepth
11 years ago
Michael Peter Christen 10cf8215bd added crawl depth for failed documents
11 years ago
Michael Peter Christen 9a5ab4e2c1 removed clickdepth_i field and related postprocessing. This information
11 years ago
Michael Peter Christen da86f150ab - added a new Crawler Balancer: HostBalancer and HostQueues:
11 years ago
Michael Peter Christen 075b6f9278 refactoring of the crawl balancer: the balancer is turned into an
11 years ago
Michael Peter Christen df138084c0 do solr optimization independently from memory and load constraints:
11 years ago
orbiter 3c1274057d fixed thread dump in case of wrong seeds
11 years ago
Michael Peter Christen cca851a417 introduced new solr field crawldepth_i which records the crawl depth of
11 years ago
Michael Peter Christen 63c9fcf3e0 free configuration of postprocessing clickdepth maximum depth and time
11 years ago
Michael Peter Christen 8b44fcf0f4 added missing @Override annotation
11 years ago
Michael Peter Christen b08375da33 fix for bad/missing values of size_i
11 years ago
Michael Peter Christen 51800007c4 - added concurrency to postprocessing of webgraph document
11 years ago
Michael Peter Christen e485fbd0ce - let crawl loader jobs die after 10 seconds without new jobs
11 years ago
Michael Peter Christen bcd9dd9e1d enhanced concurrent loading by using a fixed set of concurrent loader
11 years ago
Michael Peter Christen 6ed9c0164e attaching names to all Threads to get a better view in profiling tools
11 years ago
Michael Peter Christen d325cb8912 fixes and enhancements for postprocessing
11 years ago
Michael Peter Christen 0f6b72f24b do not use luke requests for remote solr servers if the result is
11 years ago
Michael Peter Christen a2b66fe2eb Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen 9f6be762a6 - better logging for postprocessing
11 years ago
orbiter f6e441dd77 refactoring
11 years ago
reger 0923b09216 fix: allow 4 character admin user name
11 years ago
Michael Peter Christen 254a7ac66c fixed cleaning of index
11 years ago
Michael Peter Christen 69391e5d9e changed strategy to test existence of documents in Solr: using the
11 years ago
Michael Peter Christen ca8b100f96 run the cleanup process even when load is high, do postprocessing even
11 years ago
Michael Peter Christen 3d474a843e added memory protection for postprocessing
11 years ago
Michael Peter Christen 6e59ca4ebf removed jena library and all code that depended on jena. When jena was
11 years ago
Michael Peter Christen 931541d198 re-inserted default value re-set button to performance queues and
11 years ago
Michael Peter Christen 8b14e92ba4 added button in host browser to re-load 404/failed documents
11 years ago
Michael Peter Christen 6ada0daae9 making latency_factor and maximum number of same hosts in loader queue
11 years ago
Michael Peter Christen 489c3fbc90 code simplifications / removed warnings
11 years ago
Michael Peter Christen 0168f80c28 new crawling factors can now be changed during runtime
11 years ago
Michael Peter Christen be5e808236 - removed hardcoded load-test which is now handled in BusyQueues
11 years ago
sixcooler 40a4030b55 configurable max-load values for YaCy-Threads:
11 years ago
Michael Peter Christen 77531850b5 reverted crawling strategy from latest commit.
11 years ago
Michael Peter Christen 0d235a565b cleanup crawl loader jobs
11 years ago
Michael Peter Christen 1ea17bd9f3 - removed old metadata database and all migration code
11 years ago