Commit Graph

19 Commits (4eddabee4221f0d8deec55a157b520e1e1c140ce)

Author | SHA1 | Message | Date
reger | 379e9b330d | use supplied url port to get robots.txt in crawlers hostqueue | 9 years ago
reger | b5371ea8c1 | read/init crawl queue in a thread | 9 years ago
reger | 3e742d1e34 | Init remote crawler on demand | 10 years ago
Michael Peter Christen | 5bb52f79be | reduce number of calls to queue.size() because that may be a bottleneck | 10 years ago
Michael Peter Christen | a34f837592 | better delete all files in path when removing host crawl stack | 10 years ago
orbiter | 4ae7aead28 | addon to latest fix | 10 years ago
Michael Peter Christen | 49d91b94c3 | npe fix in crawler | 10 years ago
orbiter | e9163e7e10 | fix for malformed hostpath names in crawl balancer | 10 years ago
Michael Peter Christen | 06ab72d1af | enhanced crawler host round-robin strategy | 11 years ago
Michael Peter Christen | 49886fab08 | enhanced debugging | 11 years ago
orbiter | d7d38f9135 | made number of open files in crawler configurable and increased default | 11 years ago
orbiter | 97983ba89f | fixed generics warnings for generic array instantiation that appeared | 11 years ago
reger | 1600414450 | fix NPE on continuing crawls after YaCy restart | 11 years ago
Michael Peter Christen | c1c1be8f02 | fix for slow crawling and better logging in balancer | 11 years ago
orbiter | 2f63bd0261 | enhanced Host Balancer strategy: fair round robin | 11 years ago
Michael Peter Christen | 8b32dd5f9e | special strategy for balancer: do not remove targets with zero wait time | 11 years ago
Michael Peter Christen | 9c6228d948 | fix for deadlocks in crawler | 11 years ago
Michael Peter Christen | 06afb568e2 | new Strategies in Balancer: | 11 years ago
Michael Peter Christen | da86f150ab | - added a new Crawler Balancer: HostBalancer and HostQueues: | 11 years ago