Commit Graph

21 Commits (6db7f5525b153b0ceb9d5c39a38a16772bc60e5b)

Author  SHA1  Message  (Date)
reger  22db449f2a  to prevent crawler to concurrently access and alter same crawl queue  (8 years ago)
reger  7789c32c82  delete crawl queue on init exception  (9 years ago)
reger  379e9b330d  use supplied url port to get robots.txt in crawlers hostqueue  (9 years ago)
reger  297fdb60d3  throw exception if crawler hostqueue can't create hostpath directory.  (9 years ago)
reger  43c27aa550  upd to solr/lucene 5.3.1  (9 years ago)
reger  3e742d1e34  Init remote crawler on demand  (10 years ago)
Michael Peter Christen  535f1ebe3b  added a new way of content browsing in search results:  (10 years ago)
Michael Peter Christen  a39419f2ef  more stacks shall be considered for on-demand loading, not only  (10 years ago)
Michael Peter Christen  10b1db430a  if we have many hosts, use on-demand earlier  (10 years ago)
Michael Peter Christen  025516f682  fix for crawl limit for number of pages fail  (10 years ago)
orbiter  e9163e7e10  fix for malformed hostpath names in crawl balancer  (10 years ago)
reger  92d1604a31  Crawler hostbalancer does not delete finished queue files,  (11 years ago)
reger  ca5437dd50  fix crawl of file:// , also http://mantis.tokeek.de/view.php?id=149  (11 years ago)
orbiter  97983ba89f  fixed generics warnings for generic array instantiation that appeared  (11 years ago)
Michael Peter Christen  c1c1be8f02  fix for slow crawling and better logging in balancer  (11 years ago)
Michael Peter Christen  8b32dd5f9e  special strategy for balancer: do not remove targets with zero wait time  (11 years ago)
Michael Peter Christen  9c6228d948  fix for deadlocks in crawler  (11 years ago)
Michael Peter Christen  06afb568e2  new Strategies in Balancer:  (11 years ago)
Michael Peter Christen  da86f150ab  - added a new Crawler Balancer: HostBalancer and HostQueues:  (11 years ago)
Michael Peter Christen  1ea17bd9f3  - removed old metadata database and all migration code  (11 years ago)
orbiter  f90d5296cb  Added new data structure to be used by the balancer (not used yet).  (11 years ago)