Commit Graph

32 Commits (8d2e6355f81a9c79b8133c549bf30011d9c940bd)

Author SHA1 Message Date
Michael Peter Christen 70505107ca enhanced crawler/balancer: better remaining waiting-time guessing
13 years ago
Michael Peter Christen a5d7da68a0 refactoring: removed dependency from switchboard in Balancer/CrawlQueues
13 years ago
Michael Peter Christen 9ad1d8dde2 complete redesign of crawl queue monitoring: do not look at a
13 years ago
orbiter 813f297a95 another performance hack: re-use of known host addresses for isLocal property; avoids look-up in local hash
13 years ago
orbiter d2ea250d99 refactoring:
13 years ago
orbiter 6fa439c82b - refactoring of robots
14 years ago
orbiter 17530ca7b5 fix for bug http://bugs.yacy.net/view.php?id=10
14 years ago
orbiter 96c32e87b0 fixes to crawler and new user-agent crawl-delay handling
14 years ago
orbiter b2fe4b7b1a added a handling of appearances of yacy bot entries in robots.txt if this entry addresses the yacy peer
14 years ago
orbiter 3820525464 more memory protection: auto-flush of caches in case of memory shortage
14 years ago
orbiter 3197ca42ed preparations to move the HTCache into cora:
14 years ago
orbiter 22dbbcfa56 better (and corrected) recognition of intranet and internet-addresses. This corrects the isLocal property that is used by network definitions to restrict index ranges to local and global addresses. Address locations (intranet or internet) had been partly identified by the top level domain of the host address. Since intranet addresses can also be addressed using a host name that is in a country domain it is necessary to do a dns resolving for each check. The check is supported by a local dns cache so the intranet/internet check should not affect network traffic too much. To ensure that the cache works properly the cache class was upgraded to better concurrency data structures.
15 years ago
orbiter 30b337fa9f fixes to balancer when crawling filesystem (problem was: host == null)
15 years ago
orbiter 844853243a fixed balancer time guessing
15 years ago
orbiter 3f93a0cc8f redesign of remote proxy settings
15 years ago
orbiter 25aef069a6 continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775
15 years ago
orbiter 90dd197ae7 - no latency for local crawls
15 years ago
orbiter bc96d74813 - clean-up of robots.txt parser
15 years ago
orbiter dd459281c8 applied code changes that are recommended by PMD
15 years ago
orbiter ce8dc575ca refactoring
15 years ago
orbiter 2e6bdce086 - added more logging to balancer
15 years ago
orbiter c0e17de2fb - fixes for some problems with the new crawling/caching strategies
16 years ago
orbiter 1d8d51075c refactoring:
16 years ago
orbiter b332dfad67 - inserted request object into response object which carries this now instead generating new objects
16 years ago
orbiter 13c63f4082 a set of small fixes to crawling behaviour
16 years ago
orbiter 5fdba0fa51 - fixed a not working selection rule in balancer
16 years ago
orbiter 95e8cbd1c3 new fully redesigned balancer and bugfixes regarding lost profile handles and killed crawls
16 years ago
orbiter 9bfb2641db - removed deprecated threads
16 years ago
orbiter b6c2167143 - patch for bad web structure dumps
16 years ago
orbiter 37f892b988 added new concurrent merger class for IndexCell RWI data
16 years ago
borg-0300 8c494afcfe svn attributes added
16 years ago
orbiter 67aaffc0a2 - added Latency control to the crawler:
16 years ago