Commit Graph

19 Commits (d79fa7fbebdf8a588adf8cbe999ace36ea002a02)

Author SHA1 Message Date
Michael Peter Christen da86f150ab - added a new Crawler Balancer: HostBalancer and HostQueues: 11 years ago
Michael Peter Christen 6ada0daae9 making latency_factor and maximum number of same hosts in loader queue 11 years ago
Michael Peter Christen 0168f80c28 new crawling factors can now be changed during runtime 11 years ago
Michael Peter Christen 77531850b5 reverted crawling strategy from latest commit. 11 years ago
Michael Peter Christen c0da966dfa enhanced crawler speed 11 years ago
orbiter 0e8d752462 refactoring 12 years ago
Michael Peter Christen 5e31bad711 - the webgraph shall store all links which appear on a web page and not 12 years ago
Michael Peter Christen 765943a4b7 Redesign of crawler identification and robots steering. A non-p2p user 12 years ago
Michael Peter Christen 16d1d744fa added url_file_name_s in default collection schema for the file name 12 years ago
Michael Peter Christen 77faeada4d small memory leak patch 12 years ago
Michael Peter Christen a3cd3852ab introduced a better place to update the lastacc time value in latency 12 years ago
Michael Peter Christen 864abcd33d removed Latency update after URL selection because that causes 12 years ago
Michael Peter Christen 756772fbd3 fix for waitingtime computation for intranet configuration 12 years ago
Michael Peter Christen 0fe8be7981 enhanced data structures for balancer and latency computation which 13 years ago
Michael Peter Christen b2ffd49817 less latency 13 years ago
Michael Peter Christen 0833937c1c better balancing and duetime-computation also for no-delay intranet 13 years ago
Michael Peter Christen 2d9e577ad0 replaced the custom robots.txt loader by the standard http loader 13 years ago
orbiter 8952153ecf update to Balancer algorithm: 13 years ago
Michael Peter Christen 00c1c777fa refactoring 13 years ago