23 Commits (60c9986a0e4a4f4f75e1ba93e6cf1ac0f9f1fed6)

Author SHA1 Message Date
Michael Peter Christen e6a87e0426 enhanced crawler
3 years ago
Lina Ceballos a96752f5ab adding SPDX license and copyright headers
4 years ago
reger 379e9b330d use supplied url port to get robots.txt in crawlers hostqueue
9 years ago
reger 90686a75a2 fix flux factor (additional crawl delay by access count) calculation
9 years ago
Michael Peter Christen da86f150ab - added a new Crawler Balancer: HostBalancer and HostQueues:
11 years ago
Michael Peter Christen 6ada0daae9 making latency_factor and maximum number of same hosts in loader queue
11 years ago
Michael Peter Christen 0168f80c28 new crawling factors can now be changed during runtime
11 years ago
Michael Peter Christen 77531850b5 reverted crawling strategy from latest commit.
11 years ago
Michael Peter Christen c0da966dfa enhanced crawler speed
11 years ago
orbiter 0e8d752462 refactoring
11 years ago
Michael Peter Christen 5e31bad711 - the webgraph shall store all links which appear on a web page and not
11 years ago
Michael Peter Christen 765943a4b7 Redesign of crawler identification and robots steering. A non-p2p user
11 years ago
Michael Peter Christen 16d1d744fa added url_file_name_s in default collection schema for the file name
12 years ago
Michael Peter Christen 77faeada4d small memory leak patch
12 years ago
Michael Peter Christen a3cd3852ab introduced a better place to update the lastacc time value in latency
12 years ago
Michael Peter Christen 864abcd33d removed Latency update after URL selection because that causes
12 years ago
Michael Peter Christen 756772fbd3 fix for waitingtime computation for intranet configuration
12 years ago
Michael Peter Christen 0fe8be7981 enhanced data structures for balancer and latency computation which
12 years ago
Michael Peter Christen b2ffd49817 less latency
12 years ago
Michael Peter Christen 0833937c1c better balancing and duetime-computation also for no-delay intranet
12 years ago
Michael Peter Christen 2d9e577ad0 replaced the custom robots.txt loader by the standard http loader
12 years ago
orbiter 8952153ecf update to Balancer algorithm:
12 years ago
Michael Peter Christen 00c1c777fa refactoring
12 years ago