Michael Peter Christen
8f2d3ce2f9
reduced locking in crawler: shifted the synchronized location and
reduced the time-out limit for robots.txt loading
12 years ago
Michael Peter Christen
038f956821
fix for sitemap detection: the sitemap url was not visible if it
appeared after the declaration of robots allow/deny rules for the crawler,
because the sitemap parser terminated as soon as the allow/deny rules had
been found. Now the parser reads the robots.txt to the end, so that
sitemap rules at the end of the file are also discovered.
12 years ago
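A minimal sketch of the fixed behaviour in 038f956821, assuming the robots.txt body is already available as a string. The class and method names are hypothetical and this is not YaCy's actual parser; it only shows the idea of scanning every line instead of stopping once the allow/deny rules have been seen.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of YaCy.
public class RobotsSitemapSketch {

    // Collects every "Sitemap:" URL in a robots.txt body.
    // The loop deliberately does NOT stop after the allow/disallow rules,
    // so a Sitemap line at the end of the file is still found.
    public static List<String> sitemaps(String robotsTxt) {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(robotsTxt))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // strip trailing comments and surrounding whitespace
                int hash = line.indexOf('#');
                if (hash >= 0) line = line.substring(0, hash);
                line = line.trim();
                int colon = line.indexOf(':');
                if (colon < 0) continue;
                String field = line.substring(0, colon).trim();
                String value = line.substring(colon + 1).trim();
                // field names are matched case-insensitively; Sitemap lines
                // are collected regardless of which user-agent group they follow
                if (field.equalsIgnoreCase("sitemap") && !value.isEmpty()) {
                    result.add(value);
                }
            }
        } catch (IOException e) {
            // a StringReader does not fail in practice; rethrow defensively
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String robots = String.join("\n",
            "User-agent: *",
            "Disallow: /private/",
            "Allow: /",
            "",
            "Sitemap: http://example.org/sitemap.xml");
        System.out.println(sitemaps(robots)); // prints [http://example.org/sitemap.xml]
    }
}
```

A parser that returned right after the Allow/Disallow block would miss the last line of this example, which is the situation the commit describes.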
Michael Peter Christen
af465cdca5
fix for wrong robots.txt loading for https protocol
see also: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4579
12 years ago
Michael Peter Christen
8f3bd0c387
fix for smb crawl situation (lost too many urls)
12 years ago
orbiter
5aa5202adf
fixes for filesystem indexing
12 years ago
Michael Peter Christen
71ed8e5e07
bugfixes for crawler
12 years ago
Michael Peter Christen
0fe8be7981
enhanced data structures for balancer and latency computation, which
should produce a slightly better prognosis of forced waiting times.
12 years ago
Michael Peter Christen
0833937c1c
better balancing and due-time computation, also for no-delay intranet
hosts
12 years ago
Michael Peter Christen
c25d7bcb80
- added concurrency for robots.txt loading
- changed data model for domain counter
12 years ago
Michael Peter Christen
2d9e577ad0
replaced the custom robots.txt loader by the standard http loader
12 years ago
Michael Peter Christen
a33e2742cb
- removed unnecessary synchronized blocks and a deadlock in crawler
- removed problem with monitoring object on Balancer.wait
- added missing user agent settings
12 years ago
Michael Peter Christen
5f0ab25382
removed the option to prevent removal of & parts inside the
MultiProtocolURI during normalform computation, because that removal
should always happen, including during initialization of the
MultiProtocolURI object. The new normalform method takes only one
argument, which should be 'true' unless you know exactly what you are
doing.
12 years ago
Michael Peter Christen
00c1c777fa
refactoring
12 years ago