34 Commits (fc594e8eda269753cddc8c69e6ebab61c4625200)

Author SHA1 Message Date
theli f3ac4dbbb9 *) better handling of server shutdown
18 years ago
orbiter cfbacbbf08 reverted change in robotsParser
19 years ago
orbiter abf22f6e60 removed url normalform computation from htmlFilterContentScraper.
19 years ago
orbiter 3879a0ecd0 replaced java.net.URL usage by use of new class de.anomic.net.URL
19 years ago
allo 99110e6fd2 Fixed some of the copyright headers.
19 years ago
orbiter 015d044c25 tried to fix some problems with latest changes to httpc
19 years ago
theli 34c075c1c7 testcommit with subversive
19 years ago
theli d3da7c9a08 *) Adding support for robots Allow directive
19 years ago
theli 734d18f283 *) more correct robots.txt validation
19 years ago
theli f0ad0d2b2b *) better robots.txt support
19 years ago
theli 915812f597 *) Undoing robots parser policy changes from svn rev. 1421
19 years ago
theli eeba8b055e *) guessing, testing and suggesting alternative hostnames on "unknown host" error
19 years ago
theli 5c56b9ed59 *) catch exceptions that could occur during url decoding
19 years ago
theli 754a35877f *) Changing robots parser exclusion policy
19 years ago
orbiter 7920e1547d code cleanup
19 years ago
theli 9649d08171 *) More tolerant robots parser
19 years ago
theli 93cadb47b9 *) More tolerant robots parser for robots files that are missing empty lines between rule blocks
19 years ago
theli f9fb284fb7 *) Better handling of robots.txt files with incorrect keywords
19 years ago
theli b8ceb1ffde *) Adding better https support for crawler
19 years ago
theli 3b5d0eb053 *) Synchronizing robots.txt downloads to avoid parallel downloads of the same file by separate threads
19 years ago
theli 6c48c3ce39 *) Bugfix for ArithmeticException during IndexTransfer
19 years ago
theli 02d9af1a70 *) Restructuring and extending Remote Proxy Support
19 years ago
theli 40777556c5 *) Connection Tracking
19 years ago
theli 959eefbc4f *) Robots.txt parser/ppt
19 years ago
theli a2fa75e688 *) Asynchronous queuing of crawl job URLs (stackCrawl)
19 years ago
theli 023be89586 *) Bugfix for "Robots.txt is loaded again and again"
19 years ago
orbiter dc474aa22f various bug-fixes
19 years ago
rramthun 9dfbd93c7b Updated german language file
19 years ago
theli 2cd695f376 *) Bugfix: path entries of robots.txt were not decoded correctly
19 years ago
theli f8ad65eae1 *) First trial implementation of robots.txt support
19 years ago
allo 9300689dde bugfix *gr*
19 years ago
allo ebc39a7b9a minor fixes
19 years ago
allo f90f699ab1 missing package line.
19 years ago
allo 06a451768f a simple robotsParser.
19 years ago