51 Commits (d9d1c8de705ad2e001a002ff72f2499796ea7b59)

Author SHA1 Message Date
orbiter 1689030ee8 refactoring: moved all crawler classes into their own package
17 years ago
orbiter 202a3adb3e refactoring of HttpClient Writer processes
17 years ago
orbiter e356625b22 - refactoring of stream copy handling to support time-consuming operations
17 years ago
orbiter c3342e1178 - removed class with only one static method
17 years ago
danielr 5c3c1fdf41 replaced httpc with Apache Jakarta Commons HttpClient (includes some refactoring ;)
17 years ago
orbiter 0f5c4abaca more generics
17 years ago
orbiter 11b4f80bde - fixed non-closing client connections
17 years ago
orbiter 1488769e1f cleanup of unmaintained and outdated performance methods:
17 years ago
orbiter daf0f74361 joined anomic.net.URL, plasmaURL and url hash computation:
17 years ago
orbiter 57a5b6fa71 some generalization of remote proxy configuration and setting handling in httpc
18 years ago
orbiter 40b0547611 - documentation changes (removed old forum links)
18 years ago
orbiter 36a37f758b fix for oom exception during release download
18 years ago
theli b1680ab71f *) bugfix for ArrayIndexOutOfBoundsException in robots-parser (thanks to low012)
18 years ago
theli 9a4375b115 *) robots.txt: adding support for crawl-delay
18 years ago
theli 6f46245a51 *) Bookmarks: Ajax icon is displayed while loading title
18 years ago
theli 2399ed817c *) robots.txt parser now extracts the sitemap-URL (will be used later)
18 years ago
orbiter df1629b05a - code cleanup
18 years ago
theli f3ac4dbbb9 *) better handling of server shutdown
18 years ago
orbiter cfbacbbf08 reverted change in robotsParser
19 years ago
orbiter abf22f6e60 removed url normalform computation from htmlFilterContentScraper.
19 years ago
orbiter 3879a0ecd0 replaced java.net.URL usage by use of new class de.anomic.net.URL
19 years ago
allo 99110e6fd2 Fixed some of the copyright headers.
19 years ago
orbiter 015d044c25 tried to fix some problems with latest changes to httpc
19 years ago
theli 34c075c1c7 testcommit with subversive
19 years ago
theli d3da7c9a08 *) Adding support for robots Allow directive
19 years ago
theli 734d18f283 *) more correct robots.txt validation
19 years ago
theli f0ad0d2b2b *) better robots.txt support
19 years ago
theli 915812f597 *) Undoing robots parser policy changes from svn rev. 1421
19 years ago
theli eeba8b055e *) guessing, testing and suggesting alternative hostnames on "unknown host" error
19 years ago
theli 5c56b9ed59 *) catch exceptions that could occur during url decoding
19 years ago
theli 754a35877f *) Changing robots parser exclusion policy
19 years ago
orbiter 7920e1547d code cleanup
19 years ago
theli 9649d08171 *) More tolerant robots parser
19 years ago
theli 93cadb47b9 *) More tolerant robots parser for robots files that are missing empty lines between rule blocks
19 years ago
theli f9fb284fb7 *) Better handling of robots.txt files with incorrect keywords
19 years ago
theli b8ceb1ffde *) Adding better https support for crawler
19 years ago
theli 3b5d0eb053 *) Synchronizing robots.txt downloads to avoid parallel downloads of the same file by separate threads
19 years ago
theli 6c48c3ce39 *) Bugfix for ArithmeticException during IndexTransfer
19 years ago
theli 02d9af1a70 *) Restructuring and extending of Remote Proxy Support
19 years ago
theli 40777556c5 *) Connection Tracking
19 years ago
theli 959eefbc4f *) Robots.txt parser/ppt
19 years ago
theli a2fa75e688 *) Asynchronous queuing of crawl job URLs (stackCrawl)
19 years ago
theli 023be89586 *) Bugfix for "Robots.txt is loaded again and again"
19 years ago
orbiter dc474aa22f various bug-fixes
19 years ago
rramthun 9dfbd93c7b Updated German language file
19 years ago
theli 2cd695f376 *) Bugfix path-entries of robots.txt were not decoded correctly
19 years ago
theli f8ad65eae1 *) First trial implementation of robots.txt support
19 years ago
allo 9300689dde bugfix *gr*
19 years ago
allo ebc39a7b9a minor fixes
19 years ago
allo f90f699ab1 missing package line.
19 years ago