Commit Graph

485 Commits (62f2554a01ead4873dae4a9385693422c834f002)

Author SHA1 Message Date
Michael Peter Christen 659178942f - Redesigned crawler and parser to accept embedded links from the NOLOAD
13 years ago
Michael Peter Christen f5efdb21fd refactoring
13 years ago
Michael Peter Christen f8cd57c92f new indexing strategy: ALL links that appear anywhere are indexed, not
13 years ago
Michael Peter Christen a1a5b015d8 refactoring: moved document Classification to cora package
13 years ago
Michael Peter Christen a5d7da68a0 refactoring: removed dependency from switchboard in Balancer/CrawlQueues
13 years ago
Michael Peter Christen 33d1062c79 refactoring: the cache belongs to the crawler
13 years ago
Michael Christen 22f05c83ff fixed default must-match filter for full domain crawls - the old filter
13 years ago
Michael Peter Christen 0cc0290978 bugfix for a must-not-match pattern check. This bug did not make the
13 years ago
Michael Peter Christen 2fc8ecee36 ConcurrentLinkedQueue has a VERY long return time on the .size() method.
13 years ago
Michael Peter Christen c6c61be3f0 fix for http://bugs.yacy.net/view.php?id=148
13 years ago
Michael Peter Christen 0d148c3353 more logging in resource observer
13 years ago
Michael Peter Christen 2fa037ae1d enhanced crawler
13 years ago
Lotus ee89cf5ae5 fix must match filter for full domain crawl
13 years ago
Michael Peter Christen 9ad1d8dde2 complete redesign of crawl queue monitoring: do not look at a
13 years ago
Michael Peter Christen 1f4f60654a Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen 2ee8cbeb2c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen 992dbdf4bb added noload statistic to servlets
13 years ago
Michael Christen c21966bb43 fix
13 years ago
Michael Christen 13b05f9c08 fix
13 years ago
Michael Christen e5d878c59e Merge branch 'master' of ssh://gitorious.org/yacy/rc1
13 years ago
Michael Christen ec26b2bea4 Merge commit 'fa08ed5ae5d72bddc3cc6a662b23103579e86109' into quix0r
13 years ago
Michael Christen 216a287a85 Merge commit '6d4e08ed06c5cd28c45981b2ebe31c7f7ec6fd83' into quix0r
13 years ago
stbrumm d18095dc48 Patch for Issue 0000102
13 years ago
Roland 'Quix0r' Haeder 901f37d608 Also this ... :( #2
13 years ago
Roland 'Quix0r' Haeder a985717ed2 Also this ... :(
13 years ago
Roland 'Quix0r' Haeder 5f490de554 Fix for ported fix from my old days ...
13 years ago
Roland 'Quix0r' Haeder fa08ed5ae5 Fixed a lot of CHMOD rights (no need for execute flag on *.java/*.html) and introduced local/remote crawl size ratio based check
13 years ago
Michael Christen 9e5894c784 Removed handling of components objects for URIMetadataRows.
13 years ago
Michael Christen c04bfaa51b refactoring
13 years ago
Michael Christen 6e66c9d7f1 fix for http://bugs.yacy.net/view.php?id=87
13 years ago
Michael Christen e7e429705a - less automatic indexing after a search (needs to reset the default
13 years ago
orbiter 11729061f2 added an option in the bookmark import process to put everything into the crawler
13 years ago
orbiter 8895d8c1cd removed unnecessary log entries
13 years ago
orbiter 5a55397f99 some last-minute performance hacks
13 years ago
orbiter e4a82ddd8b produce a bookmark entry from every crawl start. these bookmarks are always private.
13 years ago
orbiter aa322bc6d0 fix
13 years ago
orbiter 97d1347adb added also a default accept field to robots.txt downloads
13 years ago
orbiter f183d3822c added a default accept header in http requests since some http fraud detection functions check that this header field exists
13 years ago
orbiter 06352b8d6b more logging
13 years ago
orbiter a99934226e more logging for debugging of robots.txt
13 years ago
orbiter 7a5841e061 fix for robot parser
13 years ago
orbiter 458c20ff72 fix for robot parser
13 years ago
orbiter 017a01714d - enhanced logging in robots.txt parser for remote debugging
13 years ago
orbiter eb1c7c041d write info about robots.txt evaluation into getpageinfo_p.xml
13 years ago
orbiter 775b44017e refactoring
13 years ago
orbiter 78ce3b13be typo
13 years ago
orbiter 85d6bf4ac4 fixed urls to media content during indexing
13 years ago
orbiter 3a807e10cf - added a cache for active crawl profiles to the crawl switchboard
13 years ago
orbiter 37e35f2741 normalization of url using urlencoding/decoding
13 years ago
orbiter 1b86d06d1e fix for http://bugs.yacy.net/view.php?id=62
13 years ago