120 Commits (5d71a4c8bc6321a8d2f5d45b00cf11ad1bcd111b)

Author SHA1 Message Date
Michael Peter Christen 765943a4b7 Redesign of crawler identification and robots steering. A non-p2p user
11 years ago
sixcooler 7d53ac86a3 fix for Blacklist (-Administration)
11 years ago
Roland Haeder b58ca8622d Some cleanups:
11 years ago
Roland Haeder aaedc0405d Fixes and avoid of catching bad exceptions (some):
11 years ago
Roland Haeder 841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler
11 years ago
Felix Ableitner 03044589dd Fixed (?i) appearing in entries, fixed multiple equal lines in file.
11 years ago
Michael Peter Christen 89c0aa0e74 added collection_sxt to error documents
11 years ago
Michael Peter Christen 5878c1d599 - refactoring of log to ConcurrentLog:
12 years ago
Felix Ableitner 44f8fcf62e Changed class structure of Blacklist.
12 years ago
Michael Peter Christen 8f2d3ce2f9 reduced locking situation in crawler: shifted synchronized location and
12 years ago
Michael Peter Christen 77faeada4d small memory leak patch
12 years ago
orbiter 5d442dad82 avoid NPE in regex checker
12 years ago
Marc Nause ac478384d3 *) did some long overdue refactoring
12 years ago
Michael Peter Christen 788288eb9e added the generation of 50 (!!) new solr field in the core 'webgraph'.
12 years ago
sixcooler 3a13906121 clear some more caches if running out of memory
12 years ago
Michael Peter Christen 84f82541e8 search process enhancements
12 years ago
reger e80dfeca23 - making blacklist path part case insensitive (solving http://bugs.yacy.net/view.php?id=171)
12 years ago
reger 1faa045dc1 fix: prevent regex pattern compile error for blacklist import for path '*' (extend it to '.*')
12 years ago
Michael Peter Christen 2d9e577ad0 replaced the custom robots.txt loader by the standard http loader
12 years ago
Michael Peter Christen ccc3760a47 Refactoring and redesign of data architecture to make URIMetadataRow
12 years ago
Michael Peter Christen 5f0ab25382 removed the option to prevent removal of & parts inside of the
12 years ago
Michael Peter Christen 1533bfd63b refactoring
12 years ago
Michael Peter Christen 00c1c777fa refactoring
12 years ago
Michael Peter Christen bbd242afb4 fix for a NPE
12 years ago
Michael Peter Christen 24d9db1613 snippet retrieval loading processes may use a smaller minimum load time
12 years ago
Michael Peter Christen 1687737771 Abstraction of HandleMap and HandleSet
12 years ago
orbiter 69e743d9e3 - more abstraction for the RWI index as preparation for solr integration
12 years ago
orbiter 0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty()
13 years ago
Roland 'Quix0r' Haeder aef9dd0350 - removed cleaning of blacklist cache on startup
13 years ago
Michael Peter Christen c3db015410 prevent loading of content from the cache when retrieval with IFFRESH is
13 years ago
Michael Peter Christen b0c408788b made class methods static where possible
13 years ago
Michael Peter Christen 5bd3c90907 - removed unnecessary semicolons
13 years ago
Michael Peter Christen 7c1ba99755 removed more unused method parameters
13 years ago
Michael Peter Christen 0301aba1e9 removed unused method parameters
13 years ago
Michael Peter Christen ea10766bfd cleaned unnecessary nested code
13 years ago
Michael Peter Christen 1825f165b8 better integration of blacklist according to use case
13 years ago
Michael Peter Christen 03280fb161 removed segments-concept and the Segments class:
13 years ago
Michael Peter Christen 77f795756c fixing redirects and status codes: storing of status code in
13 years ago
Michael Peter Christen 7dc59979bc fix for npe, possibly for http://bugs.yacy.net/view.php?id=195
13 years ago
Michael Peter Christen 4ee6fb1de9 added missing blacklist dht cache storage (maybe due to mistakes in
13 years ago
Roland 'Quix0r' Haeder e4d36fa5eb Fix to make all values lower-case (this should make all existing blacklists compatible with the new enum)
13 years ago
Roland 'Quix0r' Haeder edaa09b9b1 Rewrote all String blacklist types to enum 'BlacklistType', closes bug
13 years ago
Michael Peter Christen 7a329465b3 using pre-compile pattern in blacklist; should enhance search speed
13 years ago
Michael Peter Christen 7e0ddbd275 added a "fromCache" flag in Response object to omit one cache.has()
13 years ago
Michael Peter Christen 659178942f - Redesigned crawler and parser to accept embedded links from the NOLOAD
13 years ago
Michael Peter Christen 33d1062c79 refactoring: the cache belongs to the crawler
13 years ago
reger a95f645a61 Bugfix class repository.Loaddispatcher fixed download file limit of 10000
13 years ago
Michael Peter Christen ef5192f8c9 using the generic document parser for crawl starts instead of the html
13 years ago
Marek Otahal f40efb39af Blacklist loadList() remove duplicates by using Set
13 years ago
Michael Christen eebc02f5c1 fix
13 years ago