145 Commits (cc090bcb01bd240be911426f85e811f4e323c348)

Author SHA1 Message Date
Michael Peter Christen 8df8ffbb6d enhanced the snapshot functionality:
10 years ago
reger ff18129def ViewFile servlet: update index if newer,
10 years ago
Michael Peter Christen e586e423aa in case that loading from the cache fails, load from wkhtmltopdf without
10 years ago
Michael Peter Christen d5bac64421 recognize more html file types for snapshots
10 years ago
Michael Peter Christen 25a64c51b3 moved snapshot generation out of the html handler to prevent that
10 years ago
reger 48aed15c48 skip loader wait cycle on concurrent access in nocache configuration.
10 years ago
Michael Peter Christen 1735dbc9d9 enhanced image search: bugfixes and performance enhancements
10 years ago
Michael Peter Christen ebd0be2cea fixes and speed updates for search process
10 years ago
Michael Peter Christen 2de159719b added an option to set 'obey nofollow' for links with rel="nofollow"
10 years ago
Michael Peter Christen fb3dd56b02 fix for processing of noindex flag in http header
11 years ago
Marc Nause c97da1a0d8 First draft of a blacklist API.
11 years ago
reger 727dfb5875 refactor URIMetadataNode to further unify interaction with index
11 years ago
Michael Peter Christen 10cf8215bd added crawl depth for failed documents
11 years ago
Michael Peter Christen da86f150ab - added a new Crawler Balancer: HostBalancer and HostQueues:
11 years ago
Michael Peter Christen 6bd8c6f195 fix for wrong status codes of error pages
11 years ago
reger 227c42bc96 eliminate obsolete URIMetaDataRow class
11 years ago
Marc Nause 809b4e1fd9 Team added support for URLs with unicode characters in host part to
11 years ago
Michael Peter Christen b08375da33 fix for bad/missing values of size_i
11 years ago
Michael Peter Christen 1ea17bd9f3 - removed old metadata database and all migration code
11 years ago
reger 58ecf5e4dd add to blacklist button in CrawlResults
11 years ago
Michael Peter Christen 1a4a69c226 set more logger to 'final static'
11 years ago
Michael Peter Christen 91a875dff5 self-healing of mistakenly deactivated crawl profiles. This fixes a bug
11 years ago
Michael Peter Christen 2602be8d1e - removed ZURL data structure; removed also the ZURL data file
11 years ago
Michael Peter Christen 61c5e40687 - replaced the properties object in AnchorURL with distinct variables
11 years ago
Michael Peter Christen 5e31bad711 - the webgraph shall store all links which appear on a web page and not
11 years ago
Michael Peter Christen 765943a4b7 Redesign of crawler identification and robots steering. A non-p2p user
11 years ago
sixcooler 7d53ac86a3 fix for Blacklist (-Administration)
11 years ago
Roland Haeder b58ca8622d Some cleanups:
11 years ago
Roland Haeder aaedc0405d Fixes and avoid of catching bad exceptions (some):
11 years ago
Roland Haeder 841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler
11 years ago
Felix Ableitner 03044589dd Fixed (?i) appearing in entries, fixed multiple equal lines in file.
11 years ago
Michael Peter Christen 89c0aa0e74 added collection_sxt to error documents
11 years ago
Michael Peter Christen 5878c1d599 - refactoring of log to ConcurrentLog:
12 years ago
Felix Ableitner 44f8fcf62e Changed class structure of Blacklist.
12 years ago
Michael Peter Christen 8f2d3ce2f9 reduced locking situation in crawler: shifted synchronized location and
12 years ago
Michael Peter Christen 77faeada4d small memory leak patch
12 years ago
orbiter 5d442dad82 avoid NPE in regex checker
12 years ago
Marc Nause ac478384d3 *) did some long overdue refactoring
12 years ago
Michael Peter Christen 788288eb9e added the generation of 50 (!!) new solr fields in the core 'webgraph'.
12 years ago
sixcooler 3a13906121 clear some more caches if running out of memory
12 years ago
Michael Peter Christen 84f82541e8 search process enhancements
12 years ago
reger e80dfeca23 - making blacklist path part case insensitive (solving http://bugs.yacy.net/view.php?id=171)
12 years ago
reger 1faa045dc1 fix: prevent regex pattern compile error for blacklist import for path '*' (extend it to '.*')
12 years ago
Michael Peter Christen 2d9e577ad0 replaced the custom robots.txt loader by the standard http loader
12 years ago
Michael Peter Christen ccc3760a47 Refactoring and redesign of data architecture to make URIMetadataRow
12 years ago
Michael Peter Christen 5f0ab25382 removed the option to prevent removal of & parts inside of the
12 years ago
Michael Peter Christen 1533bfd63b refactoring
12 years ago
Michael Peter Christen 00c1c777fa refactoring
12 years ago
Michael Peter Christen bbd242afb4 fix for a NPE
12 years ago
Michael Peter Christen 24d9db1613 snippet retrieval loading processes may use a smaller minimum load time
12 years ago