29 Commits (d524a9d77cf4613e5e3269ec722006880710a8de)

Author SHA1 Message Date
Michael Peter Christen fed26f33a8 enhanced timezone management for indexed data:
10 years ago
Michael Peter Christen b5ac29c9a5 added a html field scraper which reads text from html entities of a
10 years ago
Michael Peter Christen 98f45c9032 fix for image alt attachment to AnchorURLs in html parser.
11 years ago
Michael Peter Christen 2de159719b added an option to set 'obey nofollow' for links with rel="nofollow"
11 years ago
Michael Peter Christen 022c6d3ce1 do YaCy p2p connections using a timeout-request which covers the http
11 years ago
Michael Peter Christen 61c5e40687 - replaced the properties object in AnchorURL with distinct variables
12 years ago
Michael Peter Christen 5e31bad711 - the webgraph shall store all links which appear on a web page and not
12 years ago
Michael Peter Christen 35ab2cef7b added parsing of 'date', 'dc:date', 'dc.date' and 'last-modified' in
12 years ago
Michael Peter Christen 765943a4b7 Redesign of crawler identification and robots steering. A non-p2p user
12 years ago
Michael Peter Christen cf12835f20 replaced the single-text description solr field with a multi-value
12 years ago
Roland Haeder 841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler
12 years ago
Michael Peter Christen 5878c1d599 - refactoring of log to ConcurrentLog:
12 years ago
Michael Peter Christen 788288eb9e added the generation of 50 (!!) new solr fields in the core 'webgraph'.
12 years ago
Michael Peter Christen a33e2742cb - removed unnecessary synchronized and deadlock in crawler
12 years ago
Michael Peter Christen 5f0ab25382 removed the option to prevent removal of & parts inside of the
12 years ago
Michael Peter Christen 528d6763fa - added new solr fields:
13 years ago
Michael Peter Christen fbc1a2030d fix for sitemap importer: can now also import very large sitemaps within
13 years ago
Michael Peter Christen 77f795756c fixing redirects and status codes: storing of status code in
13 years ago
Michael Peter Christen 5fc6524ca8 - moved triple store to net.yacy.cora.lod (should be generalized there
13 years ago
orbiter d8e934c085 better abstraction of http client identification
14 years ago
orbiter b77b8cac0c - enhanced html parser: recognized much more details in the content
14 years ago
orbiter 9b25d07295 - added geo information parsing to html parser
14 years ago
orbiter e1b6916423 always try to guess the size of a StringBuilder to prevent too many memory re-allocations
14 years ago
orbiter 8d14916c74 more patches for a better out-of-memory management
14 years ago
orbiter 10ae8d961b - cora package has now no dependencies to other yacy packages and becomes a 'base' package (refactoring)
14 years ago
low012 2a6499364d *) minor changes
14 years ago
low012 c0274bd123 *) minor changes
14 years ago
orbiter 4c72885cba added a sitemap entry parser and loader for sitemaps
14 years ago
orbiter 114bdd8ba7 fixed old sitemap importer which was not able to parse urls containing post elements
15 years ago