29 Commits (28a7b42e6bacff5072b2e1f5df8a13dc5e7bc184)

Author SHA1 Message Date
Michael Peter Christen 82c0525e71 wrong logger fix
11 years ago
Michael Peter Christen e3c2f09de9 - reduce computation in case that specific postprocessing fields are not
11 years ago
Michael Peter Christen a125904a1c fixed an NPE in surrogate processing
11 years ago
Michael Peter Christen 0db8e34625 enhanced webgraph processing
11 years ago
orbiter da33ee0d77 extended also timeout for webgraph postprocessing
11 years ago
Michael Peter Christen 9d5895f643 enhanced and fixed postprocessing
11 years ago
Michael Peter Christen c833d02cf5 fixed webgraph postprocessing (did nothing and repeated to do this...)
11 years ago
Michael Peter Christen 74d0256e93 enhanced postprocessing: fixed bugs, enable proper postprocessing also
11 years ago
orbiter 5f5a97bafc added the anchor text within web pages to the searchable entities of a
11 years ago
Michael Peter Christen 101a6e6e14 Patch the citation index for links with canonical tags.
11 years ago
Michael Peter Christen 4f83d5f18c added the new field harvestkey_s to the collection index and the
12 years ago
Michael Peter Christen 31920385f7 set anchor rel attribute of all links to "nofollow" if the html meta
12 years ago
Michael Peter Christen 61c5e40687 - replaced the properties object in AnchorURL with distinct variables
12 years ago
Michael Peter Christen 5e31bad711 - the webgraph shall store all links which appear on a web page and not
12 years ago
Michael Peter Christen a88a62f7aa added a feature to set a collection for a crawl result based on a
12 years ago
orbiter 6fb2811e68 fixes for problems with remote solr and non-activated webgraph index
12 years ago
Roland Haeder 841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler
12 years ago
orbiter a9c8046c87 do a light optimization at the end of a crawl postprocessing
12 years ago
Michael Peter Christen 5878c1d599 - refactoring of log to ConcurrentLog:
12 years ago
Michael Peter Christen 16d1d744fa added url_file_name_s in default collection schema for the file name
12 years ago
Michael Peter Christen 3502b4c697 refactoring (renaming) of yacy-solr api
12 years ago
Michael Peter Christen b8ed66a55d added all clickdepth computations for source and target paths in
12 years ago
Michael Peter Christen 2080fc7406 removed unused tag fields
12 years ago
orbiter 6b13dd0d3d added clickdepth field writing for webgraph core (unfinished)
12 years ago
Michael Peter Christen 4490133909 removed target_tag_s (superfluous)
12 years ago
Michael Peter Christen 089dee1770 - generalized SchemaConfiguration into super-class Configuration and
12 years ago
Michael Peter Christen 14cceb6b17 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
Michael Peter Christen 788288eb9e added the generation of 50 (!!) new solr fields in the core 'webgraph'.
12 years ago
Michael Peter Christen 91a0401d59 introduced a second core named 'webgraph'. This core will hold the link
12 years ago