47 Commits (6b45cd579922574059e5385153b84be3ca07533b)

Author SHA1 Message Date
reger c31d94664a Update deprecated SolrInputDocument.addField() with boost value
7 years ago
luccioman 654801523e Fixed StringIndexOutOfBoundsException case.
8 years ago
reger 581b00cc20 remove obsolete lastmodified calculation in WebgraphConfig
8 years ago
Michael Peter Christen cc0ded7abd set process type of web graph according to fields as defined in the
10 years ago
orbiter 3491ab4c38 removed unused images from webgraph edge computation
10 years ago
orbiter 2371d6b8db target linktexts must be string to enable search facets on these fields
10 years ago
Michael Peter Christen 98f45c9032 fix for image alt attachment to AnchorURLs in html parser.
10 years ago
Michael Peter Christen 6e1dc444c3 added a snippet test function in ViewFile: you can now search for a
10 years ago
Michael Peter Christen b44626e55b fixed target_alt_t in webgraph
10 years ago
Michael Peter Christen b3b174e2b8 fixed webgraph postprocessing and status display in Crawler_p servlet
11 years ago
Michael Peter Christen c2f62e783f - better subgraph handling, less overhead for crawls without the
11 years ago
Michael Peter Christen 9a5ab4e2c1 removed clickdepth_i field and related postprocessing. This information
11 years ago
Michael Peter Christen 67beef657f strong redesign of html parser: object recursion is now made using a
11 years ago
Michael Peter Christen 3ce8eff21b another fix for inbound/outbound detection
11 years ago
Michael Peter Christen 48fbfa60c1 bugfix to inbound/outbound identification
11 years ago
Michael Peter Christen 61ad194065 fix for source and target clickdepth in webgraph index
11 years ago
Michael Peter Christen 51800007c4 - added concurrency to postprocessing of webgraph document
11 years ago
Michael Peter Christen 0f6b72f24b do not use luke requests for remote solr servers if the result is
11 years ago
Michael Peter Christen 82c0525e71 wrong logger fix
11 years ago
Michael Peter Christen e3c2f09de9 - reduce computation in case that specific postprocessing fields are not
11 years ago
Michael Peter Christen a125904a1c fixed a NPE in surrogat processing
11 years ago
Michael Peter Christen 0db8e34625 enhanced webgraph processing
11 years ago
orbiter da33ee0d77 extended also timeout fr webgraph postprocessing
11 years ago
Michael Peter Christen 9d5895f643 enhanced and fixed postprocessing
11 years ago
Michael Peter Christen c833d02cf5 fixed webgraph postprocessing (did nothing and repeated to do this...)
11 years ago
Michael Peter Christen 74d0256e93 enhanced postprocessing: fixed bugs, enable proper postprocessing also
11 years ago
orbiter 5f5a97bafc added the anchor text within web pages to the searcheable entities of a
11 years ago
Michael Peter Christen 101a6e6e14 Patch the citation index for links with canonical tags.
11 years ago
Michael Peter Christen 4f83d5f18c added the new field harvestkey_s to the collection index and the
11 years ago
Michael Peter Christen 31920385f7 set anchor rel attribute of all links to "nofollow" if the html meta
11 years ago
Michael Peter Christen 61c5e40687 - replaced the properties object in AnchorURL with distinct variables
11 years ago
Michael Peter Christen 5e31bad711 - the webgraph shall store all links which appear on a web page and not
11 years ago
Michael Peter Christen a88a62f7aa added a feature to set a collection for a crawl result based on a
11 years ago
orbiter 6fb2811e68 fixes for problems with remote solr and non-activated webgraph index
11 years ago
Roland Haeder 841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler
11 years ago
orbiter a9c8046c87 do a light optimization at the end of a crawl postprocessing
12 years ago
Michael Peter Christen 5878c1d599 - refactoring of log to ConcurrentLog:
12 years ago
Michael Peter Christen 16d1d744fa added url_file_name_s in default collection schema for the file name
12 years ago
Michael Peter Christen 3502b4c697 refactoring (renaming) of yacy-solr api
12 years ago
Michael Peter Christen b8ed66a55d added all clickdepth computations for source and target paths in
12 years ago
Michael Peter Christen 2080fc7406 removed unused tag fields
12 years ago
orbiter 6b13dd0d3d added clickdepth field writing for webgraph core (unfinished)
12 years ago
Michael Peter Christen 4490133909 removed target_tag_s (superfluous)
12 years ago
Michael Peter Christen 089dee1770 - generalized SchemaConfiguration into super-class Configuration and
12 years ago
Michael Peter Christen 14cceb6b17 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
Michael Peter Christen 788288eb9e added the generation of 50 (!!) new solr field in the core 'webgraph'.
12 years ago
Michael Peter Christen 91a0401d59 introduced a second core named 'webgraph'. This core will hold the link
12 years ago