47 Commits (331e0a24fc5972ce4562649a66de9431c4e69d21)

Author SHA1 Message Date
Michael Peter Christen 1c0f50985c fixed documentation and some details of handling of keywords
2 years ago
Michael Peter Christen c0d9a3e9a7 turned HostBrowser into an admin-only page, now called IndexBrowser
4 years ago
luccioman ac766327d3 Switched a few more Solr fields from strictly mandatory to optional
8 years ago
luccioman cdc7f3e431 Switched some Solr fields from mandatory to optional
8 years ago
luccioman c68a8be2d9 Refactored and enforced Solr mandatory fields for proper operation
8 years ago
luc 3cc5619d93 Improved HTML icons indexing and rendering in search results.
9 years ago
reger 15e46b2bad exclude in/outboundlinksnofollowcount_i from default schema fields
9 years ago
reger b2c8bc0ae6 remove md5_s from default index fields
9 years ago
Michael Peter Christen b060ba900d added parsing of contentprop attribute in html tags for
10 years ago
Michael Peter Christen 4cb4f67f38 added parsing of dd, dt and article html fields. The parsed result is
10 years ago
Michael Peter Christen 535f1ebe3b added a new way of content browsing in search results:
10 years ago
Michael Peter Christen 66b5a56976 Added and integrated new date detection class which can identify date
10 years ago
Michael Peter Christen 114f0afc1e enable sku as anchor in html response writer
10 years ago
Michael Peter Christen c94c24638f disabled postprocessing by default. If you read this: please disable
10 years ago
Michael Peter Christen c67c5c0709 added new solr schema fields which record the occurrences of vocabulary
10 years ago
Michael Peter Christen b1cfbc4a04 added new solr field url_paths_count_i which can be used to enhance the
10 years ago
Michael Peter Christen 1092e798a5 fixed double content postprocessing
11 years ago
Michael Peter Christen ff5b3ac84d added new fields http_unique_b and www_unique_b which can be used for
11 years ago
Michael Peter Christen 9a5ab4e2c1 removed clickdepth_i field and related postprocessing. This information
11 years ago
Michael Peter Christen cca851a417 introduced new solr field crawldepth_i which records the crawl depth of
11 years ago
Michael Peter Christen e515dd460d added linkscount_i and linksnofollowcount_i to the default solr schema
11 years ago
Michael Peter Christen a7bc130e27 removed performance settings
11 years ago
reger f6099b730d disabled unused fields in default Solr collection schema
11 years ago
Michael Peter Christen e3c2f09de9 - reduce computation in case that specific postprocessing fields are not
11 years ago
Michael Peter Christen 1b61bd40ed - Added new solr field url_file_name_tokens_t which stores the file name
11 years ago
orbiter 5f5a97bafc added the anchor text within web pages to the searchable entities of a
11 years ago
Michael Peter Christen 4f83d5f18c added the new field harvestkey_s to the collection index and the
11 years ago
Michael Peter Christen 85456f46b2 added two new fields, exact_signature_copycount_i and
11 years ago
Michael Peter Christen a2511b5600 turned images_alt_txt back to images_alt_sxt because it is not necessary
11 years ago
orbiter f106345eef link strings should not be tokenized
11 years ago
orbiter deadeb406e image alt tag strings should be tokenized
11 years ago
Michael Peter Christen cf12835f20 replaced the single-text description solr field with a multi-value
11 years ago
Michael Peter Christen 5a5d411ec0 new robots_i attribute fields
12 years ago
Michael Peter Christen 16d1d744fa added url_file_name_s in default collection schema for the file name
12 years ago
orbiter 8792e6c6e9 stub for better image indexing
12 years ago
Michael Peter Christen 570511f3c8 removed fields references_internal_id_sxt and
12 years ago
Michael Peter Christen 713a6199ef activated citation ranking by default
12 years ago
Michael Peter Christen f7e77a21bf Added a citation reference computation for intra-domain link structures.
12 years ago
Michael Peter Christen cca19d94d4 re-declared some fields to be of type string rather than text which
12 years ago
Michael Peter Christen 50421171c3 added new schema fields:
12 years ago
Michael Peter Christen 7ab5093321 added new solr title_exact_signature_l and
12 years ago
Michael Peter Christen 27d6222880 added new field host_extent_i which, after a crawl and postprocessing,
12 years ago
Michael Peter Christen ada3f27de7 added three new fields for a better ranking: references_internal_i,
12 years ago
Michael Peter Christen 2080fc7406 removed unused tag fields
12 years ago
Michael Peter Christen addba047e2 changes in ranking computation
12 years ago
Michael Peter Christen 788288eb9e added the generation of 50 (!!) new solr fields in the core 'webgraph'.
12 years ago
Michael Peter Christen 91a0401d59 introduced a second core named 'webgraph'. This core will hold the link
12 years ago