## this is a list of all solr keys for the default index 'collection1', the fulltext search index
## this complete list of keys can be changed; the actual schema is stored in:
## DATA/SETTINGS/solr.collection.schema

## the syntax of this file:
## - all lines beginning with '##' are comments
## - all non-empty lines not beginning with '#' are keyword lines
## - all lines beginning with '#' and where the second character is not '#' are commented-out keyword lines
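## a minimal illustration of the three line types, using key names from this file (the trailing notes are explanatory only):
##   '## content of title tag, text'   is a comment
##   'title'                           is a keyword line, so the field is enabled
##   '#title_unique_b'                 is a commented-out keyword line, so the field is disabled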

### mandatory values, do not disable them, YaCy won't work without them

## primary key of document, the URL hash, string (mandatory field)
id

## url of document, string (mandatory field)
sku

## last-modified from http header, date (mandatory field)
last_modified

## mime-type of document, string (mandatory field)
content_type

## content of title tag, text (mandatory field)
title

## flag shows if title is unique in the whole index; if yes and another document appears with same title, the unique-flag is set to false, boolean
#title_unique_b

## id of the host, a 6-byte hash that is part of the document id (mandatory field)
host_id_s

## the md5 of the raw source (mandatory field)
md5_s

## the 64 bit hash of the org.apache.solr.update.processor.Lookup3Signature of text_t
exact_signature_l

## flag shows if exact_signature_l is unique at the time of document creation, used for double-check during search
exact_signature_unique_b

## 64 bit of the Lookup3Signature from EnhancedTextProfileSignature of text_t
fuzzy_signature_l

## intermediate data produced in EnhancedTextProfileSignature: a list of word frequencies
#fuzzy_signature_text_t

## flag shows if fuzzy_signature_l is unique at the time of document creation, used for double-check during search
fuzzy_signature_unique_b

## the size of the raw source (mandatory field)
size_i

## fail reason if a page was not loaded. if the page was loaded then this field is empty, text (mandatory field)
failreason_t

## fail type if a page was not loaded. This field is either empty, 'excl' or 'fail'
failtype_s

## http status return code (e.g. "200" for ok), -1 if not loaded (see content of failreason_t for this case), int (mandatory field)
httpstatus_i

## redirect url if the status code is in the range 299 < httpstatus_i < 310
#httpstatus_redirect_s

## number of unique http references; used for ranking
references_i

## depth of web page according to number of clicks from the 'main' page, which is the page that appears if only the host is entered as url
clickdepth_i

## needed (post-)processing steps on this metadata set
process_sxt


### optional but highly recommended values, part of the index distribution process

## time when resource was loaded
load_date_dt

## date until resource shall be considered as fresh
fresh_date_dt

## ids of referrers to this document
referrer_id_txt

## the name of the publisher of the document
publisher_t

## the language used in the document
language_s

## number of links to audio resources
audiolinkscount_i

## number of links to video resources
videolinkscount_i

## number of links to application resources
applinkscount_i


### optional but highly recommended values, not part of the index distribution process

## tags that are attached to crawls/index generation to separate the search result into user-defined subsets
collection_sxt

## geospatial point in degrees of latitude,longitude as declared in WGS84, location; this creates two additional subfields, coordinate_p_0_coordinate (latitude) and coordinate_p_1_coordinate (longitude)
coordinate_p
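## illustrative example for coordinate_p (the values are made up; the "latitude,longitude" input format is an assumption based on Solr's location field type):
##   coordinate_p = 52.52,13.40
##   would yield coordinate_p_0_coordinate = 52.52 and coordinate_p_1_coordinate = 13.40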

## content of author-tag, text
author

## content of description-tag, text
description

## flag shows if description is unique in the whole index; if yes and another document appears with same description, the unique-flag is set to false, boolean
#description_unique_b

## content of keywords tag; words are separated by space
keywords

## character encoding, string
charset_s

## number of words in visible area, int
wordcount_i

## total number of inbound links, int
inboundlinkscount_i

## number of inbound links with nofollow tag, int
inboundlinksnofollowcount_i

## total number of external (outbound) links, int
outboundlinkscount_i

## number of external links with nofollow tag, int
outboundlinksnofollowcount_i

## number of images, int
imagescount_i

## response time of target server in milliseconds, int
responsetime_i

## all visible text, text
text_t

## additional synonyms to the words in the text
synonyms_sxt

## h1 header
h1_txt

## h2 header
h2_txt

## h3 header
h3_txt

## h4 header
h4_txt

## h5 header
h5_txt

## h6 header
h6_txt


### optional values, not part of standard YaCy handling (but useful for external applications)

## ip of host of url (after DNS lookup), string
#ip_s

## tags of css entries, normalized with absolute URL
#css_tag_txt

## urls of css entries, normalized with absolute URL
#css_url_txt

## number of css entries, int
#csscount_i

## urls of script entries, normalized with absolute URL
#scripts_txt

## number of script entries, int
#scriptscount_i

## encoded as binary value into an integer:
## bit  0: "all" contained in html header meta
## bit  1: "index" contained in html header meta
## bit  2: "noindex" contained in html header meta
## bit  3: "nofollow" contained in html header meta
## bit  8: "noarchive" contained in http header properties
## bit  9: "nosnippet" contained in http header properties
## bit 10: "noindex" contained in http header properties
## bit 11: "nofollow" contained in http header properties
## bit 12: "unavailable_after" contained in http header properties
## content of <meta name="robots" content=#content#> tag and the "X-Robots-Tag" HTTP property
#robots_i
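## worked example for robots_i, assuming bit 0 is the least significant bit (the page is hypothetical):
##   <meta name="robots" content="noindex,nofollow"> sets bit 2 and bit 3: 4 + 8 = 12
##   an additional "X-Robots-Tag: noarchive" http header sets bit 8: 12 + 256 = 268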

## content of <meta name="generator" content=#content#> tag, text
#metagenerator_t

## internal links, normalized (absolute URLs), as <a> tag with anchor text and nofollow
#inboundlinks_tag_txt

## internal links, only the protocol
inboundlinks_protocol_sxt

## internal links, the url only without the protocol
inboundlinks_urlstub_txt

## external links, normalized (absolute URLs), as <a> tag with anchor text and nofollow
#outboundlinks_tag_txt

## external links, only the protocol
outboundlinks_protocol_sxt

## external links, the url only without the protocol
outboundlinks_urlstub_txt
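## illustrative example for the protocol/urlstub fields above (hypothetical URLs; that '://' is stripped from the stub is assumed by analogy with the description of images_urlstub_txt):
##   a page on example.org links to https://example.org/about.html and to http://other.example.net/index.html
##   inboundlinks_protocol_sxt = https, inboundlinks_urlstub_txt = example.org/about.html
##   outboundlinks_protocol_sxt = http, outboundlinks_urlstub_txt = other.example.net/index.html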

## all image tags, encoded as <img> tag including the alt and title properties
#images_tag_txt

## all image links without the protocol and '://'
#images_urlstub_txt

## all image link protocols
#images_protocol_sxt

## all image link alt tags
#images_alt_txt

## number of image links with alt tag
#images_withalt_i

## binary pattern for the existence of h1..h6 headlines, int
#htags_i

## url inside the canonical link element, string
#canonical_t

## flag shows if the url in canonical_t is equal to sku, boolean
#canonical_equal_sku_b

## link from the url property inside the refresh link element, string
#refresh_s

## all texts in <li> tags
#li_txt

## number of <li> tags, int
#licount_i

## all texts inside of <b> or <strong> tags. no duplicates. listed in decreasing order of number of occurrences
bold_txt

## number of occurrences of texts in bold_txt
#bold_val

## total number of occurrences of <b> or <strong>, int
#boldcount_i
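## illustrative example for the three bold fields above (hypothetical page; the values follow from the descriptions and are not verified output):
##   a page containing <b>news</b>, <strong>news</strong> and <b>contact</b>
##   bold_txt = news, contact
##   bold_val = 2, 1
##   boldcount_i = 3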

## all texts inside of <i> tags. no duplicates. listed in decreasing order of number of occurrences
italic_txt

## number of occurrences of texts in italic_txt
#italic_val

## total number of occurrences of <i>, int
#italiccount_i

## all texts inside of <u> tags. no duplicates. listed in decreasing order of number of occurrences
underline_txt

## number of occurrences of texts in underline_txt
#underline_val

## total number of occurrences of <u>, int
#underlinecount_i

## flag that shows if a swf file is linked, boolean
#flash_b

## list of all links to frames
#frames_txt

## number of entries in frames_txt, int
#framesscount_i

## list of all links to iframes
#iframes_txt

## number of entries in iframes_txt, int
#iframesscount_i

## the protocol of the url
url_protocol_s

## all path elements in the url
url_paths_sxt

## the file name extension
url_file_ext_s

## number of key-value pairs in search part of the url
#url_parameter_i

## the keys from key-value pairs in the search part of the url
#url_parameter_key_sxt

## the values from key-value pairs in the search part of the url
#url_parameter_value_sxt

## number of all characters in the url == length of sku field
url_chars_i
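## illustrative example for the url fields above (hypothetical URL; whether url_paths_sxt includes the file name is not stated here, so that field is left out):
##   sku = http://www.example.org/docs/api/index.html?lang=en&page=2
##   url_protocol_s = http
##   url_file_ext_s = html
##   url_parameter_i = 2
##   url_parameter_key_sxt = lang, page
##   url_parameter_value_sxt = en, 2
##   url_chars_i = 57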

## host of the url, string
host_s

## the Domain Class Name, either the TLD or a combination of ccSLD+TLD if a ccSLD is used.
#host_dnc_s

## either the second level domain or, if a ccSLD is used, the third level domain
host_organization_s

## the organization and dnc concatenated with '.'
#host_organizationdnc_s

## the remaining part of the host without organizationdnc
#host_subdomain_s
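## illustrative example for the host fields above (hypothetical hosts; derived from the descriptions and not verified output):
##   host_s = www.example.org    -> host_dnc_s = org, host_organization_s = example, host_organizationdnc_s = example.org, host_subdomain_s = www
##   host_s = www.example.co.uk  -> host_dnc_s = co.uk, host_organization_s = example, host_organizationdnc_s = example.co.uk, host_subdomain_s = www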

## number of titles (counting the 'title' field) in the document
#title_count_i

## number of characters for each title
#title_chars_val

## number of words in each title
#title_words_val

## number of descriptions in the document. It does not count the 'description' field, since there is only one, but the number of descriptions that appear in the document (if any)
#description_count_i

## number of characters for each description
#description_chars_val

## number of words in each description
#description_words_val

## number of h1..h6 header lines
#h1_i
#h2_i
#h3_i
#h4_i
#h5_i
#h6_i

## breadcrumbs, see http://schema.org/WebPage; this is a counter of how many itemprop="breadcrumb" properties in div tags appear within a page
#schema_org_breadcrumb_i

## Open Graph Metadata field, see http://ogp.me/ns#
#opengraph_title_t
#opengraph_type_s
#opengraph_url_s
#opengraph_image_s

## names of cms attributes; if several are recognized then they are listed in decreasing order of number of matching criteria
#ext_cms_txt

## number of attributes that count for a specific cms in ext_cms_txt
#ext_cms_val

## names of ad-servers/ad-services
#ext_ads_txt

## number of attribute counts in ext_ads_txt
#ext_ads_val

## names of recognized community functions
#ext_community_txt

## number of attribute counts in ext_community_txt
#ext_community_val

## names of map services
#ext_maps_txt

## number of attribute counts in ext_maps_txt
#ext_maps_val

## names of tracker server
#ext_tracker_txt

## number of attribute counts in ext_tracker_txt
#ext_tracker_val

## names matching title expressions
#ext_title_txt

## number of matching title expressions
#ext_title_val