diff --git a/htroot/ContentAnalysis_p.html b/htroot/ContentAnalysis_p.html new file mode 100644 index 000000000..433fb2ceb --- /dev/null +++ b/htroot/ContentAnalysis_p.html @@ -0,0 +1,39 @@ + + + + YaCy '#[clientname]#': Content Analysis + #%env/templates/metas.template%# + + + #%env/templates/header.template%# + #%env/templates/submenuIndexControl.template%# +

Content Analysis

+

These are document analysis attributes.

+
+
+ Double Content Detection

Double-Content detection is done by ranking on a 'unique' field named 'fuzzy_signature_unique_b'. + This field is set during parsing and is influenced by two attributes of the TextProfileSignature class.

+
+
+
+
+ This is the minimum length a word must have to be considered an element of the signature. It should be either 2 or 3. +
+
+
+
+ The quantRate controls how many words take part in the signature computation. The higher the number, the fewer + words are used for the signature. + For minTokenLen = 2 the quantRate value should not be below 0.24; for minTokenLen = 3 the quantRate value must not be below 0.5. +
+
+
+ + +
+
+
+
+ #%env/templates/footer.template%# + +
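
Reviewer note: the page text above says that minTokenLen and quantRate shape the signature, and that a higher quantRate means fewer words are used. The sketch below is one way to picture that behaviour. It is not the code from this change or from the TextProfileSignature class; it assumes a generic quantised word-frequency scheme, and the names `quantised_profile` and `profile_signature` are hypothetical.

```python
import hashlib
import re
from collections import Counter


def quantised_profile(text, min_token_len=2, quant_rate=0.5):
    """Return (word, quantised count) pairs that survive the quantisation.

    Assumption: a simplified word-frequency profile, not YaCy's implementation.
    """
    # Lower-case the text, split it into alphanumeric runs, drop short words.
    tokens = [t for t in re.findall(r"[^\W_]+", text.lower())
              if len(t) >= min_token_len]
    if not tokens:
        return []
    counts = Counter(tokens)
    max_freq = max(counts.values())
    # The quantum scales with quant_rate (rounded half up), so a higher
    # rate raises the threshold and fewer words remain in the profile.
    quant = int(max_freq * quant_rate + 0.5)
    if quant < 2:
        quant = 2 if max_freq > 1 else 1
    profile = []
    for token, count in counts.items():
        quantised = (count // quant) * quant  # round down to a multiple of quant
        if quantised >= quant:
            profile.append((token, quantised))
    # Sort by descending count, then alphabetically, for a stable order.
    profile.sort(key=lambda item: (-item[1], item[0]))
    return profile


def profile_signature(text, min_token_len=2, quant_rate=0.5):
    """Hash the quantised profile into a hex digest (hypothetical helper)."""
    profile = quantised_profile(text, min_token_len, quant_rate)
    payload = "\n".join(f"{token} {count}" for token, count in profile)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()
```

In this sketch, two texts that differ only in rarely used words end up with the same profile and therefore the same digest, which is the property a fuzzy duplicate check relies on.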