<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>YaCy '#[clientname]#': Content Analysis</title>
#%env/templates/metas.template%#
</head>
<body id="ContentAnalysis_p">
#%env/templates/header.template%#
#%env/templates/submenuIndexControl.template%#
<h2>Content Analysis</h2>
<p>These are document analysis attributes.</p>
<form class="dsearch" action="ContentAnalysis_p.html" method="post" enctype="multipart/form-data">
<fieldset>
<legend>Double Content Detection</legend><p>Double-content detection is done using a ranking on a 'unique' field named 'fuzzy_signature_unique_b'.
This field is set during parsing and is influenced by two attributes of the <a href="https://lucene.apache.org/solr/5_5_2/solr-core/org/apache/solr/update/processor/TextProfileSignature.html" target="_blank">TextProfileSignature</a> class.</p>
<dl>
<dt style="width:260px"><label for="minTokenLen">minTokenLen</label></dt>
<dd style="width:360px; float:left; display:inline;" id="dd_minTokenLen">
<input name="minTokenLen" id="minTokenLen" type="text" align="right" size="10" value="#[minTokenLen]#" /><br />
The minimum length of a word to be considered as an element of the signature. The value should be either 2 or 3.
</dd>
<dt style="width:260px"><label for="quantRate">quantRate</label></dt>
<dd style="width:360px; float:left; display:inline;" id="dd_quantRate">
<input name="quantRate" id="quantRate" type="text" align="right" size="10" value="#[quantRate]#" /><br />
The quantRate is a measure of the number of words that take part in a signature computation. The higher the number, the fewer
words are used for the signature.
For minTokenLen = 2 the quantRate value should not be below 0.24; for minTokenLen = 3 the quantRate value must not be below 0.5.
</dd>
<dt style="width:260px"></dt>
<dd style="width:360px; float:left; display:inline;">
<input type="submit" name="EnterDoublecheck" class="btn btn-primary" value="Set" />
<input type="submit" name="ResetDoublecheck" class="btn btn-primary" value="Reset to default" />
</dd>
</dl>
</fieldset>
</form>
#%env/templates/footer.template%#
</body>
</html>