yacy_search_server/htroot/ConfigHeuristics_p.html


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>YaCy '#[clientname]#': Heuristics Configuration</title>
#%env/templates/metas.template%#
</head>
<body id="ConfigNetwork">
#%env/templates/header.template%#
#%env/templates/submenuConfig.template%#
<h2>Heuristics Configuration</h2>
<p>
A <a href="http://en.wikipedia.org/wiki/Heuristic">heuristic</a> is an 'experience-based technique that helps in problem solving, learning and discovery' (Wikipedia). The search heuristics that can be switched on here are techniques that help discover possible search results based on link guessing, in-search crawling and requests to other search engines.
When a search heuristic is used, the resulting links are not used directly as search results; instead, the loaded pages are indexed and stored like other content. This ensures that blacklists can be applied and that the searched word actually appears on the page that was discovered by the heuristic.
</p>
<form action=""><fieldset>
The success of a heuristic is marked with an image (<img width="16" height="9" src="/env/grafics/heuristic_redundant.gif" title="heuristic:&lt;name&gt; (redundant)" style="width:16px; height:9px;" alt="heuristic:&lt;name&gt; (redundant)"/>/<img width="16" height="9" src="/env/grafics/heuristic_new.gif" title="heuristic:&lt;name&gt; (new link)" style="width:16px; height:9px;" alt="heuristic:&lt;name&gt; (new link)"/>) below the favicon to the left of the search result entry:
<dl>
<dt>
<img width="16" height="9" src="/env/grafics/heuristic_redundant.gif" title="heuristic:&lt;name&gt; (redundant)" style="width:16px; height:9px;" alt="heuristic:&lt;name&gt; (redundant)"/>
</dt>
<dd>
The search result was discovered by a heuristic, but the link was already known to YaCy.
</dd>
<dt>
<img width="16" height="9" src="/env/grafics/heuristic_new.gif" title="heuristic:&lt;name&gt; (new link)" style="width:16px; height:9px;" alt="heuristic:&lt;name&gt; (new link)"/>
</dt>
<dd>
The search result was discovered by a heuristic and was not previously known to YaCy.
</dd>
</dl></fieldset></form>
<form id="HeuristicFormSite" method="post" action="ConfigHeuristics_p.html" enctype="multipart/form-data" accept-charset="UTF-8">
<fieldset>
<legend>
<input type="checkbox" name="site_check" id="site" onclick="window.location.href='ConfigHeuristics_p.html?#(site.checked)#site_on=::site_off=#(/site.checked)#'" value="site"#(site.checked)#:: checked="checked"#(/site.checked)# />
<label for="site">'site'-operator: instant shallow crawl</label>
</legend>
<p>
When a search is made using a 'site'-operator (like: 'download site:yacy.net'), the host named in the site-operator is instantly crawled with a host-restricted depth-1 crawl.
That means: right after the search request, the portal page of the host is loaded, along with every page linked from it that is on the same host.
Because this 'instant crawl' must obey robots.txt and a minimum delay between two consecutive page accesses, this heuristic is rather slow, but a second search (after a pause of a few seconds) may then find all wanted search results.
</p>
</fieldset>
</form>
<form id="HeuristicFormSearchResult" method="post" action="ConfigHeuristics_p.html" enctype="multipart/form-data" accept-charset="UTF-8">
<fieldset>
<legend>
<input type="checkbox" name="searchresult_check" id="searchresult" onclick="window.location.href='ConfigHeuristics_p.html?#(searchresult.checked)#searchresult_on=::searchresult_off=#(/searchresult.checked)#'" value="searchresult"#(searchresult.checked)#:: checked="checked"#(/searchresult.checked)# />
<label for="searchresult">search-result: shallow crawl on all displayed search results</label>
<input type="checkbox" name="searchresultglobal_check" id="searchresultglobal" onclick="window.location.href='ConfigHeuristics_p.html?#(searchresultglobal.checked)#searchresultglobal_on=::searchresultglobal_off=#(/searchresultglobal.checked)#'" value="siteresultglobal"#(searchresultglobal.checked)#:: checked="checked"#(/searchresultglobal.checked)# />
<label for="searchresultglobal">add as global crawl job</label>
</legend>
<p>
When a search is made, all displayed result links are crawled with a depth-1 crawl.
This means: right after the search request, every result page is loaded, along with every page it links to.
If you check 'add as global crawl job', the pages to be crawled are added to the global crawl queue (remote peers can pick up pages to be crawled).
Default is to add the links to the local crawl queue (your peer crawls the linked pages).
</p>
</fieldset>
</form>
<form id="HeuristicFormBlekko" method="post" action="ConfigHeuristics_p.html" enctype="multipart/form-data" accept-charset="UTF-8">
<fieldset>
<legend>
<input type="checkbox" name="blekko_check" id="blekko" onclick="window.location.href='ConfigHeuristics_p.html?#(blekko.checked)#blekko_on=::blekko_off=#(/blekko.checked)#'" value="blekko"#(blekko.checked)#:: checked="checked"#(/blekko.checked)# />
<label for="blekko">blekko: load external search result list from <a href="http://blekko.com">blekko</a></label>
</legend>
<p>
When this heuristic is used, every search request line is sent as a query to blekko.
20 results are taken from blekko and loaded simultaneously, then parsed and indexed immediately.
</p>
</fieldset>
</form>
#%env/templates/footer.template%#
</body>
</html>