A <a href="http://en.wikipedia.org/wiki/Heuristic">heuristic</a> is an 'experience-based technique that helps in problem solving, learning and discovery' (Wikipedia). The search heuristics that can be switched on here are techniques that help discover possible search results through link guessing, in-search crawling and requests to other search engines.
When a search heuristic is used, the resulting links are not used directly as search results. Instead, the pages behind them are loaded, indexed and stored like any other content. This ensures that blacklists can be applied and that the searched word actually appears on the page that the heuristic discovered.
The success of a heuristic is marked with an image (<img width="16" height="9" src="/env/grafics/heuristic_redundant.gif" title="heuristic:&lt;name&gt; (redundant)" style="width:16px; height:9px;" alt="heuristic:&lt;name&gt; (redundant)" />/<img width="16" height="9" src="/env/grafics/heuristic_new.gif" title="heuristic:&lt;name&gt; (new link)" style="width:16px; height:9px;" alt="heuristic:&lt;name&gt; (new link)" />) below the favicon, to the left of the search result entry:
The search result was discovered by a heuristic, but the link was already known to YaCy.
</dd>
<dt>
<img width="16" height="9" src="/env/grafics/heuristic_new.gif" title="heuristic:&lt;name&gt; (new link)" style="width:16px; height:9px;" alt="heuristic:&lt;name&gt; (new link)" />
</dt>
<dd>
The search result was discovered by a heuristic and was not previously known to YaCy.
When a search uses a 'site' operator (for example 'download site:yacy.net'), the host named in the operator is crawled immediately with a host-restricted depth-1 crawl.
That means: right after the search request, the portal page of the host is loaded, together with every page linked from it that is on the same host.
Because this 'instant crawl' must obey robots.txt and keep a minimum delay between two consecutive page loads, this heuristic is rather slow. A second search after a pause of a few seconds may then find all wanted search results.
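The selection step of such an instant crawl can be sketched as follows. This is a minimal illustration in Python, not YaCy's own code: the function names site_operator_host and depth1_same_host_links are hypothetical, the portal page is passed in as a string instead of being fetched, and the robots.txt check and the delay between page loads are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


def site_operator_host(query):
    """Return the host named by the first 'site:' operator in the query, or None."""
    for token in query.split():
        if token.lower().startswith("site:") and len(token) > len("site:"):
            return token[len("site:"):].lower()
    return None


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def depth1_same_host_links(portal_url, portal_html):
    """Return the links on the portal page that stay on the portal's host.

    Assumption for this sketch: 'same host' means an exact hostname match.
    """
    host = urlparse(portal_url).hostname
    collector = _LinkCollector()
    collector.feed(portal_html)
    seen, result = set(), []
    for href in collector.hrefs:
        absolute = urljoin(portal_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript: and similar
        if parsed.hostname == host and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

For the query 'download site:yacy.net', site_operator_host yields the host to crawl, and depth1_same_host_links gives the depth-1 pages that would be loaded after the portal page.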
<label for="opensearch">opensearch load external search result list from active systems below</label>
</legend>
<p>
When this heuristic is used, every search request line is sent to the listed OpenSearch systems until enough results are available to fill the current search page.
20 results are taken from the remote system and loaded simultaneously, then parsed and indexed immediately.
To find out more about OpenSearch, see <a href="http://www.opensearch.org" target="_blank">OpenSearch.org</a>.
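How a search request line turns into a call to an OpenSearch system can be sketched like this. The sketch assumes the usual OpenSearch URL template syntax with placeholders such as {searchTerms} and {count?}; the function name fill_opensearch_template and the example URL are hypothetical and do not come from YaCy.

```python
import re
from urllib.parse import quote


def fill_opensearch_template(template, search_terms, count=20):
    """Fill an OpenSearch URL template with the search terms and a result count.

    Known placeholders are replaced with URL-encoded values. Unknown optional
    placeholders (written with a trailing '?') become empty strings. Unknown
    required placeholders raise ValueError.
    """
    values = {"searchTerms": search_terms, "count": str(count)}

    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(values[name], safe="")
        if optional:
            return ""
        raise ValueError("no value for required parameter: " + name)

    return re.sub(r"\{([A-Za-z:]+)(\?)?\}", substitute, template)
```

A call such as fill_opensearch_template("http://example.org/search?q={searchTerms}&n={count?}", "yacy heuristics") produces a request URL whose result list would then be loaded, parsed and indexed.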
<input type="submit" name="discoverosd" id="discoverosd" value="discover from index" class="submitready" onclick="return confirm('start background task, depending on index size this may run a long time')" />
With the button "discover from index" you can search the metadata of your local index (Web Structure Index) for systems that support the OpenSearch specification.
The task is started in the background. It may take some minutes before new entries appear (after refreshing the page).
Alternatively, you may <a href="?copydefaultosdconfig=">copy &amp; paste an example config file</a> located in <i>defaults/heuristicopensearch.conf</i> to the DATA/SETTINGS directory.
For the discover function, the <i>web graph</i> option of the web structure index and the fields <i>target_rel_s, target_protocol_s, target_urlstub_s</i> have to be switched on in the <a href="IndexSchema_p.html?core=webgraph">webgraph Solr schema</a>.