-<tr><td colspan="2"><input type="radio" name="range" id="rangeDomain" value="domain" onclick="document.getElementById('mustmatch').disabled=true;document.getElementById('deleteoldon').disabled=false;document.getElementById('deleteoldage').disabled=false;document.getElementById('deleteoldon').checked=true;"/>Restrict to start domain(s)</td></tr>
+<tr><td colspan="2"><input type="radio" name="range" id="rangeDomain" value="domain" #(range_domain)#::checked="checked"#(/range_domain)# onclick="document.getElementById('mustmatch').disabled=true;document.getElementById('deleteoldon').disabled=false;document.getElementById('deleteoldage').disabled=false;document.getElementById('deleteoldon').checked=true;"/>Restrict to start domain(s)</td></tr>
-<tr><td colspan="2"><input type="radio" name="range" id="rangeSubpath" value="subpath" onclick="document.getElementById('mustmatch').disabled=true;document.getElementById('deleteoldon').disabled=false;document.getElementById('deleteoldage').disabled=false;document.getElementById('deleteoldon').checked=true;"/>Restrict to sub-path(s)</td></tr>
+<tr><td colspan="2"><input type="radio" name="range" id="rangeSubpath" value="subpath" #(range_subpath)#::checked="checked"#(/range_subpath)# onclick="document.getElementById('mustmatch').disabled=true;document.getElementById('deleteoldon').disabled=false;document.getElementById('deleteoldage').disabled=false;document.getElementById('deleteoldon').checked=true;"/>Restrict to sub-path(s)</td></tr>
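A short editorial note on the #(...)# placeholders introduced above: assuming the usual YaCy servlet-template semantics, #(key)#A::B#(/key)# renders A when the servlet sets key to 0 and B when it sets it to 1, so these conditionals let the servlet pre-select the range option stored in the current crawl profile. A minimal sketch of the two expansions for range_domain (the onclick handler is abbreviated here):

    <!-- range_domain = 0: the conditional contributes nothing, the radio is rendered unchecked -->
    <input type="radio" name="range" id="rangeDomain" value="domain" onclick="..."/>
    <!-- range_domain = 1: the conditional expands to checked="checked", the radio is pre-selected -->
    <input type="radio" name="range" id="rangeDomain" value="domain" checked="checked" onclick="..."/>
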
Crawls can be restricted to specific countries. This uses the country code that can be computed from
the IP of the server that hosts the page. The filter is not a regular expression but a list of country codes, separated by commas.
</span></span>
<inputtype="radio"name="countryMustMatchSwitch"id="countryMustMatchSwitch"value="false"checked="checked"/>no country code restriction<br/>
<inputtype="radio"name="countryMustMatchSwitch"id="countryMustMatchSwitch"value="false"#(countryMustMatchSwitchChecked)#::checked="checked"#(/countryMustMatchSwitchChecked)#/>no country code restriction<br/>
After a crawl was done in the past, documents may become stale and eventually they are also deleted on the target host.
To remove old files from the search index it is not sufficient to just consider them for re-load; it may be necessary
to delete them because they simply do not exist any more. Use this in combination with re-crawl; this time span should be the longer one.
-</span></span><input type="radio" name="deleteold" id="deleteoldoff" value="off" checked="checked"/>Do not delete any document before the crawl is started.</dd>
+</span></span><input type="radio" name="deleteold" id="deleteoldoff" value="off" #(deleteold_off)#::checked="checked"#(/deleteold_off)#/>Do not delete any document before the crawl is started.</dd>
<dt>Delete sub-path</dt>
-<dd><input type="radio" name="deleteold" id="deleteoldon" value="on" disabled="true"/>For each host in the start url list, delete all documents (in the given subpath) from that host.</dd>
+<dd><input type="radio" name="deleteold" id="deleteoldon" value="on" #(deleteold_on)#::checked="checked"#(/deleteold_on)# #(range_wide)#::disabled="disabled"#(/range_wide)#/>For each host in the start url list, delete all documents (in the given subpath) from that host.</dd>
<dt>Delete only old</dt>
-<dd><input type="radio" name="deleteold" id="deleteoldage" value="age" disabled="true"/>Treat documents that are loaded
+<dd><input type="radio" name="deleteold" id="deleteoldage" value="age" #(deleteold_age)#::checked="checked"#(/deleteold_age)# #(range_wide)#::disabled="disabled"#(/range_wide)#/>Treat documents that are loaded
</select> ago as stale and delete them before the crawl is started.
</dd>
</dl>
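How the pieces above interact, as far as this hunk shows: the inline onclick handlers on the rangeDomain/rangeSubpath radios enable the two delete options and pre-check "Delete sub-path", while the new #(range_wide)#::disabled="disabled"#(/range_wide)# conditionals keep those options disabled when the profile still uses the unrestricted range. A readable sketch of the same client-side logic (the function name is mine, not part of the patch):

    <script>
    // Equivalent of the inline onclick handlers on rangeDomain/rangeSubpath:
    // restricting the crawl range makes the delete-old options meaningful,
    // so enable them and pre-select "Delete sub-path".
    function onRangeRestricted() {
        document.getElementById('mustmatch').disabled = true;      // the must-match filter is superseded by the range restriction
        document.getElementById('deleteoldon').disabled = false;   // allow "Delete sub-path"
        document.getElementById('deleteoldage').disabled = false;  // allow "Delete only old"
        document.getElementById('deleteoldon').checked = true;     // default to deleting the sub-path
    }
    </script>
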
@@ -217,22 +226,31 @@
A web crawl performs a double-check on all links found in the internet against the internal database. If the same url is found again,
then the url is treated as a double when you check the 'no doubles' option. A url may be loaded again when it has reached a specific age;
to use that, check the 're-load' option.
-</span></span><input type="radio" name="recrawl" value="nodoubles" checked="checked"/>Never load any page that is already known. Only the start-url may be loaded again.</dd>
+</span></span><input type="radio" name="recrawl" value="nodoubles" #(recrawl_nodoubles)#::checked="checked"#(/recrawl_nodoubles)#/>Never load any page that is already known. Only the start-url may be loaded again.</dd>
<dt>Re-load</dt>
-<dd><input type="radio" name="recrawl" value="reload"/>Treat documents that are loaded
+<dd><input type="radio" name="recrawl" value="reload" #(recrawl_reload)#::checked="checked"#(/recrawl_reload)#/>Treat documents that are loaded