diff --git a/htroot/CrawlStartExpert_p.html b/htroot/CrawlStartExpert_p.html
index d80b2076d..5000e0fd8 100644
--- a/htroot/CrawlStartExpert_p.html
+++ b/htroot/CrawlStartExpert_p.html
@@ -144,8 +144,8 @@
 Restrict to sub-path
- The filter is a regular expression
- that must match with the URLs which are used to be crawled; default is 'catch all'.
+ The filter is a regular expression
+ that must match with the URLs which are used to be crawled; default is 'catch all'.
 Example: to allow only urls that contain the word 'science', set the filter to '.*science.*'.
 You can also use an automatic domain-restriction to fully crawl a single domain.
@@ -156,7 +156,8 @@
- This filter must not match to allow that the page is accepted for crawling.
+ The filter is a regular expression
+ that must not match to allow that the page is accepted for crawling.
 The empty string is a never-match filter which should do well for most cases.
 If you don't know what this means, please leave this field empty.
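
Note on the two filters described in the help text above: the following is a minimal sketch of how a must-match and a must-not-match filter could combine, not the project's actual implementation. It assumes whole-string matching, which is inferred from the '.*science.*' example and from the empty string being described as a never-match filter. The function name `url_accepted` and its parameter names are hypothetical.

```python
import re


def url_accepted(url: str, must_match: str = ".*", must_not_match: str = "") -> bool:
    """Hypothetical helper: decide whether a URL passes both crawl filters.

    Assumption: each filter must match the whole URL (re.fullmatch), so
    '.*' is a catch-all and the empty pattern matches no non-empty URL.
    """
    # The must-match filter has to match the entire URL.
    if re.fullmatch(must_match, url) is None:
        return False
    # The must-not-match filter must NOT match the entire URL.
    if re.fullmatch(must_not_match, url) is not None:
        return False
    return True


if __name__ == "__main__":
    print(url_accepted("http://example.org/science/index.html", must_match=".*science.*"))
    print(url_accepted("http://example.org/sports/", must_match=".*science.*"))
```

Under whole-string matching, a filter such as 'science' without the surrounding '.*' would reject every URL that is longer than the word itself, which is why the help text's example wraps the word in '.*'.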