This application is a distributed web crawler and a caching HTTP proxy. You are using its <i>online interface</i>, where you can configure your personal settings, proxy settings, access control and crawling properties. You can also use this interface to start crawls, send messages to other peers and monitor your index, cache status and crawling processes. Most importantly, you can use the search page to search either your own index or the <i>global</i> index.
<tr><td>[abc]</td><td>: a or b or c (same as a|b|c)</td></tr>
<tr><td>[a-c]</td><td>: a or b or c (same as above)</td></tr>
<tr><td>x{n}</td><td>: exactly n occurrences of x</td></tr>
<tr><td>x{n,}</td><td>: at least n occurrences of x</td></tr>
<tr><td>x{n,m}</td><td>: at least n and at most m occurrences of x</td></tr>
<tr><td>( )</td><td>: grouping; changes the order in which operators are applied</td></tr>
<tr><td>\</td><td>: escape character; removes the special meaning of a special character (for example "[" or "*") so that it is matched literally</td></tr>
</table>
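The rows above can be tried out with a short sketch. It uses Python's re module purely for illustration; the helper name matches is hypothetical, and the application's own regular expression engine may differ in details.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern (illustrative helper)."""
    return re.fullmatch(pattern, text) is not None

# Character classes: [abc] and [a-c] accept the same single characters.
print(matches(r"[abc]", "b"))       # True
print(matches(r"[a-c]", "d"))       # False

# Quantifiers: x{n}, x{n,} and x{n,m}.
print(matches(r"x{3}", "xxx"))      # True
print(matches(r"x{3}", "xxxx"))     # False
print(matches(r"x{2,}", "xxxxx"))   # True
print(matches(r"x{2,4}", "xxxxx"))  # False

# Escaping: "\[" and "\*" are matched literally.
print(matches(r"\[x\*", "[x*"))     # True
```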
<br>
<br>
Regular expression operators are applied in a fixed order of precedence (highest first): unary operators (*, +, ^, {}), concatenation, binary operators (|). This order can be overridden with parentheses.<br>
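How operators group can be checked with a small sketch. It assumes Python's re module as a stand-in for the application's engine, and the helper name matches is hypothetical.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern (illustrative helper)."""
    return re.fullmatch(pattern, text) is not None

# A quantifier applies only to the item directly before it:
# "ab*" is read as a(b*), not as (ab)*.
print(matches(r"ab*", "abbb"))     # True
print(matches(r"ab*", "abab"))     # False
print(matches(r"(ab)*", "abab"))   # True, parentheses change the grouping

# Alternation is applied last: "ab|cd" is read as (ab)|(cd), not as a(b|c)d.
print(matches(r"ab|cd", "cd"))     # True
print(matches(r"ab|cd", "acd"))    # False
print(matches(r"a(b|c)d", "acd"))  # True
```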
<br>
Example:<br>
<br>
.*heise.de/.*/[0-9]+<br>
<br>
This matches "heise.de/" preceded by any string, for example "http://www.", and followed by any string, then a slash and a number. The dot in "heise.de" is not escaped with "\", so it stands for any single character, which includes the "." itself. To match only a literal dot, write "heise\.de".<br>
A URL that matches this regular expression is: http://www.heise.de/newsticker/meldung/59421<br>
A URL that does not match is: http://www.heise.de/tp/r4/artikel/20/20701/1.html<br>
It ends in ".html", which the regular expression does not cover: the expression requires the URL to end with a slash followed by digits.
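<br>
The example can be reproduced with the following sketch. It assumes, as the explanation above implies, that the expression has to match the whole URL; Python's re.fullmatch is used to model that, and the helper name url_matches is hypothetical.

```python
import re

# The example expression from the text, with the dot left unescaped.
PATTERN = r".*heise.de/.*/[0-9]+"

def url_matches(url: str, pattern: str = PATTERN) -> bool:
    """Return True if the whole URL matches the pattern (illustrative helper)."""
    return re.fullmatch(pattern, url) is not None

print(url_matches("http://www.heise.de/newsticker/meldung/59421"))        # True
print(url_matches("http://www.heise.de/tp/r4/artikel/20/20701/1.html"))   # False

# The unescaped dot also accepts other characters in that position.
print(url_matches("http://www.heiseXde/news/1"))                          # True
# Escaping the dot restricts the match to a literal ".".
print(url_matches("http://www.heiseXde/news/1", r".*heise\.de/.*/[0-9]+"))  # False
```

If only a part of the URL had to match (re.search in Python), the second URL would be accepted as well, because "http://www.heise.de/tp/r4/artikel/20/20701" already satisfies the expression.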