Please take a look; if the majority finds it worse than before, we will undo the changes (the search page in particular looks very unfamiliar).
Thanks to Philipp Redeker.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1484 6c8d7289-2bf4-0310-a012-ef5d649a1542
This is a distributed web crawler and also a caching HTTP proxy. You are using the <i>online-interface</i> of the application. You can use this interface to configure your personal settings, proxy settings, access control and crawling properties. You can also use this interface to start crawls, send messages to other peers and monitor your index, cache status and crawling processes. Most important, you can use the search page to search either your own or the <i>global</i> index.
</p>
<p>
For more detailed information, visit the <a href="http://www.yacy.net/yacy">YaCy homepage</a>.
</p>
<h3>Local and Global Search: Options and Functions</h3>
The proxy provides a search interface that accesses your local index, which is built from the web pages that passed through the proxy.
The search can also be applied globally, by searching other peers. You can use the following options to enhance your search results:
You can define URLs as start points for Web page crawling and start crawling here. "Crawling" means that YaCy will download the given website, extract all links in it and then download the content behind these links. This is repeated up to the depth specified under "Crawling Depth".</p>
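The crawl procedure described above can be sketched as a depth-limited breadth-first traversal. This is only an illustration, not YaCy's actual crawler: the function name crawl, the in-memory toy_web link graph (used instead of real downloads) and the convention that a depth of 1 covers only the start page are assumptions made for this example.

```python
from collections import deque

def crawl(start, links, depth):
    """Breadth-first crawl over an in-memory link graph (hypothetical helper).

    links maps each page to the pages it links to.
    depth=1 indexes only the start page, depth=2 also the pages it
    links to, and so on (an assumed convention for this sketch).
    """
    indexed = []
    seen = {start}
    queue = deque([(start, 1)])
    while queue:
        url, level = queue.popleft()
        indexed.append(url)          # "download" and index the page
        if level >= depth:
            continue                 # depth limit reached, do not follow links
        for target in links.get(url, []):
            if target not in seen:   # never fetch the same page twice
                seen.add(target)
                queue.append((target, level + 1))
    return indexed

# A tiny made-up web: a links to b and c, b links to d, c links back to a.
toy_web = {"a": ["b", "c"], "b": ["d"], "c": ["a"], "d": []}
```

Calling crawl("a", toy_web, 1) indexes only the start page, while a depth of 3 also reaches page d through page b.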
This defines how many levels of links embedded in websites the Crawler will follow.<br>
A minimum of 1 is recommended and means that the page you enter under "Starting Point" will be added to the index, but no linked content is indexed. 2-4 is good for normal indexing.
Be careful with the depth. Consider an average branching factor of 20:
a prefetch depth of 8 would index 25,600,000,000 pages, which may be the whole WWW.
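The page count quoted above is the branching factor raised to the power of the depth (20 to the power of 8). The short sketch below reproduces that arithmetic; the function name pages_at_depth is hypothetical, and the estimate ignores duplicate links and dead ends, so real crawls will reach fewer pages.

```python
def pages_at_depth(branching_factor: int, depth: int) -> int:
    """Rough estimate: every page links to `branching_factor` new pages,
    and links are followed `depth` times (duplicates are ignored)."""
    return branching_factor ** depth

# Growth for the recommended range and for the extreme case from the text.
for depth in (2, 4, 8):
    print(depth, pages_at_depth(20, depth))
```

With a branching factor of 20, a depth of 2 gives 400 pages, a depth of 4 gives 160000 pages, and a depth of 8 gives 25600000000 pages.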
A question mark is usually a hint that a page is dynamic. URLs pointing to dynamic content should usually not be crawled. However, some web pages with static content
are accessed with URLs containing question marks. If you are unsure, do not check this, to avoid crawl loops.
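A check of this kind can be sketched with Python's standard URL parser. The function name looks_dynamic is an assumption for illustration and does not reflect how YaCy itself decides; it only tests whether a URL carries a non-empty query part after a question mark.

```python
from urllib.parse import urlsplit

def looks_dynamic(url: str) -> bool:
    """Return True if the URL has a non-empty query part (text after '?')."""
    return urlsplit(url).query != ""

# Example URLs are made up for this sketch.
print(looks_dynamic("http://example.org/page?id=3"))
print(looks_dynamic("http://example.org/page.html"))
```

A crawler that skips every URL for which looks_dynamic returns True avoids most crawl loops, at the cost of missing static pages that happen to be addressed with a query string.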
This can be useful to prevent extremely common words, e.g. "the", "he", "she", "it", from being added to the database. To exclude all words given in the file <tt class=small>yacy.stopwords</tt> from indexing,
check this box.
</td>
@@ -107,7 +116,7 @@ You can define URLs as start points for Web page crawling and start crawling her
<td class=small>It is almost always recommended to set this on. The only exception is when you have another caching proxy running as a secondary proxy and YaCy is configured to use that proxy in proxy-proxy mode.</td>
<center><h2><font size="2" face="Helvetica, Arial" color="#212942"><img src="/env/grafics/kaskelix.png" align="middle" alt="YaCy logo Kaskelix"><br>P2P WEB SEARCH</font></h2></center><br>