diff --git a/htroot/Blacklist_p.html b/htroot/Blacklist_p.html
index 33c7cf7cd..9cd9b6b71 100644
--- a/htroot/Blacklist_p.html
+++ b/htroot/Blacklist_p.html
@@ -13,103 +13,144 @@

URLs on a blacklist are kept from being loaded. You can define several blacklists and activate them separately. You may also provide your blacklists to other peers by sharing them; in return you may collect blacklist entries from other peers.
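To make the idea concrete, here is a minimal sketch (in Java, not YaCy's actual blacklist code) of how such an entry-based URL blacklist could be checked; the "host-pattern/path-regex" entry format and the method names are assumptions for illustration only:

```java
import java.util.List;
import java.util.regex.Pattern;

public class BlacklistSketch {
    // Hypothetical entry format: "<host-pattern>/<path-regex>", e.g. "*.ads.example/.*"
    static boolean isBlocked(List<String> entries, String host, String path) {
        for (String entry : entries) {
            int slash = entry.indexOf('/');
            String hostPart = slash < 0 ? entry : entry.substring(0, slash);
            String pathPart = slash < 0 ? ".*" : entry.substring(slash + 1);
            // translate the simple "*." wildcard of the host part into a regular expression
            String hostRegex = hostPart.replace(".", "\\.").replace("*", ".*");
            if (Pattern.matches(hostRegex, host) && Pattern.matches(pathPart, path)) {
                return true; // URL matches a blacklist entry and would not be loaded
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> blacklist = List.of("*.ads.example/.*");
        System.out.println(isBlocked(blacklist, "www.ads.example", "banner.gif")); // true
    }
}
```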
diff --git a/htroot/EditProfile_p.html b/htroot/EditProfile_p.html
index ded879c5d..6f7c38718 100644
--- a/htroot/EditProfile_p.html
+++ b/htroot/EditProfile_p.html
@@ -15,56 +15,60 @@

You do not need to provide any personal data here, but if you want to distribute your contact information to other peers, you can enter it below.
The profile form offers the following fields: Name, Nick Name, Homepage, ICQ, Jabber, Yahoo!, MSN, and Comment.
This is a distributed web crawler and also a caching HTTP proxy. You are using the online interface of the application. You can use this interface to configure your personal settings, proxy settings, access control and crawling properties. You can also use this interface to start crawls, send messages to other peers and monitor your index, cache status and crawling processes. Most importantly, you can use the search page to search either your own or the global index.
For more detailed information, visit the YaCy homepage.
Search Word List:
You can search for several words simultaneously. Words must be separated by a single space. The words are combined conjunctively, meaning that every word must occur in each result, not just any of them. If you do a global search (see below), you may get different results each time you search.
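As a toy illustration of the conjunctive behaviour described above (every query word must occur in a hit), consider intersecting per-word result sets; the in-memory index and the names below are hypothetical, not YaCy's data structures:

```java
import java.util.*;

public class ConjunctiveSearchDemo {
    // Intersect the URL sets of all query words: a page is a hit only if it contains every word.
    static Set<String> search(Map<String, Set<String>> index, String query) {
        Set<String> hits = null;
        for (String word : query.toLowerCase().split(" ")) {
            Set<String> urls = index.getOrDefault(word, Set.of());
            if (hits == null) hits = new HashSet<>(urls);
            else hits.retainAll(urls);   // conjunctive: keep only pages that contain this word too
        }
        return hits == null ? Set.of() : hits;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = Map.of(
            "distributed", Set.of("a.html", "b.html"),
            "crawler",     Set.of("b.html", "c.html"));
        System.out.println(search(index, "distributed crawler")); // [b.html]
    }
}
```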
Maximum Number of Results:
You can select the maximum number of links you want returned. We do not yet support multiple result pages for every possible link. Instead we encourage you to narrow the search result by submitting more search words.
Result Order Options:
The search engine provides an experimental 'Quality' ranking. In contrast to other known search engines, we also provide a result order by date. If you change the order to 'Date-Quality', the most recently updated page from the search results is listed first. For pages that have the same date, the second order, 'Quality', is applied.
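A small sketch of the 'Date-Quality' order described above: sort by date, newest first, and break ties with the quality ranking. The Result record and its fields are illustrative only, not YaCy's result types:

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;

public class DateQualityOrderDemo {
    // Illustrative result type; YaCy's real result objects differ.
    record Result(String url, LocalDate lastModified, int quality) {}

    public static void main(String[] args) {
        List<Result> results = new java.util.ArrayList<>(List.of(
            new Result("a.html", LocalDate.of(2004, 5, 1), 3),
            new Result("b.html", LocalDate.of(2004, 6, 1), 1),
            new Result("c.html", LocalDate.of(2004, 6, 1), 7)));

        // 'Date-Quality': newest first; equal dates fall back to the quality ranking.
        results.sort(Comparator.comparing(Result::lastModified).reversed()
                               .thenComparing(Comparator.comparingInt(Result::quality).reversed()));
        results.forEach(r -> System.out.println(r.url()));   // c.html, b.html, a.html
    }
}
```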
Resource Domain:
This search engine is built to search the web pages that pass the proxy. But the search index is also distributed to other peers, so you can search globally as well: this function is currently only rudimentary, but it can be chosen for test cases. Future releases will automatically distribute index information before a search happens to form a high-performance distributed hash table -- a very fast global search.
Maximum Search Time:
Searching the local index is extremely fast; it happens within milliseconds, even for a large number (millions) of pages. But searching the global index needs more time to find the remote peers that contain the best search results. This is especially the case while the distributed index is in test mode. Search results become more stable (repeated global searches produce more similar results) the longer the search time is.
diff --git a/htroot/IndexCreate_p.html b/htroot/IndexCreate_p.html
index 5d5b348d8..ef21cf9d8 100644
--- a/htroot/IndexCreate_p.html
+++ b/htroot/IndexCreate_p.html
@@ -12,29 +12,30 @@
Crawling Depth:
This defines how many link levels the Crawler will follow from the starting point. A minimum of 1 is recommended and means that the page you enter under "Starting Point" will be added to the index, but no linked content is indexed. 2-4 is good for normal indexing. Be careful with the depth: assuming an average branching factor of 20, a prefetch depth of 8 would index 25,600,000,000 pages, which may be the whole WWW. The arithmetic is sketched below.
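The arithmetic behind that warning, assuming the stated average branching factor of 20 links per page:

```java
public class CrawlDepthEstimate {
    public static void main(String[] args) {
        long branchingFactor = 20;     // assumed average number of links per page
        long pages = 1;
        for (int depth = 1; depth <= 8; depth++) {
            pages *= branchingFactor;  // pages reachable at exactly this depth
            System.out.printf("depth %d: %,d pages%n", depth, pages);
        }
        // depth 8 prints 25,600,000,000 -- the figure quoted above
    }
}
```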
Crawling Filter:
This is an emacs-like regular expression that must match the URLs that are to be crawled. Use this, e.g., to restrict the crawl to a single domain. If you set this filter, it would make sense to increase the crawling depth. A sketch of such a filter follows below.

@@ -43,15 +44,15 @@ You can define URLs as start points for Web page crawling and start crawling here.
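The sketch referenced above: a filter that restricts the crawl to a single domain might look like this. java.util.regex is used here purely for illustration, and the exact pattern dialect of the real filter may differ:

```java
import java.util.regex.Pattern;

public class CrawlFilterDemo {
    public static void main(String[] args) {
        // Hypothetical filter: only URLs below www.example.org are crawled.
        Pattern crawlFilter = Pattern.compile("http://www\\.example\\.org/.*");

        String[] candidates = {
            "http://www.example.org/docs/index.html",
            "http://other.host/page.html"
        };
        for (String url : candidates) {
            System.out.println(url + " -> " + crawlFilter.matcher(url).matches());
        }
        // true for the first URL, false for the second: only matching URLs enter the crawl queue
    }
}
```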
Accept URLs with '?' / dynamic URLs:
A question mark is usually a hint for a dynamic page. URLs pointing to dynamic content should usually not be crawled. However, there are sometimes web pages with static content that are accessed with URLs containing question marks. If you are unsure, do not check this, to avoid crawl loops.
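The heuristic itself is trivial; a sketch (not YaCy's code) of treating any URL that contains a '?' as dynamic:

```java
public class DynamicUrlCheck {
    // Heuristic only: a '?' usually marks a query string and therefore dynamic content.
    static boolean looksDynamic(String url) {
        return url.contains("?");
    }

    public static void main(String[] args) {
        System.out.println(looksDynamic("http://host/page.html"));          // false
        System.out.println(looksDynamic("http://host/search?q=yacy&p=2"));  // true
    }
}
```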
Store to Proxy Cache:
This option is used by default for proxy prefetch, but is not needed for explicit crawling. We recommend leaving this switched off unless you want to check the crawl results with the Cache Monitor.

@@ -60,19 +61,27 @@ You can define URLs as start points for Web page crawling and start crawling here.
Do Local Indexing:
This enables indexing of the web pages the crawler will download. This should be switched on by default, unless you want to crawl only to fill the Proxy Cache without indexing.
Do Remote Indexing:
If checked, the crawler will contact other peers and use them as remote indexers for your crawl. If you need your crawling results locally, you should switch this off. Only senior and principal peers can initiate or receive remote crawls.
Describe your intention to start this global crawl (optional): this message will appear in the 'Other Peer Crawl Start' table of other peers.

@@ -82,7 +91,7 @@ You can define URLs as start points for Web page crawling and start crawling here.
Exclude static Stop-Words:
This can be useful to prevent extremely common words, e.g. "the", "he", "she", "it", from being added to the database. To exclude all words given in the file yacy.stopwords from indexing, check this box. A sketch of the effect follows below.

@@ -107,7 +116,7 @@ You can define URLs as start points for Web page crawling and start crawling here.
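The sketch referenced above: loading a word-per-line stop-word file (the yacy.stopwords name comes from the text; the loading code itself is illustrative) and dropping those words before indexing:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class StopWordFilterDemo {
    public static void main(String[] args) throws IOException {
        // One stop-word per line, e.g. "the", "he", "she", "it", ...
        Set<String> stopWords = Files.readAllLines(Path.of("yacy.stopwords"))
                .stream().map(String::trim).map(String::toLowerCase)
                .collect(Collectors.toSet());

        List<String> words = Arrays.asList("the", "distributed", "crawler", "it");
        List<String> indexed = words.stream()
                .filter(w -> !stopWords.contains(w.toLowerCase()))   // drop stop-words before indexing
                .collect(Collectors.toList());
        System.out.println(indexed);   // [distributed, crawler]
    }
}
```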
Starting Point:
Existing start URLs are re-crawled. Other already visited URLs are sorted out as "double". A complete re-crawl will be available soon.
Accept remote crawling requests and perform crawl at maximum load

@@ -155,9 +166,13 @@ Your peer can search and index for other peers and they can search for you.

Do not accept remote crawling requests (please set this only if you cannot accept to crawl only one page per minute; see option above)
You can change the language of the YaCy-webinterface with translation files.

Current language: default(english)
Languagefile Author:
Send additions to maintainer:
Languages:
Install new language from URL:
Use this language
Proxy pre-fetch setting:
This is an automated HTML page loading procedure that takes actual proxy-requested URLs as start points for crawling.
Prefetch Depth

@@ -34,28 +34,28 @@ URLs as crawling start points for crawling.

A prefetch follows embedded URLs, but since embedded image links are loaded by the browser, only embedded href-anchors are prefetched additionally.
Store to Cache
It is almost always recommended to set this on. The only exception is if you have another caching proxy running as secondary proxy and YaCy is configured to use that proxy in proxy-proxy mode.
Proxy generally
Path
The path where the pages are stored (max. length 300).

Size
The size of the cache in MB.
Current skin: #[currentskin]#
Skins:
Install new skin from URL:
Use this skin

#(status)# :: Unable to get URL: #[url]# :: Error saving the skin. #(/status)#
Max. number of results:
order by:
Resource:
Max. search time (seconds):
URL mask: #(urlmaskoptions)# :: restrict on / show all #(/urlmaskoptions)#
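A hedged sketch of what the URL mask options amount to: 'show all' corresponds to the catch-all mask .*, while 'restrict on' applies a narrower regular expression to the result URLs; the mask values shown are examples, not YaCy defaults:

```java
import java.util.List;
import java.util.stream.Collectors;

public class UrlMaskDemo {
    // "show all" corresponds to the mask ".*"; anything else restricts the result list.
    static List<String> applyMask(List<String> resultUrls, String urlMask) {
        return resultUrls.stream()
                .filter(url -> url.matches(urlMask))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> results = List.of("http://www.example.org/a.html", "http://other.host/b.html");
        System.out.println(applyMask(results, ".*"));                            // both URLs
        System.out.println(applyMask(results, "http://www\\.example\\.org/.*")); // only the first
    }
}
```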
#(excluded)# :: The following words are stop-words and have been excluded from the search: #[stopwords]#.