#%env/templates/metas.template%#
#%env/templates/header.template%# #%env/templates/submenuIndexCreate.template%#Start Crawling Job: You can define URLs as starting points for Web page crawling and start the crawl here. "Crawling" means that YaCy will download the given web page, extract all links from it, and then download the content behind these links. This is repeated up to the depth specified under "Crawling Depth".
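The crawl described above can be pictured as a breadth-first traversal with a depth limit. The following is a minimal sketch of that idea, assuming a naive regex-based link extractor and placeholder names (CrawlSketch, fetch, example.org, depth 2); it is an illustration only, not YaCy's actual crawler:

```java
// Illustrative sketch only: a depth-limited crawl loop, not YaCy's implementation.
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CrawlSketch {

    // Very naive link extraction; a real crawler would use an HTML parser.
    private static final Pattern HREF = Pattern.compile("href=\"(https?://[^\"]+)\"");
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    // Download the start URL, extract all links, download the content behind them,
    // and repeat until the given crawling depth is reached.
    public static void crawl(String startUrl, int maxDepth) throws IOException, InterruptedException {
        ArrayDeque<String[]> queue = new ArrayDeque<>(); // entries are {url, depth}
        Set<String> seen = new HashSet<>();
        queue.add(new String[] { startUrl, "0" });
        seen.add(startUrl);

        while (!queue.isEmpty()) {
            String[] entry = queue.poll();
            String url = entry[0];
            int depth = Integer.parseInt(entry[1]);

            String html = fetch(url);
            System.out.println("fetched (depth " + depth + "): " + url);

            if (depth >= maxDepth) continue; // the "Crawling Depth" limit
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                String link = m.group(1);
                if (seen.add(link)) {
                    queue.add(new String[] { link, Integer.toString(depth + 1) });
                }
            }
        }
    }

    private static String fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return CLIENT.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        crawl("https://example.org/", 2); // start URL and depth are placeholders
    }
}
```

In YaCy itself the equivalent work is spread across the loader, local, global, and indexing queues mentioned further down on this page.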
#(error)#
::
Error with profile management. Please stop YaCy, delete the file DATA/PLASMADB/crawlProfiles0.db, and restart YaCy.
::
Error: #[errmsg]#
::
The application is not yet initialized, sorry. Please wait a few seconds and repeat the request.
::
ERROR: Crawl filter "#[newcrawlingfilter]#" does not match the crawl root "#[crawlingStart]#". Please try again with a different filter (see the filter-matching sketch below).
::
Crawling of "#[crawlingURL]#" failed. Reason: #[reasonString]#
::
Error with URL input "#[crawlingStart]#": #[error]#
::
Error with file input "#[crawlingStart]#": #[error]#
#(/error)#
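The filter error above boils down to a regular-expression test: the crawl filter must match the start URL, otherwise not even the crawl root would be accepted. A minimal sketch of such a check, with hypothetical names (FilterCheckSketch, filterMatchesRoot) and example URLs, offered as an illustration rather than YaCy's actual code:

```java
// Illustrative sketch only: validating a crawl filter against the crawl root.
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class FilterCheckSketch {

    // Returns true when the start URL matches the crawl filter (a regular expression).
    public static boolean filterMatchesRoot(String newcrawlingfilter, String crawlingStart) {
        try {
            return Pattern.compile(newcrawlingfilter).matcher(crawlingStart).matches();
        } catch (PatternSyntaxException e) {
            return false; // a malformed filter is treated like a non-matching one
        }
    }

    public static void main(String[] args) {
        System.out.println(filterMatchesRoot("https://example\\.org/.*", "https://example.org/index.html")); // true
        System.out.println(filterMatchesRoot("https://example\\.org/.*", "https://other.example/"));         // false
    }
}
```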
#(info)#
::
Set new prefetch depth to "#[newproxyPrefetchDepth]#"
::
Crawling of "#[crawlingURL]#" started.
You can monitor the crawling progress either by watching the URL queues
(local queue,
global queue,
loader queue,
indexing queue)
or by checking the fill/process counts of all queues on the
performance page.
Please wait a few seconds; the request is enqueued and delayed until the proxy/HTTP server has been idle for a certain time (a simplified sketch of this idle check appears below this message block).
The indexing results are presented on the
Index Monitor page.
It will take at least 30 seconds until the first result appears there. Please be patient: to ensure maximum availability, crawling pauses each time you use the proxy or web server.
If you crawl any unwanted pages, you can delete them here.
::
Removed #[numEntries]# entries from the crawl queue. This queue may fill again if the loader and indexing queues are not empty.
::
Crawling paused successfully.
::
Crawling continued successfully.
#(/info)#
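The idle-delay and pause behaviour mentioned in the messages above (a crawl request waits until the proxy/HTTP server has been idle for a while, and crawling pauses whenever the proxy or web server is in use) can be sketched as a simple timestamp gate. Everything here, including the class name IdleGateSketch and the 5-second threshold, is an assumption for illustration, not YaCy's implementation:

```java
// Illustrative sketch only: delay crawl work until the proxy has been idle long enough.
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class IdleGateSketch {

    private static final long REQUIRED_IDLE_MILLIS = 5_000; // assumed threshold
    private static final AtomicLong lastProxyActivity = new AtomicLong(System.currentTimeMillis());

    // Called by the proxy/HTTP server whenever it serves a request.
    public static void noteProxyActivity() {
        lastProxyActivity.set(System.currentTimeMillis());
    }

    // The crawler waits here until the proxy has been idle long enough, then proceeds.
    public static void awaitIdle() throws InterruptedException {
        while (System.currentTimeMillis() - lastProxyActivity.get() < REQUIRED_IDLE_MILLIS) {
            TimeUnit.MILLISECONDS.sleep(250); // re-check periodically
        }
    }

    public static void main(String[] args) throws InterruptedException {
        awaitIdle(); // returns once the proxy has been idle for the required time
        System.out.println("proxy idle long enough, crawl job may start");
    }
}
```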
Crawl Profile List:
Crawl Thread | Start URL | Depth | Filter | MaxAge | Auto Filter Depth | Auto Filter Content | Max Pages per Domain | Accept "?" URLs | Fill Proxy Cache | Local Indexing | Remote Indexing |
#[name]# | #[startURL]# | #[depth]# | #[filter]# | #[crawlingIfOlder]# | #[crawlingDomFilterDepth]# | #[crawlingDomFilterContent]# | #[crawlingDomMaxPages]# | #(withQuery)#no::yes#(/withQuery)# | #(storeCache)#no::yes#(/storeCache)# | #(localIndexing)#no::yes#(/localIndexing)# | #(remoteIndexing)#no::yes#(/remoteIndexing)# |
Recently started remote crawls in progress:
Start Time | Peer Name | Start URL | Intention/Description | Depth | Accept '?' URLs |
#[cre]# | #[peername]# | #[startURL]# | #[intention]# | #[generalDepth]# | #(crawlingQ)#no::yes#(/crawlingQ)# |
Recently started remote crawls, now finished:
Start Time | Peer Name | Start URL | Intention/Description | Depth | Accept '?' URLs |
#[cre]# | #[peername]# | #[startURL]# | #[intention]# | #[generalDepth]# | #(crawlingQ)#no::yes#(/crawlingQ)# |
Remote Crawling Peers:
#(remoteCrawlPeers)#No remote crawl peers available.
::#[num]# peers available for remote crawling.
Idle Peers | #{available}##[name]# (#[due]# seconds due) #{/available}# |
Busy Peers | #{busy}##[name]# (#[due]# seconds due) #{/busy}# |