You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
71 lines
3.3 KiB
71 lines
3.3 KiB
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
|
<html xmlns="http://www.w3.org/1999/xhtml">
|
|
<head>
|
|
<title>YaCy '#[clientname]#': Index Creation with a Web Crawl for a Single Domain</title>
|
|
#%env/templates/metas.template%#
|
|
<script type="text/javascript" src="/js/ajax.js"></script>
|
|
<script type="text/javascript" src="/js/IndexCreate.js"></script>
|
|
</head>
|
|
<body id="IndexCreate">
|
|
#%env/templates/header.template%#
|
|
#%env/templates/submenuIndexCreate.template%#
|
|
<h2>Easy Crawl Start</h2>
|
|
|
|
<p id="startCrawling">
|
|
<strong>Start Crawling Job:</strong>
|
|
You can define URLs as start points for Web page crawling and start crawling here.
|
|
"Crawling" means that YaCy will download the given web-site, extract all links in it
|
|
and then download the content behind these links.
|
|
This is repeated as long as specified under "Crawling Depth".
|
|
</p>
|
|
|
|
<form action="Crawler_p.html" method="post" enctype="multipart/form-data" accept-charset="UTF-8">
|
|
<input type="hidden" name="crawlingFilter" value=".*" />
|
|
<input type="hidden" name="crawlingIfOlderCheck" value="off" />
|
|
<input type="hidden" name="crawlingDomFilterCheck" value="off" />
|
|
<input type="hidden" name="crawlingDomMaxCheck" value="off" />
|
|
<input type="hidden" name="crawlingQ" value="off" />
|
|
<input type="hidden" name="storeHTCache" value="on" />
|
|
<input type="hidden" name="indexText" value="on" />
|
|
<input type="hidden" name="indexMedia" value="on" />
|
|
<input type="hidden" name="crawlOrder" value="on" />
|
|
<input type="hidden" name="intention" value="simple web crawl" />
|
|
<input type="hidden" name="xsstopw" value="off" />
|
|
<table border="0" cellpadding="5" cellspacing="1">
|
|
<tr class="TableHeader">
|
|
<td><strong>Attribut</strong></td>
|
|
<td><strong>Value</strong></td>
|
|
<td><strong>Description</strong></td>
|
|
</tr>
|
|
<tr valign="top" class="TableCellSummary">
|
|
<td>Starting Point:</td>
|
|
<td>
|
|
<input name="crawlingURL" type="text" size="41" maxlength="256" value="http://" onkeypress="changed()" />
|
|
<span id="robotsOK"></span><br />
|
|
<span id="title"><br/></span>
|
|
<img src="/env/grafics/empty.gif" name="ajax" alt="empty" />
|
|
</td>
|
|
<td>
|
|
Enter here the start url of the web crawl.
|
|
</td>
|
|
</tr>
|
|
<tr valign="top" class="TableCellLight">
|
|
<td><label for="crawlingDepth">Crawling Range</label>:</td>
|
|
<td>
|
|
<input type="radio" name="range" value="wide" checked="checked" />Wide: depth <input name="crawlingDepth" id="crawlingDepth" type="text" size="2" maxlength="2" value="#[crawlingDepth]#" /> |
|
|
<input type="radio" name="range" value="domain" />Complete Domain
|
|
</td>
|
|
<td>
|
|
The range defines if the crawl shall consider a complete domain, or a wide crawl up to a specific depth.
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top" class="TableCellLight">
|
|
<td colspan="3"><input type="submit" name="crawlingstart" value="Start New Distributed Crawl (will be visible at other peers)" /></td>
|
|
</tr>
|
|
</table>
|
|
</form>
|
|
|
|
#%env/templates/footer.template%#
|
|
</body>
|
|
</html> |