Better explanation for the auto-dom-filter.

Added some Javadoc.
Small change to DetailedSearch.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2146 6c8d7289-2bf4-0310-a012-ef5d649a1542
rramthun 19 years ago
parent 28ff7ec214
commit bc94a714b2

@@ -26,14 +26,13 @@ function checkers(name, n) {
}
</SCRIPT>
<form action="DetailedSearch.html" method="get" enctype="multipart/form-data">
<table border="0" cellpadding="5" cellspacing="0">
<tr height="1"><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
<tr valign="center" class="TableCellLight">
<td><h2>Detailed&nbsp;Search</h2></td>
<td colspan="2"><input type="text" name="search" value="#[search]#" size="50" maxlength="250"></td>
<td><input type="submit" name="Enter" value="Search"></td>
</tr>
<tr valign="top" class="TableCellLight">
<br><br><h2>Detailed&nbsp;Search</h2><br>
<table border="0" cellpadding="5" cellspacing="0">
<tr valign="center">
<td colspan="2" align="right"><input type="text" name="search" value="#[search]#" size="50" maxlength="250">&nbsp;<input type="submit" name="Enter" value="Search"></td>
<td></td>
</tr>
<tr valign="top">
<td colspan="2">
<table border="0" cellpadding="0" cellspacing="1">
<tr valign="center" class="TableHeader"><td class="small" colspan="2">Local Query</td></tr>
@@ -46,13 +45,13 @@ function checkers(name, n) {
<tr valign="center" class="TableCellDark">
<td class=small>Max. search time (seconds):</td>
<td class=small>
<input type="text" name="localTime" value="#[localTime]#" size="2" maxlength="3">
<input type="text" name="localTime" value="#[localTime]#" size="4" maxlength="3">
</td>
</tr>
<tr valign="center" class="TableCellDark">
<td class=small>Max. word distance:</td>
<td class=small>
<input type="text" name="localWDist" value="#[localWDist]#" size="3" maxlength="4">
<input type="text" name="localWDist" value="#[localWDist]#" size="4" maxlength="4">
</td>
</tr>
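For reference, submitting this form issues a GET request along these lines; the parameter values are made up for illustration, and further form fields are omitted:

DetailedSearch.html?search=yacy&localTime=6&localWDist=999&Enter=Search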

@@ -65,10 +65,8 @@ You can define URLs as start points for Web page crawling and start crawling here
Use:<input type="checkbox" name="crawlingDomFilterCheck" align="top" #(crawlingDomFilterCheck)#::checked#(/crawlingDomFilterCheck)#>&nbsp;&nbsp;
Depth:<input name="crawlingDomFilterDepth" type="text" size="2" maxlength="2" value="#[crawlingDomFilterDepth]#"></td>
<td class=small>
This option will cause a creation of a domain-list during indexing. This list is filled only with domains that
appear on the given depth during crawling. The domain-list is then used to filter-out all domains, that appear
on depths greater then the given depth, but do not appear in the domain-list. You can use this option i.e.
to crawl pages with bookmarks while restricting the crawl on only those domains that appear on the bookmark-page.
This option will automatically create a domain-filter which limits the crawl to domains the crawler finds at the given depth. You can use this option e.g. to crawl a page with bookmarks while restricting the crawl to only those domains that appear on the bookmark-page. The appropriate depth for this example would be 1.<br>
The default value 0 gives no restrictions.
</td>
</tr>
<tr valign="top" class="TableCellLight">

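To illustrate the behaviour described above, here is a minimal Java sketch of such an auto-domain-filter; the class, method, and field names are assumptions for this example, not YaCy's actual implementation:

import java.util.HashSet;
import java.util.Set;

// Sketch: domains seen up to filterDepth form a whitelist; URLs found
// deeper in the crawl are accepted only if their domain is already on it.
final class AutoDomFilterSketch {
    private final int filterDepth;                       // corresponds to crawlingDomFilterDepth
    private final Set<String> allowedDomains = new HashSet<String>();

    AutoDomFilterSketch(final int filterDepth) {
        this.filterDepth = filterDepth;
    }

    boolean accept(final String domain, final int crawlDepth) {
        if (filterDepth == 0) return true;               // default value 0: no restrictions
        if (crawlDepth <= filterDepth) {
            allowedDomains.add(domain);                  // still filling the domain list
            return true;
        }
        return allowedDomains.contains(domain);          // deeper: domain must be listed
    }
}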
@@ -497,10 +497,8 @@ Minute\(s\)==Minute(n)
If you use this option, web pages that are already existent in your database are crawled and indexed again.==Ist diese Option aktiviert, werden bereits in Ihrer Datenbank existierende Internetseite erneut gecrawled und indexiert.
It depends on the age of the last crawl if this is done or not: if the last crawl is older than the given==Es h&auml;ngt vom Alter des letzten Crawls ab, ob dies getan oder nicht getan wird: wenn der letzte Crawl &auml;lter als das angegebene
date, the page is crawled again, othervise it is treaded as 'double' and not loaded or indexed again.==Datum ist, wird die Seite erneut gecrawlet, sonst wird sie als 'double' markiert und weder geladen noch indexiert.
Auto-Dom-Filter:==Auto-Dom-Filter:
Depth:==Tiefe:
This option will cause a creation of a domain-list during indexing. This list is filled only with domains that==Diese Option f&uuml;hrt dazu, dass w&auml;hrend des Indexierens eine Domain-Liste erstellt wird. Diese Liste wird nur mit Domains gef&uuml;llt, die
appear on the given depth during crawling. The domain-list is then used to filter-out all domains, that appear==in der angegebenen Tiefe w&auml;hrend des Crawls auftauchen. Die Domain-Liste wird dann benutzt, um alle Domains rauszufiltern, die
on depths greater then the given depth, but do not appear in the domain-list. You can use this option i.e.==tiefer als die angegebene Tiefe vorkommen, aber nicht auf der Domain-Liste vorkommen. Sie k&ouml;nnen diese Option z.B. dazu benutzen,
to crawl pages with bookmarks while restricting the crawl on only those domains that appear on the bookmark-page.==um Seiten mit Lesezeichen zu crawlen, w&auml;hrend der Crawl auf die, nur auf diese Lesezeichen-Seite auftauchenden Domains, begrenzt ist.
This option will automatically create a domain-filter which limits the crawl to domains the crawler finds at the given depth. You can use this option e.g. to crawl a page with bookmarks while restricting the crawl to only those domains that appear on the bookmark-page. The appropriate depth for this example would be 1.==Diese Option erzeugt automatisch einen Domain-Filter, der den Crawl auf die Domains beschränkt, die auf der angegebenen Tiefe gefunden werden. Diese Option kann man beispielsweise benutzen, um eine Seite mit Bookmarks zu crawlen und dann den folgenden Crawl automatisch auf die Domains zu beschränken, die in der Bookmarkliste vorkamen. Die einzustellende Tiefe für dieses Beispiel wäre 1.
The default value 0 gives no restrictions.==Der Vorgabewert 0 bedeutet, dass nichts eingeschränkt wird.
Maximum Pages per Domain:==Maximale Seiten pro Domain:
Page-Count:==Seitenanzahl:
You can limit the maxmimum number of pages that are fetched and indexed from a single domain with this option.==Sie k&ouml;nnen die maximale Anzahl an Seiten, die von einer einzelnen Domain gefunden und indexiert werden, mit dieser Option begrenzen.

@@ -130,7 +130,7 @@ public final class plasmaCrawlLoader extends Thread {
// interrupting the plasmaCrawlLoader
this.interrupt();
// waiting for the thread to finish ...
// waiting for the thread to finish...
this.log.logInfo("Waiting for plasmaCrawlLoader shutdown ...");
this.join(5000);
} catch (Exception e) {
@@ -162,7 +162,7 @@ public final class plasmaCrawlLoader extends Thread {
}
}
// consuming the is interrupted flag
// consuming the "is interrupted"-flag
this.isInterrupted();
// closing the pool
@@ -170,7 +170,6 @@ public final class plasmaCrawlLoader extends Thread {
this.crawlwerPool.close();
}
catch (Exception e) {
// TODO Auto-generated catch block
this.log.logSevere("plasmaCrawlLoader.run/close", e);
}
@@ -194,7 +193,6 @@ public final class plasmaCrawlLoader extends Thread {
try {
this.theQueue.addMessage(theMsg);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
this.log.logSevere("plasmaCrawlLoader.loadParallel", e);
}
}

@@ -64,7 +64,17 @@ import java.util.Hashtable;
import java.util.Iterator;
public final class serverFileUtils {
/**
* Copies an InputStream to an OutputStream.
* @param source InputStream
* @param dest OutputStream
* @return Total number of bytes copied.
* @see copy(InputStream source, File dest)
* @see copyRange(File source, OutputStream dest, int start)
* @see copy(File source, OutputStream dest)
* @see copy(File source, File dest)
*/
public static int copy(InputStream source, OutputStream dest) throws IOException {
byte[] buffer = new byte[4096];
@@ -78,7 +88,16 @@ public final class serverFileUtils {
return total;
}
/**
* Copies an InputStream to a File.
* @param source InputStream
* @param dest File
* @see copy(InputStream source, OutputStream dest)
* @see copyRange(File source, OutputStream dest, int start)
* @see copy(File source, OutputStream dest)
* @see copy(File source, File dest)
*/
public static void copy(InputStream source, File dest) throws IOException {
FileOutputStream fos = null;
try {
@@ -88,7 +107,17 @@ public final class serverFileUtils {
if (fos != null) try {fos.close();} catch (Exception e) {}
}
}
/**
* Copies a part of a File to an OutputStream.
* @param source File
* @param dest OutputStream
* @param start Number of bytes to skip from the beginning of the File
* @see copy(InputStream source, OutputStream dest)
* @see copy(InputStream source, File dest)
* @see copy(File source, OutputStream dest)
* @see copy(File source, File dest)
*/
public static void copyRange(File source, OutputStream dest, int start) throws IOException {
InputStream fis = null;
try {
@@ -99,18 +128,36 @@ public final class serverFileUtils {
} finally {
if (fis != null) try { fis.close(); } catch (Exception e) {}
}
}
}
/**
* Copies a File to an OutputStream.
* @param source File
* @param dest OutputStream
* @see copy(InputStream source, OutputStream dest)
* @see copy(InputStream source, File dest)
* @see copyRange(File source, OutputStream dest, int start)
* @see copy(File source, File dest)
*/
public static void copy(File source, OutputStream dest) throws IOException {
InputStream fis = null;
InputStream fis = null;
try {
fis = new FileInputStream(source);
copy(fis, dest);
fis = new FileInputStream(source);
copy(fis, dest);
} finally {
if (fis != null) try { fis.close(); } catch (Exception e) {}
}
}
/**
* Copies a File to a File.
* @param source File
* @param dest File
* @see copy(InputStream source, OutputStream dest)
* @see copy(InputStream source, File dest)
* @see copyRange(File source, OutputStream dest, int start)
* @see copy(File source, OutputStream dest)
*/
public static void copy(File source, File dest) throws IOException {
FileInputStream fis = null;
FileOutputStream fos = null;
@@ -130,7 +177,7 @@ public final class serverFileUtils {
baos.close();
return baos.toByteArray();
}
public static byte[] read(File source) throws IOException {
byte[] buffer = new byte[(int) source.length()];
InputStream fis = null;
@@ -143,7 +190,7 @@ public final class serverFileUtils {
}
return buffer;
}
public static byte[] readAndZip(File source) throws IOException {
ByteArrayOutputStream byteOut = null;
GZIPOutputStream zipOut = null;
@@ -158,7 +205,7 @@ public final class serverFileUtils {
if (byteOut != null) try { byteOut.close(); } catch (Exception e) {}
}
}
public static void writeAndGZip(byte[] source, File dest) throws IOException {
FileOutputStream fos = null;
try {
@@ -168,7 +215,7 @@ public final class serverFileUtils {
if (fos != null) try {fos.close();} catch (Exception e) {}
}
}
public static void writeAndGZip(byte[] source, OutputStream dest) throws IOException {
GZIPOutputStream zipOut = null;
try {
@@ -179,15 +226,15 @@ public final class serverFileUtils {
if (zipOut != null) try { zipOut.close(); } catch (Exception e) {}
}
}
public static void write(byte[] source, OutputStream dest) throws IOException {
copy(new ByteArrayInputStream(source), dest);
}
public static void write(byte[] source, File dest) throws IOException {
copy(new ByteArrayInputStream(source), dest);
}
public static HashSet loadList(File file) {
HashSet set = new HashSet();
BufferedReader br = null;
@@ -240,7 +287,7 @@ public final class serverFileUtils {
file.delete();
tf.renameTo(file);
}
public static Set loadSet(File file, int chunksize, boolean tree) throws IOException {
Set set = (tree) ? (Set) new TreeSet() : (Set) new HashSet();
byte[] b = read(file);
@@ -278,14 +325,19 @@ public final class serverFileUtils {
file.delete();
tf.renameTo(file);
}
/**
* Moves all files from a directory to another.
* @param from_dir Directory whose contents will be moved.
* @param to_dir Directory to move into. It must exist already.
*/
public static void moveAll(File from_dir, File to_dir) {
if (!(from_dir.isDirectory())) return;
if (!(to_dir.isDirectory())) return;
String[] list = from_dir.list();
for (int i = 0; i < list.length; i++) (new File(from_dir, list[i])).renameTo(new File(to_dir, list[i]));
}
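A brief usage sketch for moveAll; the directory names are assumptions for illustration, and, as documented, both directories must already exist:

import java.io.File;

// Hypothetical usage: rename every entry of one existing directory into another.
public class MoveAllDemo {
    public static void main(String[] args) {
        serverFileUtils.moveAll(new File("DATA/OLD"), new File("DATA/NEW"));
    }
}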
public static void main(String[] args) {
try {
writeAndGZip("ein zwei drei, Zauberei".getBytes(), new File("zauberei.txt.gz"));

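As a usage illustration for the copy overloads documented above (the file names are assumptions for this sketch, not part of the commit):

import java.io.*;

// Minimal sketch: copy a file to a stream and a file to a file using the
// documented overloads; "in.txt" and "out.txt" are hypothetical paths.
public class CopyDemo {
    public static void main(String[] args) throws IOException {
        InputStream in = new FileInputStream("in.txt");
        try {
            int n = serverFileUtils.copy(in, System.out);              // returns total bytes copied
            System.err.println(n + " bytes copied");
        } finally {
            try { in.close(); } catch (Exception e) {}
        }
        serverFileUtils.copy(new File("in.txt"), new File("out.txt")); // File-to-File overload
    }
}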
@@ -167,8 +167,8 @@ public final class yacy {
* combined version
*
* @param version Current given version.
* @param svn Current version given from svn.
* @return String with the combined version
* @param svn Current version given from SVN.
* @return String with the combined version.
*/
public static float versvn2combinedVersion(float v, int svn) {
return (float) (((double) v * 100000000.0 + ((double) svn)) / 100000000.0);
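As a worked example of this formula with illustrative values: version 0.39 at SVN revision 2146 gives (0.39 * 100000000 + 2146) / 100000000 = 0.39002146.

float combined = versvn2combinedVersion(0.39f, 2146); // approximately 0.39002146f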
@@ -550,7 +550,7 @@ public final class yacy {
}
/**
* Call the shutdown-page from yacy to tell it to shut down. This method is
* Call the shutdown-page of YaCy to tell it to shut down. This method is
* called if you start yacy with the argument -shutdown.
*
* @param homePath Root-path where all the information is to be found.
@@ -1170,10 +1170,12 @@ public final class yacy {
serverLog.logInfo("TRANSFER-CR", "could not read file " + crfile);
}
}
/**
/**
* Generates a text file containing all domains in this peer's DB.
* This may be useful to calculate the YaCy-Blockrank.
*
* @param format String which determines format of the text file. Possible values: "html", "zip", "gzip" or "plain"
* @param format String which determines the format of the file. Possible values: "html", "zip", "gzip" or "plain"
* @see urllist
*/
private static void domlist(String homePath, String format, String targetName) {
@@ -1364,9 +1366,9 @@ public final class yacy {
}
/**
* Searching for peers affected by Bug http://www.yacy-forum.de/viewtopic.php?p=16056
* Searching for peers affected by the bug documented in <a href="http://www.yacy-forum.de/viewtopic.php?p=16056#16056">YaCy-Forum Posting 16056</a>
* @param homePath
* @see http://www.yacy-forum.de/viewtopic.php?p=16056
* @see <a href="http://www.yacy-forum.de/viewtopic.php?p=16056#16056">YaCy-Forum Posting 16056</a>
*/
public static void testPeerDB(String homePath) {
