performance setting for remote indexing configuration and latest changes for 0.39

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@424 6c8d7289-2bf4-0310-a012-ef5d649a1542
orbiter 20 years ago
parent 6c13cf5b0c
commit be1f324fca

@@ -3,11 +3,11 @@ javacSource=1.4
 javacTarget=1.4
 # Release Configuration
-releaseVersion=0.387
-releaseFile=yacy_dev_v${releaseVersion}_${DSTAMP}_${releaseNr}.tar.gz
-#releaseFile=yacy_v${releaseVersion}_${DSTAMP}_${releaseNr}.tar.gz
-releaseDir=yacy_dev_v${releaseVersion}_${DSTAMP}_${releaseNr}
-#releaseDir=yacy_v${releaseVersion}_${DSTAMP}_${releaseNr}
+releaseVersion=0.39
+#releaseFile=yacy_dev_v${releaseVersion}_${DSTAMP}_${releaseNr}.tar.gz
+releaseFile=yacy_v${releaseVersion}_${DSTAMP}_${releaseNr}.tar.gz
+#releaseDir=yacy_dev_v${releaseVersion}_${DSTAMP}_${releaseNr}
+releaseDir=yacy_v${releaseVersion}_${DSTAMP}_${releaseNr}
 releaseNr=$Revision$
 # defining some file/directory access rights

@@ -322,8 +322,6 @@
 <include name="yacy.logging"/>
 <include name="yacy.init"/>
 <include name="yacy.yellow"/>
-<include name="yacy.black"/>
-<include name="yacy.blue"/>
 <include name="yacy.stopwords"/>
 <include name="yacy.parser"/>
 <include name="httpd.mime"/>

@@ -43,6 +43,42 @@ globalheader();
 </ul>
 -->
+<br><p>v0.39_20050722_424
+<ul>
+<li>New Features:</li>
+<ul>
+<li>Added snippets to search results. Snippets are fetched by the searching peer from the original web sites and are also transported with remote search results.</li>
+<li>The proxy now shows an error page in case of errors.</li>
+<li>Preparation for localization: started (not yet finished) German translation</li>
+<li>The status page now shows memory usage, transfer volume and indexing speed in PPM (pages per minute). A global PPM (sum over all peers) is also computed.</li>
+<li>Restructured the Index-Creation menu: added more submenus and queue monitors</li>
+<li>Added a feature to start crawling from bookmark files</li>
+<li>Added blocking of blacklisted URLs in indexReceive (remote DHT index transmissions)</li>
+<li>Added port forwarding for remote peer connections (the peer may now be reached at a configurable address)</li>
+<li>Added bbCode for profiles</li>
+<li>Memory management in the Performance menu: a memory limit can be set as a condition for queue execution.</li>
+<li>Added an option to do performance-limited remote crawls (use this instead of switching off remote indexing if you are worried about performance loss on your machine)</li>
+<li>Enhanced logging, configurable with yacy.logging</li>
+</ul>
+<li>Performance: enhanced indexing speed</li>
+<ul>
+<li>Implemented indexing/loading multithreading</li>
+<li>Enhanced caching in the database (less memory occupation)</li>
+<li>Replaced the RAM queue after indexing by a file-based queue (makes long queues possible)</li>
+<li>Changed the assortment cache-flush procedure: words may now appear in any assortment, not only one. This prevents assortment flushes, increases capacity and prevents creation of files in DATA/PLASMADB/WORDS, which further speeds up indexing.</li>
+<li>Sped up start-up and shut-down by replacing a stack with an array. The dumped index also takes less disk space now. Because dumping is faster, the cache may be bigger, which also increases indexing speed.</li>
+</ul>
+<li>Bugfixes:</li>
+<ul>
+<li>Better shut-down behavior, time-outs on sockets, fewer exceptions</li>
+<li>Fixed gzip decoding and content-length handling in the http client</li>
+<li>Better httpd header validation</li>
+<li>Fixed possible memory leaks</li>
+<li>Fixed a 100% CPU bug (caused by repeated GC when memory was low)</li>
+<li>Fixed UTF-8 decoding for the parser</li>
+</ul>
+</ul>
 <br><p>v0.38_20050603_208
 <ul>
 <li>Enhanced Crawling:

@@ -28,6 +28,7 @@ the P2P-based index distribution was designed and implemented by <b>Michael Pete
 <li><b>Alexander Schier</b> did much alpha-testing, gave valuable feed-back on my ideas and suggested his own. He suggested and implemented large parts of the popular blacklist feature. He supplied the 'Log'-menu function, the skin-feature, many minor changes, bug fixes and the Windows-Installer - version of YaCy. Alex also provides and maintaines the <a href="http://www.suma-lab.de/yacy/">german documentation</a> for yacy.</li>
 <li><b>Martin Thelian</b> made system-wide performance enhancement by introducing thread pools. He provided a plug-in system for external text parser and integrated many parser libraries such as pdf and word format parsers. Martin also extended and enhanced the http and proxy protocol towards a rfc-clean implementation.</li>
 <li><b>Roland Ramthun</b> owns and administrates the <a href="http://www.yacy-forum.de/">German YaCy-Forum</a>. He also cares for correct English spelling and a German translation of the YaCy user interface. Roland and other forum participants extended the PHPForum code to make it possible to track development feature requests and bug reports with status codes and editor flags.</li>
+<li><b>Marc Nause</b> made enhancements to the Message- and User-Profile menues and functions.</li>
 <li><b>Natali Christen</b> designed the YaCy logo.</li>
 <li><b>Thomas Quella</b> designed the Kaskelix mascot.</li>
 <li><b>Wolfgang Sander-Beuermann</b>, executive board member of the German search-engine association <a href="http://www.suma-ev.de/">SuMa-eV</a>

@@ -133,11 +133,28 @@ Crawling and indexing can be done by remote peers.
 Your peer can search and index for other peers and they can search for you.</div>
 <table border="0" cellpadding="5" cellspacing="0" width="100%">
 <tr valign="top" class="TableCellDark">
-<td width="30%">
-<input type="checkbox" name="crawlResponse" align="top" #(crawlResponseChecked)#::checked#(/crawlResponseChecked)#>
-Accept remote crawling requests</td>
-<td>
+<td width="10%">
+<input type="radio" name="dcr" value="acceptCrawlMax" align="top" #(acceptCrawlMaxChecked)#::checked#(/acceptCrawlMaxChecked)#>
+</td><td>
+Accept remote crawling requests and perform crawl at maximum load
+</td>
+</tr><tr valign="top" class="TableCellDark">
+<td width="10%">
+<input type="radio" name="dcr" value="acceptCrawlLimited" align="top" #(acceptCrawlLimitedChecked)#::checked#(/acceptCrawlLimitedChecked)#>
+</td><td>
+Accept remote crawling requests and perform crawl at a maximum of
+<input name="acceptCrawlLimit" type="text" size="4" maxlength="4" value="#[PPM]#"> Pages Per Minute (minimum is 1; low system load at PPM &lt;= 30)
+</td>
+</tr><tr valign="top" class="TableCellDark">
+<td width="10%">
+<input type="radio" name="dcr" value="acceptCrawlDenied" align="top" #(acceptCrawlDeniedChecked)#::checked#(/acceptCrawlDeniedChecked)#>
+</td><td>
+Do not accept remote crawling requests (please set this only if you cannot afford to crawl even one page per minute; see the option above)</td>
+</tr><tr valign="top" class="TableCellLight">
+<td width="10%"></td><td>
 <input type="submit" name="distributedcrawling" value="set"></td>
+</tr>
 </table>
 </form></p>
@@ -238,9 +255,7 @@ No remote crawl peers availible.<br>
 </tr>
 </table>
 #(/remoteCrawlPeers)#
-</p>
-<p>
+<br>
 <form action="IndexCreate_p.html" method="post" enctype="multipart/form-data">
 #(crawler-paused)#
 <input type="submit" name="continuecrawlqueue" value="continue crawling">

@@ -66,6 +66,7 @@ import de.anomic.plasma.plasmaURL;
 import de.anomic.server.serverFileUtils;
 import de.anomic.server.serverObjects;
 import de.anomic.server.serverSwitch;
+import de.anomic.server.serverThread;
 import de.anomic.tools.bitfield;
 import de.anomic.yacy.yacyCore;
 import de.anomic.yacy.yacySeed;
@@ -224,9 +225,27 @@ public class IndexCreate_p {
 }
 }
 if (post.containsKey("distributedcrawling")) {
-boolean crawlResponse = ((String) post.get("crawlResponse", "")).equals("on");
-env.setConfig("crawlResponse", (crawlResponse) ? "true" : "false");
+long newBusySleep = Integer.parseInt(env.getConfig("62_remotetriggeredcrawl_busysleep", "100"));
+if (((String) post.get("dcr", "")).equals("acceptCrawlMax")) {
+env.setConfig("crawlResponse", "true");
+newBusySleep = 100;
+} else if (((String) post.get("dcr", "")).equals("acceptCrawlLimited")) {
+env.setConfig("crawlResponse", "true");
+int newppm = Integer.parseInt(post.get("acceptCrawlLimit", "1"));
+if (newppm < 1) newppm = 1;
+newBusySleep = 60000 / newppm;
+if (newBusySleep < 100) newBusySleep = 100;
+} else if (((String) post.get("dcr", "")).equals("acceptCrawlDenied")) {
+env.setConfig("crawlResponse", "false");
+}
+serverThread rct = switchboard.getThread("62_remotetriggeredcrawl");
+rct.setBusySleep(newBusySleep);
+env.setConfig("62_remotetriggeredcrawl_busysleep", "" + newBusySleep);
+//boolean crawlResponse = ((String) post.get("acceptCrawlMax", "")).equals("on");
+//env.setConfig("crawlResponse", (crawlResponse) ? "true" : "false");
 }
@@ -249,7 +268,25 @@ public class IndexCreate_p {
 prop.put("storeHTCacheChecked", env.getConfig("storeHTCache", "").equals("true") ? 1 : 0);
 prop.put("localIndexingChecked", env.getConfig("localIndexing", "").equals("true") ? 1 : 0);
 prop.put("crawlOrderChecked", env.getConfig("crawlOrder", "").equals("true") ? 1 : 0);
-prop.put("crawlResponseChecked", env.getConfig("crawlResponse", "").equals("true") ? 1 : 0);
+long busySleep = Integer.parseInt(env.getConfig("62_remotetriggeredcrawl_busysleep", "100"));
+if (env.getConfig("crawlResponse", "").equals("true")) {
+if (busySleep <= 100) {
+prop.put("acceptCrawlMaxChecked", 1);
+prop.put("acceptCrawlLimitedChecked", 0);
+prop.put("acceptCrawlDeniedChecked", 0);
+} else {
+prop.put("acceptCrawlMaxChecked", 0);
+prop.put("acceptCrawlLimitedChecked", 1);
+prop.put("acceptCrawlDeniedChecked", 0);
+}
+} else {
+prop.put("acceptCrawlMaxChecked", 0);
+prop.put("acceptCrawlLimitedChecked", 0);
+prop.put("acceptCrawlDeniedChecked", 1);
+}
+int ppm = (int) ((long) 60000 / busySleep);
+if (ppm > 60) ppm = 60;
+prop.put("PPM", ppm);
 prop.put("xsstopwChecked", env.getConfig("xsstopw", "").equals("true") ? 1 : 0);
 prop.put("xdstopwChecked", env.getConfig("xdstopw", "").equals("true") ? 1 : 0);
 prop.put("xpstopwChecked", env.getConfig("xpstopw", "").equals("true") ? 1 : 0);
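The two hunks above convert between a crawl rate in pages per minute (PPM) and the remote-triggered-crawl thread's busy-sleep interval in milliseconds. A minimal standalone sketch of that conversion, with the clamps the patch applies (class and method names are illustrative, not from the YaCy source):

```java
public class CrawlRate {
    // map a requested pages-per-minute rate to a busy-sleep interval in ms;
    // 100 ms is the floor used for the "maximum load" setting
    static long ppmToBusySleep(int ppm) {
        if (ppm < 1) ppm = 1;            // the form enforces a minimum of 1 PPM
        long busySleep = 60000L / ppm;   // one minute divided by the rate
        if (busySleep < 100) busySleep = 100;
        return busySleep;
    }

    // inverse mapping used when re-displaying the form, capped at 60 PPM
    static int busySleepToPpm(long busySleep) {
        int ppm = (int) (60000L / busySleep);
        if (ppm > 60) ppm = 60;
        return ppm;
    }
}
```

With this mapping, the new `62_remotetriggeredcrawl_busysleep=2000` default in yacy.init corresponds to 30 PPM, matching the "low system load at PPM &lt;= 30" hint in the HTML form.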

@@ -59,13 +59,15 @@ public class Steering {
 // handle access rights
 switch (switchboard.adminAuthenticated(header)) {
 case 0: // wrong password given
-try {Thread.currentThread().sleep(3000);} catch (InterruptedException e) {}
+try {Thread.currentThread().sleep(3000);} catch (InterruptedException e) {} // prevent brute-force
+prop.put("AUTHENTICATE", "admin log-in"); // force log-in
+return prop;
 case 1: // no password given
 prop.put("AUTHENTICATE", "admin log-in"); // force log-in
 return prop;
 case 2: // no password stored
-prop.put("info", 1); // actions only with password
-return prop;
+//prop.put("info", 1); // actions only with password
+//return prop;
 case 3: // soft-authenticated for localhost only
 case 4: // hard-authenticated, all ok
 }
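The Steering change makes a wrong password both delay the response and force a fresh log-in, where previously case 0 fell through into case 1, and it lets case 2 (no password stored) proceed instead of blocking. A rough sketch of the resulting control flow (class, method, and return values are illustrative, not the actual Steering API):

```java
public class AuthGate {
    // authentication states as returned by adminAuthenticated():
    // 0 = wrong password, 1 = no password given, 2 = no password stored,
    // 3 = soft-authenticated for localhost, 4 = hard-authenticated
    static String check(int authState) {
        switch (authState) {
            case 0: // wrong password: delay the reply to slow down brute-force attempts
                try { Thread.sleep(3000); } catch (InterruptedException e) {}
                return "admin log-in"; // then force a fresh log-in
            case 1: // no password given: force log-in immediately
                return "admin log-in";
            default: // cases 2-4 now fall through and the requested action proceeds
                return "ok";
        }
    }
}
```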

@@ -7,7 +7,7 @@ under certain conditions; see file gpl.txt for details.
 ---------------------------------------------------------------------------
 This is a P2P-based Web Search Engine
-and also a http/https proxy.
+and also a caching http/https proxy.
 The complete documentation can be found inside the 'doc' subdirectory
 in this release. Start browsing the manual by opening the index.html
@@ -16,22 +16,34 @@ file with your web browser.
 YOU NEED JAVA 1.4.2 OR LATER TO RUN THIS APPLICATION!
 PLEASE DOWNLOAD JAVA FROM http://java.sun.com
-Startup of YaCy:
-- on Linux : start startYACY.sh
-- on Windows : double-click startYACY.bat
-- on Mac OS X : double-click startYACY.command (alias possible!)
-- on any other OS : set your classpath to the 'classes' folder
-and execute yacy.class, while your current system
-path must target the release directory to access the
-configuration files.
-Then start using YaCy with the applications on-line interface:
+Startup and Shutdown of YaCy:
+- on Linux:
+to start: execute startYACY.sh
+to stop : execute stopYACY.sh
+- on Windows:
+to start: double-click startYACY.bat
+to stop : double-click stopYACY.bat
+- on Mac OS X:
+to start: double-click startYACY.command (alias possible!)
+to stop : double-click stopYACY.command
+- on any other OS:
+to start: execute java as
+java -classpath classes:htroot:lib/commons-collections.jar:lib/commons-pool-1.2.jar yacy -startup <yacy-release-path>
+to stop : execute java as
+java -classpath classes:htroot:lib/commons-collections.jar:lib/commons-pool-1.2.jar yacy -shutdown
+YaCy is a server process that can be administrated and used
+with your web browser:
 browse to http://localhost:8080 where you can see your personal
 search, configuration and administration interface.
-If you want to use the proxy, simply configure your internet connection
-to use YaCy at port 8080. You can also change the default proxy port.
+If you want to use the built-in proxy, simply configure your internet connection
+to use a proxy at port 8080. You can also change this default proxy port.
 If you like to use YaCy not as proxy but only as distributed
 crawling/search engine, you can do so.
@@ -47,5 +59,5 @@ feel free to ask the author for a business proposal to customize YaCy
 according to your needs. We also provide integration solutions if the
 software is about to be integrated into your enterprise application.
-Germany, Frankfurt a.M., 03.05.2005
+Germany, Frankfurt a.M., 22.07.2005
 Michael Peter Christen

@@ -136,7 +136,10 @@ public final class plasmaHTCache {
 }
 public Entry pop() {
-return (Entry) cacheStack.removeFirst();
+if (cacheStack.size() > 0)
+return (Entry) cacheStack.removeFirst();
+else
+return null;
 }
 public void storeHeader(String urlHash, httpHeader responseHeader) throws IOException {
@@ -243,7 +246,7 @@ public final class plasmaHTCache {
 ageHours = (System.currentTimeMillis() -
 Long.parseLong(((String) cacheAge.firstKey()).substring(0, 16), 16)) / 3600000;
 } catch (NumberFormatException e) {
-e.printStackTrace();
+//e.printStackTrace();
 }
 log.logSystem("CACHE SCANNED, CONTAINS " + c +
 " FILES = " + currCacheSize/1048576 + "MB, OLDEST IS " +
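The pop() fix above guards against an empty cache stack, which previously let removeFirst() throw a NoSuchElementException. A self-contained sketch of the same guard (a stripped-down stand-in for the actual plasmaHTCache stack, with String entries instead of Entry objects):

```java
import java.util.LinkedList;

public class CacheStack {
    private final LinkedList<String> cacheStack = new LinkedList<String>();

    public void push(String entry) {
        cacheStack.addLast(entry);
    }

    // return the oldest entry, or null when the stack is empty,
    // instead of letting removeFirst() throw NoSuchElementException
    public String pop() {
        if (cacheStack.size() > 0)
            return cacheStack.removeFirst();
        else
            return null;
    }
}
```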

@@ -400,7 +400,7 @@ xpstopw=true
 20_dhtdistribution_memprereq=1000000
 30_peerping_idlesleep=120000
 30_peerping_busysleep=120000
-30_peerping_memprereq=20000
+30_peerping_memprereq=100000
 40_peerseedcycle_idlesleep=1800000
 40_peerseedcycle_busysleep=1200000
 40_peerseedcycle_memprereq=1000000
@@ -411,14 +411,14 @@ xpstopw=true
 61_globalcrawltrigger_busysleep=100
 61_globalcrawltrigger_memprereq=1000000
 62_remotetriggeredcrawl_idlesleep=10000
-62_remotetriggeredcrawl_busysleep=100
+62_remotetriggeredcrawl_busysleep=2000
 62_remotetriggeredcrawl_memprereq=1000000
 70_cachemanager_idlesleep=5000
 70_cachemanager_busysleep=0
-70_cachemanager_memprereq=10000
+70_cachemanager_memprereq=100000
 80_indexing_idlesleep=5000
 80_indexing_busysleep=0
-80_indexing_memprereq=2000000
+80_indexing_memprereq=1000000
 90_cleanup_idlesleep=300000
 90_cleanup_busysleep=300000
 90_cleanup_memprereq=0
@@ -461,7 +461,7 @@ ramCacheWiki = 8192
 # flushed to disc; this may last some minutes.
 # maxWaitingWordFlush gives the number of seconds that the shutdown
 # may last for the word flush
-wordCacheMax = 6000
+wordCacheMax = 10000
 maxWaitingWordFlush = 180
 # Specifies if yacy can be used as transparent http proxy.

@@ -12,7 +12,7 @@
 # INFO regular action information (i.e. any httpd request URL)
 # FINEST in-function status debug output
 PARSER.level = INFO
-YACY.level = FINEST
+YACY.level = INFO
 HTCACHE.level = INFO
 PLASMA.level = FINEST
 SERVER.level = INFO

@@ -3,6 +3,3 @@
 # then the proxy passes the client's user agent to the domain's server
 google
 yahoo
-heise
-ebay
-stern