performance setting for remote indexing configuration and latest changes for 0.39

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@424 6c8d7289-2bf4-0310-a012-ef5d649a1542
orbiter 20 years ago
parent 6c13cf5b0c
commit be1f324fca

@ -3,11 +3,11 @@ javacSource=1.4
javacTarget=1.4
# Release Configuration
releaseVersion=0.387
releaseFile=yacy_dev_v${releaseVersion}_${DSTAMP}_${releaseNr}.tar.gz
#releaseFile=yacy_v${releaseVersion}_${DSTAMP}_${releaseNr}.tar.gz
releaseDir=yacy_dev_v${releaseVersion}_${DSTAMP}_${releaseNr}
#releaseDir=yacy_v${releaseVersion}_${DSTAMP}_${releaseNr}
releaseVersion=0.39
#releaseFile=yacy_dev_v${releaseVersion}_${DSTAMP}_${releaseNr}.tar.gz
releaseFile=yacy_v${releaseVersion}_${DSTAMP}_${releaseNr}.tar.gz
#releaseDir=yacy_dev_v${releaseVersion}_${DSTAMP}_${releaseNr}
releaseDir=yacy_v${releaseVersion}_${DSTAMP}_${releaseNr}
releaseNr=$Revision$
# defining some file/directory access rights

@ -322,8 +322,6 @@
<include name="yacy.logging"/>
<include name="yacy.init"/>
<include name="yacy.yellow"/>
<include name="yacy.black"/>
<include name="yacy.blue"/>
<include name="yacy.stopwords"/>
<include name="yacy.parser"/>
<include name="httpd.mime"/>

@ -43,6 +43,42 @@ globalheader();
</ul>
-->
<br><p>v0.39_20050722_424
<ul>
<li>New Features:</li>
<ul>
<li>Added snippets to search results. Snippets are fetched by the searching peer from the original web sites and are also transmitted along with remote search results.</li>
<li>The proxy now shows an error page when errors occur.</li>
<li>Preparation for localization: started a German translation (not yet finished)</li>
<li>The status page now shows memory usage, transfer volume and indexing speed as PPM (pages per minute). A global PPM (the sum over all peers) is also computed.</li>
<li>Restructured the Index Creation menu: added more submenus and queue monitors</li>
<li>Added a feature to start crawling from bookmark files</li>
<li>Added blocking of blacklisted URLs in indexReceive (remote DHT index transmissions)</li>
<li>Added port forwarding for remote peer connections (the peer may now be connected through a configurable address)</li>
<li>Added bbCode for Profiles</li>
<li>Memory management in the Performance menu: a memory limit can be set as a condition for queue execution.</li>
<li>Added an option for performance-limited remote crawls (use this instead of switching off remote indexing if you are worried about performance loss on your machine)</li>
<li>Enhanced logging, configurable with yacy.logging</li>
</ul>
<li>Performance: enhanced indexing speed</li>
<ul>
<li>Implemented indexing/loading multithreading</li>
<li>Enhanced caching in the database (lower memory usage)</li>
<li>Replaced the post-indexing RAM queue with a file-based queue (makes long queues possible)</li>
<li>Changed the assortment cache-flush procedure: words may now appear in any assortment, not only one. This prevents assortment flushes, increases capacity and avoids creating files in DATA/PLASMADB/WORDS, which further speeds up indexing.</li>
<li>Sped up start-up and shut-down by replacing a stack with an array. The dumped index now also takes less space on disk. Because dumping is faster, the cache may be larger, which further increases indexing speed.</li>
</ul>
<li>Bugfixes:</li>
<ul>
<li>Better shut-down behavior, time-outs on sockets, fewer exceptions</li>
<li>Fixed gzip decoding and content-length in http-client</li>
<li>Better httpd header validation</li>
<li>Fixed possible memory leaks</li>
<li>Fixed 100% CPU bug (caused by repeated GC when memory was low)</li>
<li>Fixed UTF8-decoding for parser</li>
</ul>
</ul>
<br><p>v0.38_20050603_208
<ul>
<li>Enhanced Crawling:

@ -28,6 +28,7 @@ the P2P-based index distribution was designed and implemented by <b>Michael Pete
<li><b>Alexander Schier</b> did much alpha-testing, gave valuable feed-back on my ideas and suggested his own. He suggested and implemented large parts of the popular blacklist feature. He supplied the 'Log'-menu function, the skin feature, many minor changes, bug fixes and the Windows installer version of YaCy. Alex also provides and maintains the <a href="http://www.suma-lab.de/yacy/">German documentation</a> for YaCy.</li>
<li><b>Martin Thelian</b> made system-wide performance enhancements by introducing thread pools. He provided a plug-in system for external text parsers and integrated many parser libraries, such as PDF and Word format parsers. Martin also extended and enhanced the HTTP and proxy protocol towards an RFC-clean implementation.</li>
<li><b>Roland Ramthun</b> owns and administrates the <a href="http://www.yacy-forum.de/">German YaCy-Forum</a>. He also cares for correct English spelling and a German translation of the YaCy user interface. Roland and other forum participants extended the PHPForum code to make it possible to track development feature requests and bug reports with status codes and editor flags.</li>
<li><b>Marc Nause</b> made enhancements to the Message and User Profile menus and functions.</li>
<li><b>Natali Christen</b> designed the YaCy logo.</li>
<li><b>Thomas Quella</b> designed the Kaskelix mascot.</li>
<li><b>Wolfgang Sander-Beuermann</b>, executive board member of the German search-engine association <a href="http://www.suma-ev.de/">SuMa-eV</a>

@ -133,11 +133,28 @@ Crawling and indexing can be done by remote peers.
Your peer can search and index for other peers and they can search for you.</div>
<table border="0" cellpadding="5" cellspacing="0" width="100%">
<tr valign="top" class="TableCellDark">
<td width="30%">
<input type="checkbox" name="crawlResponse" align="top" #(crawlResponseChecked)#::checked#(/crawlResponseChecked)#>
Accept remote crawling requests</td>
<td>
<td width="10%">
<input type="radio" name="dcr" value="acceptCrawlMax" align="top" #(acceptCrawlMaxChecked)#::checked#(/acceptCrawlMaxChecked)#>
</td><td>
Accept remote crawling requests and perform crawl at maximum load
</td>
</tr><tr valign="top" class="TableCellDark">
<td width="10%">
<input type="radio" name="dcr" value="acceptCrawlLimited" align="top" #(acceptCrawlLimitedChecked)#::checked#(/acceptCrawlLimitedChecked)#>
</td><td>
Accept remote crawling requests and perform crawl at maximum of
<input name="acceptCrawlLimit" type="text" size="4" maxlength="4" value="#[PPM]#"> Pages Per Minute (minimum is 1, low system load at PPM &lt;= 30)
</td>
</tr><tr valign="top" class="TableCellDark">
<td width="10%">
<input type="radio" name="dcr" value="acceptCrawlDenied" align="top" #(acceptCrawlDeniedChecked)#::checked#(/acceptCrawlDeniedChecked)#>
</td><td>
Do not accept remote crawling requests (please set this only if you cannot accept crawling even one page per minute; see the option above)</td>
</tr><tr valign="top" class="TableCellLight">
<td width="10%"></td><td>
<input type="submit" name="distributedcrawling" value="set"></td>
</tr>
</table>
</form></p>
@ -238,9 +255,7 @@ No remote crawl peers available.<br>
</tr>
</table>
#(/remoteCrawlPeers)#
</p>
<p>
<br>
<form action="IndexCreate_p.html" method="post" enctype="multipart/form-data">
#(crawler-paused)#
<input type="submit" name="continuecrawlqueue" value="continue crawling">

@ -66,6 +66,7 @@ import de.anomic.plasma.plasmaURL;
import de.anomic.server.serverFileUtils;
import de.anomic.server.serverObjects;
import de.anomic.server.serverSwitch;
import de.anomic.server.serverThread;
import de.anomic.tools.bitfield;
import de.anomic.yacy.yacyCore;
import de.anomic.yacy.yacySeed;
@ -224,9 +225,27 @@ public class IndexCreate_p {
}
}
if (post.containsKey("distributedcrawling")) {
boolean crawlResponse = ((String) post.get("crawlResponse", "")).equals("on");
env.setConfig("crawlResponse", (crawlResponse) ? "true" : "false");
long newBusySleep = Integer.parseInt(env.getConfig("62_remotetriggeredcrawl_busysleep", "100"));
if (((String) post.get("dcr", "")).equals("acceptCrawlMax")) {
env.setConfig("crawlResponse", "true");
newBusySleep = 100;
} else if (((String) post.get("dcr", "")).equals("acceptCrawlLimited")) {
env.setConfig("crawlResponse", "true");
int newppm = Integer.parseInt(post.get("acceptCrawlLimit", "1"));
if (newppm < 1) newppm = 1;
newBusySleep = 60000 / newppm;
if (newBusySleep < 100) newBusySleep = 100;
} else if (((String) post.get("dcr", "")).equals("acceptCrawlDenied")) {
env.setConfig("crawlResponse", "false");
}
serverThread rct = switchboard.getThread("62_remotetriggeredcrawl");
rct.setBusySleep(newBusySleep);
env.setConfig("62_remotetriggeredcrawl_busysleep", "" + newBusySleep);
//boolean crawlResponse = ((String) post.get("acceptCrawlMax", "")).equals("on");
//env.setConfig("crawlResponse", (crawlResponse) ? "true" : "false");
}
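The hunk above converts a pages-per-minute limit into the remote-triggered crawl thread's busy-sleep pause (milliseconds between jobs), and the display code later maps it back. A minimal standalone sketch of that round-trip (class and method names are illustrative, not from the source):

```java
public class PpmSleep {
    // Convert a pages-per-minute limit into a busy-sleep pause in milliseconds,
    // clamped as in the patch: minimum 1 PPM, floor of 100 ms ("maximum load").
    static long ppmToBusySleep(int ppm) {
        if (ppm < 1) ppm = 1;                 // minimum is 1 page per minute
        long busySleep = 60000L / ppm;        // 60000 ms per minute / pages per minute
        if (busySleep < 100) busySleep = 100; // 100 ms floor = maximum crawl load
        return busySleep;
    }

    // Inverse mapping used when rendering the form: busy-sleep back to PPM,
    // capped at 60 for display as in the patch.
    static int busySleepToPpm(long busySleep) {
        int ppm = (int) (60000L / busySleep);
        return (ppm > 60) ? 60 : ppm;
    }

    public static void main(String[] args) {
        System.out.println(ppmToBusySleep(30));   // 2000 ms, the new default busy-sleep
        System.out.println(busySleepToPpm(2000)); // 30 PPM
    }
}
```

Note that the two directions are not exact inverses: any PPM above 600 collapses to the 100 ms floor, and the display caps at 60 PPM.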
@ -249,7 +268,25 @@ public class IndexCreate_p {
prop.put("storeHTCacheChecked", env.getConfig("storeHTCache", "").equals("true") ? 1 : 0);
prop.put("localIndexingChecked", env.getConfig("localIndexing", "").equals("true") ? 1 : 0);
prop.put("crawlOrderChecked", env.getConfig("crawlOrder", "").equals("true") ? 1 : 0);
prop.put("crawlResponseChecked", env.getConfig("crawlResponse", "").equals("true") ? 1 : 0);
long busySleep = Integer.parseInt(env.getConfig("62_remotetriggeredcrawl_busysleep", "100"));
if (env.getConfig("crawlResponse", "").equals("true")) {
if (busySleep <= 100) {
prop.put("acceptCrawlMaxChecked", 1);
prop.put("acceptCrawlLimitedChecked", 0);
prop.put("acceptCrawlDeniedChecked", 0);
} else {
prop.put("acceptCrawlMaxChecked", 0);
prop.put("acceptCrawlLimitedChecked", 1);
prop.put("acceptCrawlDeniedChecked", 0);
}
} else {
prop.put("acceptCrawlMaxChecked", 0);
prop.put("acceptCrawlLimitedChecked", 0);
prop.put("acceptCrawlDeniedChecked", 1);
}
int ppm = (int) ((long) 60000 / busySleep);
if (ppm > 60) ppm = 60;
prop.put("PPM", ppm);
prop.put("xsstopwChecked", env.getConfig("xsstopw", "").equals("true") ? 1 : 0);
prop.put("xdstopwChecked", env.getConfig("xdstopw", "").equals("true") ? 1 : 0);
prop.put("xpstopwChecked", env.getConfig("xpstopw", "").equals("true") ? 1 : 0);

@ -59,13 +59,15 @@ public class Steering {
// handle access rights
switch (switchboard.adminAuthenticated(header)) {
case 0: // wrong password given
try {Thread.currentThread().sleep(3000);} catch (InterruptedException e) {}
try {Thread.currentThread().sleep(3000);} catch (InterruptedException e) {} // prevent brute-force
prop.put("AUTHENTICATE", "admin log-in"); // force log-in
return prop;
case 1: // no password given
prop.put("AUTHENTICATE", "admin log-in"); // force log-in
return prop;
case 2: // no password stored
prop.put("info", 1); // actions only with password
return prop;
//prop.put("info", 1); // actions only with password
//return prop;
case 3: // soft-authenticated for localhost only
case 4: // hard-authenticated, all ok
}

@ -7,7 +7,7 @@ under certain conditions; see file gpl.txt for details.
---------------------------------------------------------------------------
This is a P2P-based Web Search Engine
and also a http/https proxy.
and also a caching http/https proxy.
The complete documentation can be found inside the 'doc' subdirectory
in this release. Start browsing the manual by opening the index.html
@ -16,22 +16,34 @@ file with your web browser.
YOU NEED JAVA 1.4.2 OR LATER TO RUN THIS APPLICATION!
PLEASE DOWNLOAD JAVA FROM http://java.sun.com
Startup of YaCy:
Startup and Shutdown of YaCy:
- on Linux : start startYACY.sh
- on Windows : double-click startYACY.bat
- on Mac OS X : double-click startYACY.command (alias possible!)
- on any other OS : set your classpath to the 'classes' folder
and execute yacy.class, while your current system
path must target the release directory to access the
configuration files.
- on Linux:
to start: execute startYACY.sh
to stop : execute stopYACY.sh
Then start using YaCy with the application's on-line interface:
- on Windows:
to start: double-click startYACY.bat
to stop : double-click stopYACY.bat
- on Mac OS X:
to start: double-click startYACY.command (alias possible!)
to stop : double-click stopYACY.command
- on any other OS:
to start: execute java as
java -classpath classes:htroot:lib/commons-collections.jar:lib/commons-pool-1.2.jar yacy -startup <yacy-release-path>
to stop : execute java as
java -classpath classes:htroot:lib/commons-collections.jar:lib/commons-pool-1.2.jar yacy -shutdown
YaCy is a server process that can be administered and used
with your web browser:
browse to http://localhost:8080 where you can see your personal
search, configuration and administration interface.
If you want to use the proxy, simply configure your internet connection
to use YaCy at port 8080. You can also change the default proxy port.
If you want to use the built-in proxy, simply configure your internet connection
to use a proxy at port 8080. You can also change this default proxy port.
If you would like to use YaCy not as a proxy but only as a distributed
crawling/search engine, you can do so.
@ -47,5 +59,5 @@ feel free to ask the author for a business proposal to customize YaCy
according to your needs. We also provide integration solutions if the
software is about to be integrated into your enterprise application.
Germany, Frankfurt a.M., 03.05.2005
Germany, Frankfurt a.M., 22.07.2005
Michael Peter Christen

@ -136,7 +136,10 @@ public final class plasmaHTCache {
}
public Entry pop() {
return (Entry) cacheStack.removeFirst();
if (cacheStack.size() > 0)
return (Entry) cacheStack.removeFirst();
else
return null;
}
public void storeHeader(String urlHash, httpHeader responseHeader) throws IOException {
@ -243,7 +246,7 @@ public final class plasmaHTCache {
ageHours = (System.currentTimeMillis() -
Long.parseLong(((String) cacheAge.firstKey()).substring(0, 16), 16)) / 3600000;
} catch (NumberFormatException e) {
e.printStackTrace();
//e.printStackTrace();
}
log.logSystem("CACHE SCANNED, CONTAINS " + c +
" FILES = " + currCacheSize/1048576 + "MB, OLDEST IS " +

@ -400,7 +400,7 @@ xpstopw=true
20_dhtdistribution_memprereq=1000000
30_peerping_idlesleep=120000
30_peerping_busysleep=120000
30_peerping_memprereq=20000
30_peerping_memprereq=100000
40_peerseedcycle_idlesleep=1800000
40_peerseedcycle_busysleep=1200000
40_peerseedcycle_memprereq=1000000
@ -411,14 +411,14 @@ xpstopw=true
61_globalcrawltrigger_busysleep=100
61_globalcrawltrigger_memprereq=1000000
62_remotetriggeredcrawl_idlesleep=10000
62_remotetriggeredcrawl_busysleep=100
62_remotetriggeredcrawl_busysleep=2000
62_remotetriggeredcrawl_memprereq=1000000
70_cachemanager_idlesleep=5000
70_cachemanager_busysleep=0
70_cachemanager_memprereq=10000
70_cachemanager_memprereq=100000
80_indexing_idlesleep=5000
80_indexing_busysleep=0
80_indexing_memprereq=2000000
80_indexing_memprereq=1000000
90_cleanup_idlesleep=300000
90_cleanup_busysleep=300000
90_cleanup_memprereq=0
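The *_memprereq values above are byte thresholds that gate queue execution on available memory, per the "Memory Management in Performance Menu" feature in this release. A rough sketch of such a check (Runtime-based accounting is an assumption here, not the actual serverThread code):

```java
public class MemGate {
    // A worker thread's job runs only if at least memprereq bytes of heap
    // can still be made available (free heap plus not-yet-allocated heap).
    static boolean enoughMemory(long memprereq) {
        Runtime rt = Runtime.getRuntime();
        long available = rt.freeMemory() + (rt.maxMemory() - rt.totalMemory());
        return available >= memprereq;
    }

    public static void main(String[] args) {
        // 100000 bytes is the new prerequisite for the peer-ping thread above
        System.out.println(enoughMemory(100000L));
    }
}
```

With this scheme, raising a memprereq value makes the corresponding thread back off earlier under memory pressure, which is why the indexing prerequisite was lowered while the cheaper maintenance threads were raised.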
@ -461,7 +461,7 @@ ramCacheWiki = 8192
# flushed to disc; this may last some minutes.
# maxWaitingWordFlush gives the number of seconds that the shutdown
# may last for the word flush
wordCacheMax = 6000
wordCacheMax = 10000
maxWaitingWordFlush = 180
# Specifies if yacy can be used as transparent http proxy.

@ -12,7 +12,7 @@
# INFO regular action information (i.e. any httpd request URL)
# FINEST in-function status debug output
PARSER.level = INFO
YACY.level = FINEST
YACY.level = INFO
HTCACHE.level = INFO
PLASMA.level = FINEST
SERVER.level = INFO

@ -3,6 +3,3 @@
# then the proxy passes the client's user agent to the domain's server
google
yahoo
heise
ebay
stern