- added dummy servlet class, because otherwise the template engine is not triggered.
thats so because the yacy httpd works much faster as normal file server without a scan
of the served pages. Therefore each page with templates must now have a class file associated to it.
- fixed json output format of yacysearch
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5449 6c8d7289-2bf4-0310-a012-ef5d649a1542
The cache is now flushed only for one second every ten seconds. During a crawl the cache
fills up completely, and is only flushed if space is needed for more documents.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5446 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) added some Javascript to disable list selection on bottom of list in cases it is not needed (edit, delete) and only enable it if needed (move), if JS is turned off everything will work as usual
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5435 6c8d7289-2bf4-0310-a012-ef5d649a1542
- introduced blocking queues in CrawlStacker to make it ready for concurrency
- added a second busy thread for the CrawlStacker
The CrawlStacker is multithreaded. It shall be transformed into a BlockingThread in another step.
The concurrency of the stacker will hopefully solve some problems with cases where DNS blocks.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5395 6c8d7289-2bf4-0310-a012-ef5d649a1542
(points now more directly to performance leaks without mentioning class calls inside of sun/java calls that cannot be changed anyway)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5390 6c8d7289-2bf4-0310-a012-ef5d649a1542
- modified and activated write buffer
- increased cache flush factor
- fixed a problem with deadlocking of indexing process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5382 6c8d7289-2bf4-0310-a012-ef5d649a1542
- during the user types search queries, the local database is searched
- results are presented interactively
This was implemented using a new JSON result format for search results in YaCy
- added JSON as file format for servlets
- refactoring of current search servlets (xml and html)
- added JSON output format for search results
- added AJAX-based search page, that uses the yacysearch.json selrvlet to print results as a query is typed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5373 6c8d7289-2bf4-0310-a012-ef5d649a1542
files exported from mediawiki using the xml schema according to
http://www.mediawiki.org/xml/export-0.3/
can be processed to be viewed in a YaCy servlet.
To acces such a file, place it into
DATA/HTCACHE/mediawiki/
i.e. the export from german wikipedia would be:
DATA/HTCACHE/mediawiki/wikipedia.de.xml
This file can then be accessed using the URL
http://localhost:8080/mediawiki_p.html?dump=wikipedia.de.xml&title=YaCy
if this is done the first time, an index file is created
(for this case: more than 4 million lines must be written, this takes about 15 minutes)
Then try the same url again.
- enhanced also the md5 computation speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5352 6c8d7289-2bf4-0310-a012-ef5d649a1542
- removed never-used secondary crawl depth
- added a must-not-match filter that can be used to exclude urls from a crawl
- added stub for crawl tags which will be used to identify search results that had been produced from specific crawls
please update the yacybar: replace property name 'crawlFilter' with 'mustmatch'.
Additionally, a new parameter named 'mustnotmatch' can be used, which should be by default the empty sring (match-never)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5342 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) it is possible to move/edit/delete more than one entry at a time now
*) it is easier to choose a target for blacklist import now
*) fixed several bugs
*) to be continued...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5330 6c8d7289-2bf4-0310-a012-ef5d649a1542
- for redirector and remote crawling place crawling url on notice queue instead of direct enqueueing in crawler queue
- when a request to a remote crawl provider fails, remove the peer from the network to prevent that the url fetcher gets stuck another time again
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5320 6c8d7289-2bf4-0310-a012-ef5d649a1542
- implemented vertical DHT acceptance ("my own DHT") to accept new targets
- added new target computation for global search: addresses vertical targets also
- enhanced remote crawling: collection of remote crawl urls if queue has less than 100 entries (was: 0 entries)
- better performance value computations for PPM selection in network configuration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5319 6c8d7289-2bf4-0310-a012-ef5d649a1542
- two different computations (but mathematical equivalent) of the DHT distance had been consolidated
- moved from 0.0 .. 1.0 double-range position computation to 0 .. Long.Max range for DHT targets
- added fast Long - to - hash computation
- high-precision target computation of gaps for new peers
- added new target computation for horizontal and vertical DHT targets (not yet in use)
- old horizontal-only DHT targets will be upwards compatible to new horizontal and vertical DHT positions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5318 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) fixed display of values of information for which part of YaCy (crawler, proxy, ...) blacklist is activated for
*) replaced regular put() with putXML() in several cases
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5305 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) a blacklist will only be created if no blacklist with same name exists (some refactoring has been necessary for this)
*) further minor fixes
*) to be continued...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5301 6c8d7289-2bf4-0310-a012-ef5d649a1542
The old process used a not really efficient way to detect html encoding strings in texts.
All calling methods had been adoped to call the new class in an enhanced way with less parameters.
Many classes in interfaces used a XML encoding only (instead of full html conversion from unicode to html); this behavior was not changed with this commit but should be controlled again since it points out possible XSS leaks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5295 6c8d7289-2bf4-0310-a012-ef5d649a1542
- removed the now superfluous HT storage thread
- reduced number of file decompression by shifting the compression moment to the future
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5286 6c8d7289-2bf4-0310-a012-ef5d649a1542
- files are not stored any more as individual files
- a new database structure using BLOBHeap files stores many cache entries in common files
- all file-writing procedures had been migrated to generate byte[] objects which are written with the new database methods
this is only an intermediate step to the final architecture, where cached files are written together with their metadata in one single database structure.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5276 6c8d7289-2bf4-0310-a012-ef5d649a1542
- fixed a display bug for the performance graph
- fixed deadlock when initialization of awt happens simultanously
- removed some debugging output
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5245 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added 'Open' button to search result toolbar
- lets you open all selected search results in new window/tab
- added 'any language' filter as default
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5240 6c8d7289-2bf4-0310-a012-ef5d649a1542
- balancer writes cause of robots.txt in log file for crawl delay
- removed log output for forced GC
- smaller RAM flush for RWI cache, should cause more usage of cache and faster crawling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5228 6c8d7289-2bf4-0310-a012-ef5d649a1542
- smaller kaskelix image to make room for more information
- added the memory graph, since this picture is widely used to monitor YaCys activities
- added border to log line iframe (looks better together with memory graph)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5224 6c8d7289-2bf4-0310-a012-ef5d649a1542
- the language can be selected using a LANGUAGE:<language> element in the query line, i.e.:
java LANGUAGE:en
- the language can be selected with a post element in google-style syntax with the 'rl' element:
?lr=lang_en&query=java
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5193 6c8d7289-2bf4-0310-a012-ef5d649a1542
- metadata and results from statistical analysis are compared and result is printed out as debug lines
- added ranking profile for wanted language
- added class with ISO 639 table, a list of all valid country codes that will be used for the language identification
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5187 6c8d7289-2bf4-0310-a012-ef5d649a1542
- reduced maximum number of error url references to save some memory (this was actually a small memory leak)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5174 6c8d7289-2bf4-0310-a012-ef5d649a1542
http://forum.yacy-websuche.de/viewtopic.php?p=8130#p8130
- removed the online status property. This influenced the proxy behavior and created some complexity that was not needed because the online status was never used as it was ceated for (offline browsing)
- checked all proxy identification procedures during crawling and enhanced transparency and error checking
- fixed a proxy identification routine that caused the wrong selection of the proxy result queue
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5173 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added more search engines
- some refactoring and added a list that is used to present the search engine list in a specific order
- added simpleheader and no-header options
- added the compare search to the simple header
- added default compare search page selection storage - after re-start you get the same default search engines as you selected before
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5157 6c8d7289-2bf4-0310-a012-ef5d649a1542