This bug affected Tables in case that a removeOne() was called and a RAM copy of the table was active. It may happen for peer owners with a lot of RAM assigned to YaCy. The bug appeared especially during crawling when the balancer tried to get new entries from the crawl queue.
This bug may help to solve report at
http://forum.yacy-websuche.de/viewtopic.php?p=17417#p17417
and will be tracked there
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6322 6c8d7289-2bf4-0310-a012-ef5d649a1542
to make this possible, the yacyURL must be able to process file:// urls, which has also been implemented
testing of the new class resulted in some bugfixes in other classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6313 6c8d7289-2bf4-0310-a012-ef5d649a1542
- limit comparisment to only the first 10 elements that had been sorted before without IO
- added a size cache to index computation because the size is computed at least twice in set comparator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6306 6c8d7289-2bf4-0310-a012-ef5d649a1542
- Start the iteration at startWordHash
- When used with rotation, let the iteration stop when the cache is empty
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6293 6c8d7289-2bf4-0310-a012-ef5d649a1542
- some redesign of DidYouMean that was necessary to follow
a special rule how a library should be used:
- the library provides words that start or end with a test
word which may be possibly also an empty set of words
- all words that the DidYouMean produced with the four
production rules are used to generate a set of
library-completed words
- if this process results in any words from the library,
only library-genrated words are taken
- if the is no library-generated word at all, take the
artifial generated word
- all words that result from these rules are tested against
the index
- the result is ordered using a lightweight comparator that
prefers short words
- a not-so-much-io test against the index is beeing prepared
next
- insered the library initialization into the switchboard
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6284 6c8d7289-2bf4-0310-a012-ef5d649a1542
this will cause better computation speed for single- and multi-core;
there are enhancements that will speed up old and slow machines as well
as multi-core CPUs. Indexing of surrogates has been speed up
from 4000 PPM to over 20000 PPM on a simple dual core office computer.
Since the enhancements are mostly in core routines, the hack should also
speed up search performance.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6276 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added a open-on-demand hack to heap files: when a heap file is
opened the first time, it is first scanned to get a key index
and then it is closed again. This will free up file pointers
in cases where a really large number of blob files are opened
upon initialization of ArrayStack objects. This should solve
also a problem reported in
http://forum.yacy-websuche.de/viewtopic.php?p=17191#p17191
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6267 6c8d7289-2bf4-0310-a012-ef5d649a1542
- check routine at the beginning of the import of table keys that check that all imported keys are well-formed. All records that have a ill-formed key are deleted. This is a hack and is not tested since I don't have bad data here to test with. If the effect is seen in the wild, please report in the forum.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6245 6c8d7289-2bf4-0310-a012-ef5d649a1542
- better control to the cache by using combined request-header and content access methods
- refactoring of many classes to comply to this new access method
- make shure that the cache is always written if something was loaded
- some redesign of the process how http response results are feeded into the new indexing queue
- introduction of a cache read policy:
* never use the cache
* use the cache if entry exist
* use the cache if the proxy freshness rule confirmes
* use only the cache and go never online
- added configuration options for the crawl profiles to use the new cache policies. There is not yet a input during crawl start to set the policy but this will be added in another step.
- set the default policies for the existing crawl profiles. If you want them to appear in your default profiles you must delete the crawl profiles database; othervise the policy is 'proxy freshness rule'
- enhanced some cache access methods in such a way that unnecessary retrievals are omitted (i.e. for size computation). That should reduce some IO but also a lot of CPU computation because sizes were computed after decompression of content after retrieval of the content from the disc.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6239 6c8d7289-2bf4-0310-a012-ef5d649a1542