- added an open-on-demand hack to heap files: when a heap file is
opened for the first time, it is first scanned to build a key index
and then it is closed again. This frees up file pointers
in cases where a very large number of blob files are opened
upon initialization of ArrayStack objects. This should also
solve a problem reported in
http://forum.yacy-websuche.de/viewtopic.php?p=17191#p17191
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6267 6c8d7289-2bf4-0310-a012-ef5d649a1542
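The open-on-demand behaviour described for r6267 could be sketched roughly as follows. This is only an illustration, not the actual YaCy heap code: the class name `LazyHeapFile`, the `append` helper and the record layout (key length, key, value length, value) are assumptions made for the example.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the record layout [int keyLen][key][int valueLen][value]
// is an assumption for this example, not YaCy's real heap file format.
final class LazyHeapFile {
    private final File file;
    // key -> file offset of the value length field
    private final Map<String, Long> index = new HashMap<>();
    private RandomAccessFile raf; // null while the file is closed

    LazyHeapFile(File file) throws IOException {
        this.file = file;
        // First open: scan once to build the key index, then close again
        // so that no file pointer is held until the first real access.
        try (RandomAccessFile scan = new RandomAccessFile(file, "r")) {
            long pos = 0;
            final long len = scan.length();
            while (pos < len) {
                scan.seek(pos);
                byte[] key = new byte[scan.readInt()];
                scan.readFully(key);
                long valuePos = scan.getFilePointer();
                int valueLen = scan.readInt();
                index.put(new String(key, StandardCharsets.UTF_8), valuePos);
                pos = valuePos + 4 + valueLen; // skip length field and value
            }
        }
    }

    synchronized boolean isOpen() {
        return raf != null;
    }

    synchronized byte[] get(String key) throws IOException {
        Long pos = index.get(key);
        if (pos == null) return null;
        if (raf == null) raf = new RandomAccessFile(file, "r"); // open on demand
        raf.seek(pos);
        byte[] value = new byte[raf.readInt()];
        raf.readFully(value);
        return value;
    }

    synchronized void close() throws IOException {
        if (raf != null) {
            raf.close();
            raf = null;
        }
    }

    // Helper that appends one record in the assumed layout.
    static void append(File f, String key, byte[] value) throws IOException {
        try (RandomAccessFile out = new RandomAccessFile(f, "rw")) {
            out.seek(out.length());
            byte[] k = key.getBytes(StandardCharsets.UTF_8);
            out.writeInt(k.length);
            out.write(k);
            out.writeInt(value.length);
            out.write(value);
        }
    }
}
```

With this shape, constructing many `LazyHeapFile` objects at startup costs one scan each but leaves no file handles open until `get` is first called.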
this is the beginning of some architecture changes that will hopefully bring some more stability, speed and transparency to the search process.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6260 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added a check routine at the beginning of the import of table keys that verifies that all imported keys are well-formed. All records that have an ill-formed key are deleted. This is a hack and is not tested, since I don't have bad data here to test with. If the effect is seen in the wild, please report it in the forum.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6245 6c8d7289-2bf4-0310-a012-ef5d649a1542
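A key check of the kind described for r6245 might look like the sketch below. The commit does not say what "well-formed" means, so the rule used here (exactly 12 characters from `A-Z`, `a-z`, `0-9`, `-`, `_`) is an assumption, as are the names `KeyCheck`, `wellFormed` and `removeIllFormed`.

```java
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch; the well-formedness rule is an assumption,
// not taken from the YaCy source.
final class KeyCheck {
    static final int KEY_LENGTH = 12; // assumed fixed key length

    // A key is treated as well-formed if it has the assumed length and
    // contains only characters from the assumed alphabet.
    static boolean wellFormed(String key) {
        if (key == null || key.length() != KEY_LENGTH) return false;
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            boolean ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || c == '-' || c == '_';
            if (!ok) return false;
        }
        return true;
    }

    // Deletes all records with an ill-formed key and returns how many were removed.
    static <V> int removeIllFormed(Map<String, V> records) {
        int removed = 0;
        for (Iterator<String> it = records.keySet().iterator(); it.hasNext();) {
            if (!wellFormed(it.next())) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```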
- speed enhancements for the cache-only cache policy by using special no-delay rules in the balancer
- fixed some deadlock and 100% CPU problems in the balancer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6243 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added special rule to balancer to omit forced delays if cache is used exclusively
- extended the htCache size by default to 32GB
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6241 6c8d7289-2bf4-0310-a012-ef5d649a1542
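The balancer rule from r6241 can be pictured as a waiting-time computation that short-circuits when only the cache is used. The sketch below is an assumption about how such a rule could look; the class `BalancerDelay`, the method `waitingMillis` and its parameters are hypothetical and not taken from the YaCy balancer.

```java
// Hypothetical sketch of a no-delay rule for cache-only access.
final class BalancerDelay {
    /**
     * @param minDelayMillis   forced minimum delay between two accesses to the same host
     * @param lastAccessMillis time of the last access to that host
     * @param nowMillis        current time
     * @param cacheOnly        true if the crawl uses the cache exclusively
     * @return milliseconds to wait before the next access
     */
    static long waitingMillis(long minDelayMillis, long lastAccessMillis,
                              long nowMillis, boolean cacheOnly) {
        // No remote host is contacted in cache-only mode, so no politeness delay is needed.
        if (cacheOnly) return 0L;
        long elapsed = nowMillis - lastAccessMillis;
        return Math.max(0L, minDelayMillis - elapsed);
    }
}
```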
- better control of the cache by using combined request-header and content access methods
- refactoring of many classes to comply with this new access method
- make sure that the cache is always written if something was loaded
- some redesign of the process by which HTTP response results are fed into the new indexing queue
- introduction of a cache read policy:
* never use the cache
* use the cache if an entry exists
* use the cache if the proxy freshness rule confirms it
* use only the cache and never go online
- added configuration options for the crawl profiles to use the new cache policies. There is not yet an input at crawl start to set the policy, but this will be added in another step.
- set the default policies for the existing crawl profiles. If you want them to appear in your default profiles you must delete the crawl profiles database; otherwise the policy is 'proxy freshness rule'
- enhanced some cache access methods in such a way that unnecessary retrievals are omitted (e.g. for size computation). This should reduce some IO and also a lot of CPU load, because sizes were previously computed by retrieving the content from disk and then decompressing it.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6239 6c8d7289-2bf4-0310-a012-ef5d649a1542
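The four cache read policies listed for r6239 could be modelled as an enum plus a small decision function, as in the sketch below. All names here (`CacheReadPolicy`, `LoadSource`, `CacheDecision.choose`) are hypothetical; the commit only describes the policies in prose, and the freshness check is reduced to a boolean for the example.

```java
// Hypothetical names; only the four policies themselves come from the commit message.
enum CacheReadPolicy {
    NEVER_USE_CACHE, // never use the cache
    IF_EXISTS,       // use the cache if an entry exists
    IF_FRESH,        // use the cache if the proxy freshness rule confirms it
    CACHE_ONLY       // use only the cache and never go online
}

enum LoadSource { CACHE, NETWORK, NONE }

final class CacheDecision {
    /**
     * @param inCache whether the cache holds an entry for the URL
     * @param fresh   whether the (assumed) freshness rule accepts that entry
     */
    static LoadSource choose(CacheReadPolicy policy, boolean inCache, boolean fresh) {
        switch (policy) {
            case NEVER_USE_CACHE:
                return LoadSource.NETWORK;
            case IF_EXISTS:
                return inCache ? LoadSource.CACHE : LoadSource.NETWORK;
            case IF_FRESH:
                return (inCache && fresh) ? LoadSource.CACHE : LoadSource.NETWORK;
            case CACHE_ONLY:
                // Never go online: a cache miss yields nothing at all.
                return inCache ? LoadSource.CACHE : LoadSource.NONE;
            default:
                throw new IllegalArgumentException("unknown policy: " + policy);
        }
    }
}
```

In this model the 'proxy freshness rule' default mentioned above corresponds to `IF_FRESH`.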
- removed the plasma package. The name of that package came from a very early pre-version of YaCy, even before YaCy was named AnomicHTTPProxy. The Proxy project introduced search for cache contents using class files that had been developed during the plasma project. Information from 2002 about plasma can be found here:
http://web.archive.org/web/20020802110827/http://anomic.de/AnomicPlasma/index.html
We still have one class that comes mostly unchanged from the plasma project, the Condenser class. But this is now part of the document package and all other classes in the plasma package can be assigned to other packages.
- cleaned up the http package: better structure of that class and clean isolation of server and client classes. The old HTCache becomes part of the client sub-package of http.
- because the plasmaSwitchboard is now part of the search package, all servlets had to be touched to declare a different package source.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6232 6c8d7289-2bf4-0310-a012-ef5d649a1542