- added initial space option for eco tables
- used initial space value in initialization of collectionIndex, this should avoid OOM failures" /Volumes/Magneto/dev/workspace/trunk/source/dbtest.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/kelondro/kelondroCollectionIndex.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/kelondro/kelondroDyn.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/kelondro/kelondroEcoTable.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/kelondro/kelondroRow.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/kelondro/kelondroSplitTable.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/plasma/plasmaCrawlBalancer.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/plasma/plasmaCrawlStacker.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/plasma/plasmaCrawlZURL.java
- added index consistency check (checks for double-occurrences of primary keys in file)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4349 6c8d7289-2bf4-0310-a012-ef5d649a1542
This data structure replaces almost all files in the PLASMA directory
also the collection.index and the LURL-db will be created as Eco-DB, if it does not exist before
existing Flex-databases will be used as they are (the is no data lost)
If you want to force the creation of a Eco-collection.index, simply delete the old index.
The Eco file system will only be used if there is enough memory.
The collection.index RAM limit is 200MB, if you have less, a flex-Table is createt.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4340 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added new data structure 'eco' for an index file that should use only 50% of write-IO compared to kelondroFlex
The new eco index is not used yet, but already successfully tested with the collectionIndex
The main purpose is to replace the kelondroFlex at every point when enough RAM is available.
Othervise, the kelondroFlex stays as option in case of low memory (which then can even use a file-index)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4337 6c8d7289-2bf4-0310-a012-ef5d649a1542
- redesign of ordering structures in kelondro (old did not work with strict generics)
- 50% IO reduction during read access on kelondroFlex (ommiting of read on index table)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4320 6c8d7289-2bf4-0310-a012-ef5d649a1542
- changed yacy logo
- removed crawlOrder protocol (unused)
- removed file index in kelondroFlex (will not work, it takes too long to maintain)
- fixed remoted crawl for clusters (now denies remote crawls from peers outside cluster)
- 0.562
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4317 6c8d7289-2bf4-0310-a012-ef5d649a1542
- more generics
- preparation to extend the balancer for flexible forced delay times
- set different random-access type, should now omit update of metadata in file and could be a bit faster (lets see)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4309 6c8d7289-2bf4-0310-a012-ef5d649a1542
- changed build script to use java 1.5 compiler
- first stept to resolve missing generics definition (about 400 from over 4100 'missing'-warnings)
- added key-iterator to kelondro databases (for rapid from-memory enumerations, will be used for domain name collection, not used yet)
please set your development environment to use java 1.5!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4292 6c8d7289-2bf4-0310-a012-ef5d649a1542
- remove unnecessary generation of Calendar and Date objects
- synchronized SimpleDateFormat objects in blog-, message- and wikiBoard
- correct use of TimeZones and SimpleDateFormats
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4288 6c8d7289-2bf4-0310-a012-ef5d649a1542
a new release type was added: 'embedded' which is the same as the current standard release was
this will not have any effect to the next release 0.56, which will still a pro-release on public download
the transition the the new release strategy must be done now to enable automatic update by the updated in future releases
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4287 6c8d7289-2bf4-0310-a012-ef5d649a1542
- enhanced searchtime in kelondroRowSets
- enhanced uniq() - reverse enumeration causes less time in case of mass removal of doubles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4207 6c8d7289-2bf4-0310-a012-ef5d649a1542
- better handling of small collections (less overhead)
- usage of pre-sorted limits
- different re-sort limit
- more testing procedures
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4200 6c8d7289-2bf4-0310-a012-ef5d649a1542
the enhancement was made by using better organized data structures and
multi-threading during the sort. A sort can be divided into two separate
processes when the first partition of the quicksort algorithm was done.
Generating a separate thread and starting the thread takes only 10 milliseconds,
so using a separate thread makes only sense if the data amount is large.
statistics about the speed-up:
without ehancement: 250 milliseconds for 100000 entries
with data structure enhancement: 170 milliseconds for 100000 entries
with additional second thread (if second processor is present): 130 milliseconds.
For dual-processor systems, this means about 100% speed-up
a test can be made with the following command:
java -classpath classes de.anomic.kelondro.kelondroRowCollection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4198 6c8d7289-2bf4-0310-a012-ef5d649a1542
- Array element shifting during remove is only done when it is necessary to keep the order of a row collection.
- This will speed up the most expensive operation "common word shrinking" by a factor of 500-1000 (in the worst cases we shifted > 60 GB of data during this operation)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4158 6c8d7289-2bf4-0310-a012-ef5d649a1542
search profiling showed, that a major amount of time is wasted by computing url hashes. The computation does an intranet-check, which needs a DNS lookup. This caused that each urlhash computation needed 100-200 milliseconds, which caused remote searches to delay at least 1 second more that necessary. The solution to this problem is to attach a URL hash to the URL data structure, because that means that the url hash value can be filled after retrieval of the URL from the database. The redesign of the url/urlhash management caused a major redesign of many parts of the software. Since some parts had been decided to be given up they had been removed during this change to avoid unnecessary maintenance of unused code.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542
- snippets are not fetched by browser using ajax, they are now fetched internally
- YaCy-internat threads control existence of snippets and sort out bad results
- search results are prepared using SSI includes
- the search result page is visible right after the search request, the results drop in when they are detected
- no more time-out strategy during search processes, results are shifted within queues when they arrive from remote peers
- added result page switching! after the first 10 results, the next page can be retrieved
- number of remote results is updated online on the result page as they drop in
- removed old snippet servelet (which had been also a security leak btw)
- media search is broken now, will be redesigned and fixed in another step
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4071 6c8d7289-2bf4-0310-a012-ef5d649a1542
collection objects has been reduced to 50%:
- set new memory calculation functions for indexing process
- adjusted guessed memory amount
-> Testing needed:
try new recommended value (see performanceQueues) and see if OOMs occur.
-> report maximum recommended value, so we can set new default values.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4053 6c8d7289-2bf4-0310-a012-ef5d649a1542