this stores now two index structures, one for data that is aquired during start-up
and one for data that is aquired during run-time. This reduces the grow factor, and should reduce the memory amount in case that a index-reorganisation happens.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3733 6c8d7289-2bf4-0310-a012-ef5d649a1542
- the network configuration page shows a new option: robinson clusters
- when a global search is made, all robinson peers are excluded, but:
- robinson peers/clusters that provide peer tags and where search words match
such tags, they are included in global search. Therefore, robinson peers/clusters
support the global yacy network with their indexes, without doin DHT-exchange
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3598 6c8d7289-2bf4-0310-a012-ef5d649a1542
http://www.yacy-forum.de/viewtopic.php?t=3854
This is a serious problem that is caused by the database bug between 0.511 - 0.513
which produced a large number of double-entries in the RWI index. The uniq()-method
tries to fix this, and it does not terminate when the index is large and the number
of double-occurrences is also large. This patch does simply implement a time-controlled
termination, which does not heal the inconsistency problem. The uniq-method itself
is correct and does not need a bugfix, the non-termination is simply caused by the large number
of data that is shifted during the process. It was possible to reproduce this behaviour
in a test environment.
A real fix would need to:
- enhance the uniq()-method by using a recursive, binary segmentation of the array to be fixed
- uniq() must report the entries that are double
- the double-entries must be deleted from the collection index (from the index and the collections) to heal the problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3583 6c8d7289-2bf4-0310-a012-ef5d649a1542
- re-implemented index load/extend optimization that was removed from kelondroFlexTable,
this is now part of kelondroIntBytesIndex
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3580 6c8d7289-2bf4-0310-a012-ef5d649a1542
- some bugs may have been fixed with wrong removal operations
- removed temporary storage of remove-positions and replaced by direct deletions
- changed synchronization
- added many assets
- modified dbtest to also test remove during threaded stresstest
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3576 6c8d7289-2bf4-0310-a012-ef5d649a1542
- basic protection against start-up problems when database files are corrupted
- auto-delete of not-critical databases during startup when load error occurs
- on-the-fly reset option for all database tables
- automatic on-the-fly reset for seed tables during enumeration exceptions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3547 6c8d7289-2bf4-0310-a012-ef5d649a1542
such an entry cannot be instantiated without allocation of new byte[]; instead
it can re-use memory from other kelondroRow.Entry objects.
during bugfixing also other bugs may have been solved, maybe the INCONSISTENCY problem
could have been solved. One cause can be missing synchronization during bulk storage
when a R/W-path optimization is done. To test this case, the optimization is currently
switched off.
More memory enhancements can be done after this initial change to the allocation scheme.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3536 6c8d7289-2bf4-0310-a012-ef5d649a1542
to replace byte[] objects within kelondro. Frequent System.arraycopy are common when
kelondroRow.Entry objects are handled. This class may be used to prevent this.
However, experimental replacement of byte[] by kelondroByteArray in kelondroRow.Entry
resulted in complete re-write of large parts of kelondro. This experiment did not
completely lead to a result, because then the interface to kelondro had to be changed
also from byte[] to kelondroByteArray, which may have caused a rewrite of large parts
of YaCy. The experiment is therefore abanonded, but this class remains here without
any function but possibly for future use.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3531 6c8d7289-2bf4-0310-a012-ef5d649a1542
- the general NURL-index for all crawl stack types was splitted into separate indexes for these stacks
- the new NURL-index is managed by the crawl balancer
- the crawl balancer does not need an internal index any more, it is replaced by the NURL-index
- the NURL.Entry was generalized and is now a new class plasmaCrawlEntry
- the new class plasmaCrawlEntry replaces also the preNURL.Entry class, and will also replace the switchboardEntry class in the future
- the new class plasmaCrawlEntry is more accurate for date entries (holds milliseconds) and can contain larger 'name' entries (anchor tag names)
- the EURL object was replaced by a new ZURL object, which is a container for the plasmaCrawlEntry and some tracking information
- the EURL index is now filled with ZURL objects
- a new index delegatedURL holds ZURL objects about plasmaCrawlEntry obects to track which url is handed over to other peers
- redesigned handling of plasmaCrawlEntry - handover, because there is no need any more to convert one entry object into another
- found and fixed numerous bugs in the context of crawl state handling
- fixed a serious bug in kelondroCache which caused that entries could not be removed
- fixed some bugs in online interface and adopted monitor output to new entry objects
- adopted yacy protocol to handle new delegatedURL entries
all old crawl queues will disappear after this update!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3483 6c8d7289-2bf4-0310-a012-ef5d649a1542
- better memory allocation for FlexTable indexes
- splitting between static index and dynamic index (only the dynamic part must grow)
- to enable a merge-iteration of new splittet index, a huge number of classes needed to be adopted for new iterator classes
- added new iterator classes that support cloneable iterators
- adopted all iterator classes to implement cloneable itarators
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3453 6c8d7289-2bf4-0310-a012-ef5d649a1542
- each cache can now allocate as much memory as is available
- no more fixed limits
- replaced old performance memory monitor by new one
- added supervision methods as static functions into the classes that provide cache functionality
- steering of ram allocation is done with two simple limits that are ram availability-relative
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3434 6c8d7289-2bf4-0310-a012-ef5d649a1542
too large collection arrays are now avoided. By default, the biggest
collection index is 7. larger collections are dumped into a commons
directory, but cannot yet be used. Bevore doing a dump, the collection
is splittet into a part which has only root-references, and stored back
to the collection; the remaining part goes to commons
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3426 6c8d7289-2bf4-0310-a012-ef5d649a1542