*) First version of a sitemap parser added
- currently only autodetection of sitemap files is supported
*) DB-Import restructured
- pause/resume should work again now
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3666 6c8d7289-2bf4-0310-a012-ef5d649a1542
- cluster definitions can now contain an addition for local ip addresses
- cluster-cluster communication uses the local ip address instead the global address, if one is given
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3624 6c8d7289-2bf4-0310-a012-ef5d649a1542
automatically aquire release information from download archives
web pages from latest.yacy-forum.net and yacy.net are retrieved, parsed,
links wihin are analysed, sorted and the most recent developer and main
releases are provided as direct download link on the status page, if it was
discovered that a more recent version than the current version is available.
This process is done only once during run-time of a peer, to protect our
download archives from DoS by YaCy peers.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3606 6c8d7289-2bf4-0310-a012-ef5d649a1542
- the network configuration page shows a new option: robinson clusters
- when a global search is made, all robinson peers are excluded, but:
- robinson peers/clusters that provide peer tags and where search words match
such tags, they are included in global search. Therefore, robinson peers/clusters
support the global yacy network with their indexes, without doin DHT-exchange
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3598 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) Marked two deprecated source-points
*) Added possibility to dump words from indexing to file. Should not affect performance in the current form.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3592 6c8d7289-2bf4-0310-a012-ef5d649a1542
- new cluster functions will be available in this menu, but currently not enabled,
because corresponding interface methods are not ready yet
- shifted remote crawl settings to new network configuration menu
- shifted DHT distribution/receive to the new network configuration menu
- adopted some string constants
- added cluster configuration settings to yacy.init
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3589 6c8d7289-2bf4-0310-a012-ef5d649a1542
http://www.yacy-forum.de/viewtopic.php?t=3854
This is a serious problem that is caused by the database bug between 0.511 - 0.513
which produced a large number of double-entries in the RWI index. The uniq()-method
tries to fix this, and it does not terminate when the index is large and the number
of double-occurrences is also large. This patch does simply implement a time-controlled
termination, which does not heal the inconsistency problem. The uniq-method itself
is correct and does not need a bugfix, the non-termination is simply caused by the large number
of data that is shifted during the process. It was possible to reproduce this behaviour
in a test environment.
A real fix would need to:
- enhance the uniq()-method by using a recursive, binary segmentation of the array to be fixed
- uniq() must report the entries that are double
- the double-entries must be deleted from the collection index (from the index and the collections) to heal the problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3583 6c8d7289-2bf4-0310-a012-ef5d649a1542
- re-implemented index load/extend optimization that was removed from kelondroFlexTable,
this is now part of kelondroIntBytesIndex
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3580 6c8d7289-2bf4-0310-a012-ef5d649a1542
- some bugs may have been fixed with wrong removal operations
- removed temporary storage of remove-positions and replaced by direct deletions
- changed synchronization
- added many assets
- modified dbtest to also test remove during threaded stresstest
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3576 6c8d7289-2bf4-0310-a012-ef5d649a1542
* removed divide by zero bug when 20_dhtdistribution_busysleep is 0
* replaced German comment with wrong charset in source/de/anomic/plasma/plasmaCrawlBalancer.java by an English one
* replaced the table-fix for floating behind snipped images by a br with clear
* removed unnecessary old xhtml-files (were not in use, they were created when we weren't having xhtml for testing)
* new layout for image-search results: replaced the old one with spans and tables inside (not valid) with new divs, now each image snippet container has the same size
TODO:
* the ids of the snippetLoading-divs aren't valid because ids must start with an alphabetic letter or an underscore, they have to be prefixed
* in the returned snippet-xml is an unresolved pattern for status (the status is only set for text snippets)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3566 6c8d7289-2bf4-0310-a012-ef5d649a1542
- exclusion on index-level (not only from search snippets)
- exclusion hand-over at remote search protocol
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3556 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) main method is generalization of main method of ymageFontGenerator:
it does not matter how many lines of how many bits a font is made of
as long as the values stay the same within the font -> use this class as
a template for your own font generators and be a happy camper
*) main method checks if font is valid (96 characters, all letters must have
same number of lines and same number of bits per line)
*) ***** I have not checked if the result is really a valid font so far. *****
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3552 6c8d7289-2bf4-0310-a012-ef5d649a1542
- basic protection against start-up problems when database files are corrupted
- auto-delete of not-critical databases during startup when load error occurs
- on-the-fly reset option for all database tables
- automatic on-the-fly reset for seed tables during enumeration exceptions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3547 6c8d7289-2bf4-0310-a012-ef5d649a1542
such an entry cannot be instantiated without allocation of new byte[]; instead
it can re-use memory from other kelondroRow.Entry objects.
during bugfixing also other bugs may have been solved, maybe the INCONSISTENCY problem
could have been solved. One cause can be missing synchronization during bulk storage
when a R/W-path optimization is done. To test this case, the optimization is currently
switched off.
More memory enhancements can be done after this initial change to the allocation scheme.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3536 6c8d7289-2bf4-0310-a012-ef5d649a1542
to replace byte[] objects within kelondro. Frequent System.arraycopy are common when
kelondroRow.Entry objects are handled. This class may be used to prevent this.
However, experimental replacement of byte[] by kelondroByteArray in kelondroRow.Entry
resulted in complete re-write of large parts of kelondro. This experiment did not
completely lead to a result, because then the interface to kelondro had to be changed
also from byte[] to kelondroByteArray, which may have caused a rewrite of large parts
of YaCy. The experiment is therefore abanonded, but this class remains here without
any function but possibly for future use.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3531 6c8d7289-2bf4-0310-a012-ef5d649a1542
this shall be used to store a fragment of the index on another physical device,
to split IO load and enhance access speed. The index is splitted in such a way
that the LURLs are stored to the secondary location, and the RWIs to the primary
location. This is especially useful for environments where symbolic links are
not possible and may cause IO access even if there is no write access to the
device which hosts the symbolic link.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3519 6c8d7289-2bf4-0310-a012-ef5d649a1542
The new collection index will be more generalized to support other indexes
i.e. YBR block-rank computation. A clean-up of the many conditions to support
the old database was necessary.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3506 6c8d7289-2bf4-0310-a012-ef5d649a1542