- added BEncodedHeap class that encodes B data structures and stores that to a heap
- refactoring of MapView, this is now named MapHeap to fit into the naming scheme of the BEncodedHeap
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6579 6c8d7289-2bf4-0310-a012-ef5d649a1542
This is a major change in the organization of indexes.
Please consider a back-up of your data before you run this update.
All existing index files will be moved and renamed to a new position.
With this change, it will be possible to maintain different indexes for different purposes and it will be possible to have a distinction between DHT-in and DHT-out specific indexes. Tenants may also have their own index, and it may be possible to have histories and back-ups of indexes. This is just the beginning, many servlets must be adopted after this change, but all functions that had been there should still work.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6389 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) refacored BlacklistCleaner_p.java for better readability
*) moved check of validity of patterns to the Balcklist implementation since patterns might be valid in one implementation, but not in another
*) added method to check validity to Blacklist interface
*) fixed some minor issues like typos or wrong whitespaces
*) set subversion properties for a whole bunch of files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6359 6c8d7289-2bf4-0310-a012-ef5d649a1542
- limit comparisment to only the first 10 elements that had been sorted before without IO
- added a size cache to index computation because the size is computed at least twice in set comparator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6306 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added caption to maps containing latitude and longitude information
- prevented that maps occur on second search page
- added location names to did-you-mean
- some refactoring of did-you-mean
- added equal and compareTo test to Coordinates class to make that work in set
- fixed utf-8 support for library files
- fixed a bug in images search icon view caption
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6294 6c8d7289-2bf4-0310-a012-ef5d649a1542
the must be downloaded manually by the user and placed in DATA/DICTIONARIES/source
for each externally imported dictionary file there will be a translator that converts the input file once
into a YaCy-internat data format.
Files that will be provided together with yacy releases may still be placed in <root>/dictionaries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6286 6c8d7289-2bf4-0310-a012-ef5d649a1542
- some redesign of DidYouMean that was necessary to follow
a special rule how a library should be used:
- the library provides words that start or end with a test
word which may be possibly also an empty set of words
- all words that the DidYouMean produced with the four
production rules are used to generate a set of
library-completed words
- if this process results in any words from the library,
only library-genrated words are taken
- if the is no library-generated word at all, take the
artifial generated word
- all words that result from these rules are tested against
the index
- the result is ordered using a lightweight comparator that
prefers short words
- a not-so-much-io test against the index is beeing prepared
next
- insered the library initialization into the switchboard
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6284 6c8d7289-2bf4-0310-a012-ef5d649a1542
which can be used by any other classes to support their functions.
libraries are designed in such a way that users can create and
insert their own library files, but can also be imported from
other sources. As an example the "Korpusbasierte Wortgrundformliste
DeReWo des Institut für Deutsche Sprache" from
http://www.ids-mannheim.de has been integrated. This dictionary
is licensed to be used for all non-profit purposes. In case that
YaCy is used for commercial uses, this library must be removed.
The new library provilder reads the original source and translates
it into a simple word list to be used for the did-you-mean library
provider. More libraries may be provided in the future using
a download-servlet which puts files from the internet into the
<application-root>/dictionaries/ path.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6282 6c8d7289-2bf4-0310-a012-ef5d649a1542
- better control to the cache by using combined request-header and content access methods
- refactoring of many classes to comply to this new access method
- make shure that the cache is always written if something was loaded
- some redesign of the process how http response results are feeded into the new indexing queue
- introduction of a cache read policy:
* never use the cache
* use the cache if entry exist
* use the cache if the proxy freshness rule confirmes
* use only the cache and go never online
- added configuration options for the crawl profiles to use the new cache policies. There is not yet a input during crawl start to set the policy but this will be added in another step.
- set the default policies for the existing crawl profiles. If you want them to appear in your default profiles you must delete the crawl profiles database; othervise the policy is 'proxy freshness rule'
- enhanced some cache access methods in such a way that unnecessary retrievals are omitted (i.e. for size computation). That should reduce some IO but also a lot of CPU computation because sizes were computed after decompression of content after retrieval of the content from the disc.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6239 6c8d7289-2bf4-0310-a012-ef5d649a1542