- moved some servlets from here to there..
- changed menu structure
- removed yacyui-portaltest.html which contained an example for the live search which is now integrated on all pages in yacy. The code snippet example from that page is integrated into the ConfigLiveSearch.html servlet
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5994 6c8d7289-2bf4-0310-a012-ef5d649a1542
terms (words) are not any more retrieved by their word hash string, but by a byte[] containing the word hash.
this has strong advantages when RWIs are sorted in the ReferenceContainer Cache and compared with the sun.java TreeMap method, which needed getBytes() and new String() transformations before.
Many thousands of such conversions are now omitted every second, which increases the indexing speed by a factor of two.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5812 6c8d7289-2bf4-0310-a012-ef5d649a1542
This is a preparation to introduce other index tables as used now only for reverse text indexes. Next application of the reverse index is a citation index.
Moved to version 0.74
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5777 6c8d7289-2bf4-0310-a012-ef5d649a1542
- after a index selection is made, the index is splitted into its vertical components
- from differrent index selctions the splitted components can be accumulated before they are placed into the transmission queue
- each splitted chunk gets its own transmission thread
- multiple transmission threads are started concurrently
- the process can be monitored with the blocking queue servlet
To implement that, a new package de.anomic.yacy.dht was created. Some old files have been removed.
The new index distribution model using a vertical DHT was implemented. An abstraction of this model
is implemented in the new dht package as interface. The freeworld network has now a configuration
of two vertial partitions; sixteen partitions are planned and will be configured if the process is bug-free.
This modification has three main targets:
- enhance the DHT transmission speed
- with a vertical DHT, a search will speed up. With two partitions, two times. With sixteen, sixteen times.
- the vertical DHT will apply a semi-dht for URLs, and peers will receive a fraction of the overall URLs they received before.
with two partitions, the fractions will be halve. With sixteen partitions, a 1/16 of the previous number of URLs.
BE CAREFULL, THIS IS A MAJOR CODE CHANGE, POSSIBLY FULL OF BUGS AND HARMFUL THINGS.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5586 6c8d7289-2bf4-0310-a012-ef5d649a1542
- implemented vertical DHT acceptance ("my own DHT") to accept new targets
- added new target computation for global search: addresses vertical targets also
- enhanced remote crawling: collection of remote crawl urls if queue has less than 100 entries (was: 0 entries)
- better performance value computations for PPM selection in network configuration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5319 6c8d7289-2bf4-0310-a012-ef5d649a1542
- two different computations (but mathematical equivalent) of the DHT distance had been consolidated
- moved from 0.0 .. 1.0 double-range position computation to 0 .. Long.Max range for DHT targets
- added fast Long - to - hash computation
- high-precision target computation of gaps for new peers
- added new target computation for horizontal and vertical DHT targets (not yet in use)
- old horizontal-only DHT targets will be upwards compatible to new horizontal and vertical DHT positions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5318 6c8d7289-2bf4-0310-a012-ef5d649a1542
- moved constants from plasmaSwitchboard to own class (all 232 ;)
- moved remoteProxy-Methods to httpRemoteProxyConfig, better names
- removed some unnecessary code (else-statements)
* formatting (correct indentation)
* minor bugfixes (due to findbugs.sf.net)
* hopefully fixed "missing quote" (announcing StringParts as UTF-8)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5031 6c8d7289-2bf4-0310-a012-ef5d649a1542
- select also non-robinson (dht-) peers if their peer tags match with search words
- the peer tag '*' can now act as catch-all rule: shall be selected always
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4963 6c8d7289-2bf4-0310-a012-ef5d649a1542
- prevention of false information of own IP address
- enabled searching before an own IP address is assigned (before first ping happened)
- removed warning about limited search function
- added better time-out settings for peer-ping process (10 seconds complete, 5 seconds for back-ping)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4883 6c8d7289-2bf4-0310-a012-ef5d649a1542
- fixed loosing of own seed hash (hopefully)
- fixed a bug with crawl start s beginning with (bookmark) files
- added better IP recognition during hello process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4882 6c8d7289-2bf4-0310-a012-ef5d649a1542
This change is inspired by the need to see a network connected to the index it creates in a indexing team.
It is not possible to divide the network and the index. Therefore all control files for the network was moved to the network within the INDEX/<network-name> subfolder.
The remaining YACYDB is superfluous and can be deleted.
The yacyDB and yacyNews data structures are now part of plasmaWordIndex. Therefore all methods, using static access to yacySeedDB had to be rewritten. A special problem had been all the port forwarding methods which had been tightly mixed with seed construction. It was not possible to move the port forwarding functions to the place, meaning and usage of plasmaWordIndex. Therefore the port forwarding had been deleted (I guess nobody used it and it can be simulated by methods outside of YaCy).
The mySeed.txt is automatically moved to the current network position. A new effect causes that every network will create a different local seed file, which is ok, since the seed identifies the peer only against the network (it is the purpose of the seed hash to give a peer a location within the DHT).
No other functional change has been made. The next steps to enable network switcing are:
- shift of crawler tables from PLASMADB into the network (crawls are also network-specific)
- possibly shift of plasmaWordIndex code into yacy package (index management is network-specific)
- servlet to switch networks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4765 6c8d7289-2bf4-0310-a012-ef5d649a1542