used for rwi ranking.
Main changes:
- introduce a posintext() to access the stored value. This reduces also mem alloc of position array for WordReferenceRow (index access)
- use the positions() array for joined references on multi-word queries if needed (otherwise allow positions() to be null
- adjust assignments and the min() max() and distance() calculation accordingly
during rwi search result processing worddistance calculation is effected
by concurrent update (normalization) of min/max ranking parameter for
wordpositions. On update of min/max the exception is raised in distance calc
and now catched.
This concurrent update and change of ranking results is needed for speed
but should be further checked for optimization
(again, description in http://mantis.tokeek.de/view.php?id=698)
as root cause was not seen, added just workaround reducing in favour over a
try catch (for easier followup).
Issue was the calculation in AbstractReference with positions.clear() call,
this made distance result always 0 (distance needs min 2 positions) and created concurrency issues.
+ unit test of changes
This is needed and enables existing word position ranking for RWI.
The upcoming concurrency issue in word position min/max calculation were eliminated
by iterator.hasHext check before next() access.
This servlet currently only serves for indexes to the web structure hosts. It can be tested by calling
http://localhost:8090/yacy/idx.json?object=host
This yacy protocol servlet is the first one that returns JSON code and that also shows index entries in a readable format. This will make the development of API applications much easier. This is also an example implementation for possible json versions of the other existing YaCy protocol interfaces.
The main purpose of this new feature is to provide a distributed block rank collection feature. Creating a block rank is very difficult if the forward-link data is first collected and then one peer must create a backward-link index. This interface provides already a partial backward index and therefore a collection of all these indexes needs only to be joined which is very easy. The result should be the computation of new block rank tables that all peers can perform.
To reduce load from peers this servlet buffers all data and refreshes it only once in 12 hours. This very slow update cycle is needed because the interface will be called round-robin from all peers once after start-up.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7724 6c8d7289-2bf4-0310-a012-ef5d649a1542