- added some missing increments from RWI results
- decrement relevant navigator counts when solr or RWI results are
evicted because of duplicate detection or constraints checked belatedly
- do not compute facets when unnecessary to avoid unwanted CPU load
- do not increment from facets when already done
- do not rely on facets on remote solr peers requests, as most of the
time only a limited part of their total results is fetched (thus also
preventing unnecessary load on remote peers)
- use a concurrency friendly score map for the dates navigators to
prevent unwanted ConcurrentModificationExceptions
This improves the situation for the most obvious inconsistencies in
search navigator counts, but more has to be done for true accuracy
(notably when query modifier constraints are applied belatedly - after
the solr or RWI retrieval request - such as the content domain
constraint)
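
A minimal sketch of what a concurrency-friendly score map with increments
and decrements could look like. The class name NavigatorScoreSketch and its
methods are hypothetical and only illustrate the idea; this is not the map
implementation actually used by the navigators.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a thread-safe counter map for navigator values. */
final class NavigatorScoreSketch {

    private final ConcurrentHashMap<String, Integer> scores = new ConcurrentHashMap<>();

    /** Increment the count of a navigator value (for example a date string). */
    void inc(final String key) {
        this.scores.merge(key, 1, Integer::sum);
    }

    /** Decrement the count when a result is evicted; the entry is removed when it reaches zero. */
    void dec(final String key) {
        this.scores.computeIfPresent(key, (k, v) -> v > 1 ? Integer.valueOf(v - 1) : null);
    }

    /** Return the current count, or 0 when the key is unknown. */
    int get(final String key) {
        return this.scores.getOrDefault(key, 0);
    }

    /**
     * Sum all counts. Iterating a ConcurrentHashMap is weakly consistent:
     * it never throws ConcurrentModificationException, even when other
     * threads modify the map during the iteration.
     */
    int total() {
        int sum = 0;
        for (final int value : this.scores.values()) {
            sum += value;
        }
        return sum;
    }
}
```

With such a map, a result evicted after a belated constraint check can call
dec() from any thread while another thread is still iterating to render the
navigator.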
As reported by davide on YaCy forums (
http://forum.yacy-websuche.de/viewtopic.php?f=23&t=6004 ), when the
system is under high load, it can be difficult to understand why remote
search results are not fetched, unless one carefully reads the YaCy
configuration file.
Especially for Turkish speaking users using "tr" as their system default
locale: strings for technical stuff (URLs, tag names, constants...)
must not be lower cased with the default locale, as 'I' doesn't become
'i' like in other locales such as "en", but becomes 'ı'.
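
The difference can be illustrated with a short sketch. The class and method
names below are hypothetical; only String.toLowerCase(Locale) and Locale.ROOT
are standard Java API.

```java
import java.util.Locale;

/** Hypothetical sketch: locale-independent lower casing of technical strings. */
final class LocaleSafeLowerCase {

    /** Lower case a technical string (URL, tag name, constant) independently of the default locale. */
    static String lowerTechnical(final String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(final String[] args) {
        final Locale turkish = Locale.forLanguageTag("tr");
        // With the Turkish locale 'I' is mapped to the dotless 'ı' (U+0131),
        // so "TITLE" no longer matches the expected tag name "title".
        System.out.println("title".equals("TITLE".toLowerCase(turkish)));
        // With Locale.ROOT the result does not depend on the system settings.
        System.out.println("title".equals(lowerTechnical("TITLE")));
    }
}
```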
As discussed in PR #93 with @JeremyRand and @reger24 this new advanced
settings page includes:
- a new setting to control remote Solr responses encoding
- some existing debug settings which could not be set through the admin
user interface
On timeout, closing remote Solr requests is more appropriate than simply
using Thread.interrupt(), which is not effective in most cases. Closing does
not trigger a commit on the remote solr, but releases the http connection
resources and is more likely to end those threads, which could otherwise
wait indefinitely.
Other related improvements included:
- no more marking remote peer as not available when remote search is
interrupted before timeout by the cleanup job.
- added a short fine log level trace of failing remote solr requests
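
The idea of closing instead of interrupting can be sketched as follows. The
BlockingRequest class is a hypothetical stand-in that simulates a request
blocked in an operation that ignores interrupts; it is not the remote Solr
client code.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.CountDownLatch;

/** Hypothetical sketch: end a blocked worker by closing its resource on timeout. */
final class CloseOnTimeoutSketch {

    /** Simulated request: read() blocks until close() is called and ignores interrupts. */
    static final class BlockingRequest implements Closeable {
        private final CountDownLatch closed = new CountDownLatch(1);

        void read() throws IOException {
            boolean interrupted = false;
            while (true) {
                try {
                    this.closed.await();
                    break;
                } catch (final InterruptedException e) {
                    interrupted = true; // keep waiting, like blocking I/O that ignores interrupts
                }
            }
            if (interrupted) {
                Thread.currentThread().interrupt(); // restore the interrupt status
            }
            throw new IOException("request closed");
        }

        @Override
        public void close() {
            this.closed.countDown();
        }
    }

    /** Returns { worker alive after interrupt, worker alive after close }. */
    static boolean[] demo(final long timeoutMs) throws InterruptedException {
        final BlockingRequest request = new BlockingRequest();
        final Thread worker = new Thread(() -> {
            try {
                request.read();
            } catch (final IOException e) {
                // expected once the request has been closed
            }
        });
        worker.start();
        worker.join(timeoutMs);          // wait for the regular timeout
        worker.interrupt();              // interrupting alone does not unblock read()
        worker.join(200);
        final boolean aliveAfterInterrupt = worker.isAlive();
        request.close();                 // closing releases the blocked thread
        worker.join(2000);
        return new boolean[] { aliveAfterInterrupt, worker.isAlive() };
    }
}
```

In this simulation the worker is still alive after the interrupt and only
terminates once the request is closed.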
This new "documentStructure" parameter can be set to false to only get
hosts accumulated references on a resource and thus prevent scraping the
specified URL and getting citations references.
Also set WebStructureGraph constants as final and updated the Javadoc
with example api call URLs.
As described in mantis 721 (http://mantis.tokeek.de/view.php?id=721),
WatchWebStructure_p.html failed to include https and other protocols and
ports than the default http in its structure view.
As described in mantis 720 (http://mantis.tokeek.de/view.php?id=720),
when requesting this API with a domain name instead of a complete URL
only HTTP references on default port were listed.
This ensures a consistent implementation of the url host hash generation
and makes its usages easier to find in the source code.
Also added a unit test for this function.
The default redirection strategy when using HTTPClient directly is
incorrect when the redirection is cross host (the original Host header is
still sent when requesting the redirected location).
YaCy LoaderDispatcher handles redirections properly, thus release
archive files using redirected URLs (such as the URLs on a GitHub
Release page) are successfully downloaded.
When a downloaded release archive is corrupted, empty, or cannot be
opened for any reason, the update script must not be launched because it
erases the existing lib/*.jar libraries.
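
A possible pre-check before launching the update script is sketched below.
It assumes the release archive is gzip-compressed (for example a .tar.gz
file); the class and method names are hypothetical and this is not the check
actually implemented.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

/** Hypothetical sketch: verify a downloaded release archive before running the updater. */
final class ReleaseArchiveCheck {

    /**
     * Returns true only when the file exists, is not empty and can be fully
     * decompressed as gzip. Reading to the end of the stream also lets
     * GZIPInputStream verify the trailer checksum.
     */
    static boolean isReadableGzip(final File archive) {
        if (archive == null || !archive.isFile() || archive.length() == 0) {
            return false;
        }
        try (InputStream in = new GZIPInputStream(
                new BufferedInputStream(new FileInputStream(archive)))) {
            final byte[] buffer = new byte[8192];
            long total = 0;
            int read;
            while ((read = in.read(buffer)) != -1) {
                total += read;
            }
            return total > 0; // an archive without any content is rejected too
        } catch (final IOException e) {
            // corrupted header, truncated stream, bad checksum or unreadable file
            return false;
        }
    }
}
```

The update script would then only be started when isReadableGzip() returns
true, so that the existing libraries are kept when the download is unusable.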
used for rwi ranking.
Main changes:
- introduce a posintext() to access the stored value. This also reduces mem alloc of position array for WordReferenceRow (index access)
- use the positions() array for joined references on multi-word queries if needed (otherwise allow positions() to be null)
- adjust assignments and the min(), max() and distance() calculations accordingly
New or modified translations (via /Translator_p.html) can be shared/distributed
via the YaCy internal news service. Remote peers can see and vote on a
translation via the new http://localhost:8090/TransNews_p.html servlet.
A positive vote will add the received translation to the local translation
list and post a voting message to the news service.
(at this time, no processing of received votes is implemented)
+ fixed the msg service retention time check (NewsPool.automaticProcessP)
(by using the resultcontainer.size instead of input docList.size)
skip waiting for write-search-result-to-local-index
(by removing the Thread.join - which will bring a small performance increase)
- in intranet mode getip returns null, causing an NPE
- adjust starturl (it was set to http://localip/repository, which is never the start url for the Mediawiki)
+ correct javadoc for seed.getIP()