The resource observer is now able to recognize free disk space AND
available space for YaCy. The amount of space which is assigned to YaCy
is defined in new settings in the configuration file.
Furthermore, there is now a cleanup process which deletes files if
autodelete is activated. The autodelete is now BY DEFAULT ON if
the disk space is low, which means that YaCy starts to delete documents
when the disk is full!
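
The difference between free disk space and the space assigned to YaCy can be sketched roughly as follows. This is only an illustration, not the resource observer itself: the class name `DiskBudget`, both settings, and the rule "clean up when nothing is left" are assumptions.

```java
import java.io.File;

/** Hypothetical sketch: not YaCy's actual resource observer. */
final class DiskBudget {
    private final long minFreeBytes; // assumed setting: free space that must remain on the disk
    private final long maxUsedBytes; // assumed setting: space assigned to YaCy

    DiskBudget(long minFreeBytes, long maxUsedBytes) {
        this.minFreeBytes = minFreeBytes;
        this.maxUsedBytes = maxUsedBytes;
    }

    /** Space YaCy may still use: limited by both the disk and its own quota. */
    long availableForYaCy(long freeOnDisk, long usedByYaCy) {
        long byDisk = freeOnDisk - minFreeBytes;
        long byQuota = maxUsedBytes - usedByYaCy;
        return Math.max(0L, Math.min(byDisk, byQuota));
    }

    /** Cleanup runs only if autodelete is on and no space is left (assumed rule). */
    boolean shouldCleanup(boolean autodelete, long freeOnDisk, long usedByYaCy) {
        return autodelete && availableForYaCy(freeOnDisk, usedByYaCy) == 0L;
    }

    /** Free space on the partition holding the given directory, as seen by the JVM. */
    static long freeOnDisk(File dataDir) {
        return dataDir.getUsableSpace();
    }
}
```

With `new DiskBudget(100L, 1000L)`, a disk with 500 bytes free and 200 bytes used by YaCy leaves 400 bytes available, because the disk is the tighter limit in that case.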
occupied disk space. These values are also shown on the status page.
The disk space calculation is intended to be used for a disk limit on the
search index.
if load > 1 (but < 2) but only if there is enough memory (now: 0.5 GB
RAM available). The memory consumption of the postprocessing is the reason
that systems block: they run into a frequent-GC chain which
almost locks the peer. If running with enough memory, the postprocessing
is fast and not damaging to the system.
Because the required RAM of 0.5 GB is never available in the default
setting, the postprocessing will not run unless the peer is reconfigured
to use more memory.
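
A gate of this kind might look like the sketch below. It reads the text above as "load below 2 and at least 0.5 GB available"; the class name `PostprocessingGate` and its method names are assumptions, and the real check in YaCy may be structured differently.

```java
/** Hypothetical sketch of a load/memory gate; names and structure are assumptions. */
final class PostprocessingGate {
    static final long REQUIRED_BYTES = 512L * 1024L * 1024L; // 0.5 GB, as stated above
    static final double MAX_LOAD = 2.0;                       // upper load bound from the text

    /** Memory the JVM could still allocate: unused heap plus heap not yet claimed. */
    static long availableMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.maxMemory() - rt.totalMemory() + rt.freeMemory();
    }

    /** Postprocessing may run only below the load bound and with enough memory. */
    static boolean mayRun(double load, long availableBytes) {
        return load < MAX_LOAD && availableBytes >= REQUIRED_BYTES;
    }
}
```

A caller would pass the current system load and `PostprocessingGate.availableMemory()`. With a small default heap the second condition fails, which matches the note that the peer must be reconfigured to use more memory.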
introduced, it was also used for search facets. The generic search
facets are now deduced from generic solr fields, which makes Jena superfluous
as a tool for facet semantics.
- redesigned the instance mirror class (which was a mess)
- added final method to close a searcher (which otherwise keeps a cache)
- changed cache clear method which iterates over resources and calls
clear on all caches in the searcher resources
- selecting more than one nav combines the 2 selections (with AND)
- unselecting one nav clears all selected
(e.g. selecting filetype:pdf and /language/fr shows ~ French PDFs only)
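
The selection behaviour described in the list above can be sketched as follows. This is not the servlet code: the class `NavSelection` and the idea of appending modifiers to the query term are assumptions made for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch of combining navigator selections; not YaCy's servlet code. */
final class NavSelection {
    private final Set<String> selected = new LinkedHashSet<String>();

    /** Selecting a navigator entry adds it to the current selection. */
    void select(String modifier) {
        selected.add(modifier);
    }

    /** Unselecting any entry clears the whole selection, as described above. */
    void unselect(String modifier) {
        selected.clear();
    }

    /** Every selected modifier is appended to the term; each one narrows the result (AND). */
    String toQuery(String term) {
        StringBuilder sb = new StringBuilder(term);
        for (String m : selected) {
            sb.append(' ').append(m);
        }
        return sb.toString();
    }
}
```

Selecting `filetype:pdf` and then `/language/fr` for the term `yacy` gives `yacy filetype:pdf /language/fr`; unselecting either one brings the query back to `yacy`.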
works fine to restrict language for local solrSearches.
More work needs to be done to make rwi/remote searches respect the modifier.language restriction.
- refactored all code which uses URIMetadataRow as standard for word
hash length and word hash ordering and moved that to the class 'Word',
because the class URIMetadataRow defined the old metadata data structure
and should be superfluous in the future
- removed unused methods from URIMetadataRow as preparation for further
removal of that class
- since the specific heuristics Twitter & Blekko are no longer available or are redundant with OpenSearchHeuristic,
adjusted ConfigHeuristic to use OpensearchHeuristic settings only.
For this the default OSD search target list is made available (copied) by default and the other configs are removed.
- the return of QueryGoal.getOriginalQueryString includes the queryModifier, which is held separately in a modifier object,
but in most (all) cases just the query term is expected; clarified this and renamed it to QueryGoal.getQueryString, which returns
just the search term (if needed, a .getOriginalQueryString could be implemented in Queryparameters, adding the modifiers)
- started to adjust internal html href references from absolute to relative (currently it is mixed).
For future development we should prefer relative href targets (less trouble with context aware servlets)
request into a separate thread and ignores the further result of a
request if that does not answer within the requested time-out. This is an
attempt to solve a problem with the peer-ping, which hangs whenever a peer
appears to be dead or blocked.
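
The general technique (run the request in its own thread and stop waiting after a time-out) can be sketched with the standard `java.util.concurrent` classes. This is not YaCy's HTTP client code: the class `TimedRequest`, its `call` method, and the choice to return `null` on time-out are assumptions.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of a request with a hard time-out; not YaCy's HTTP client. */
final class TimedRequest {
    // Daemon threads, so a hanging request never prevents JVM shutdown.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timed-request");
        t.setDaemon(true);
        return t;
    });

    /** Runs the request in its own thread; returns null if it does not answer in time. */
    static <T> T call(Callable<T> request, long timeoutMillis) {
        Future<T> future = POOL.submit(request);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // ignore any later result and interrupt the worker
            return null;
        } catch (ExecutionException e) {
            return null;         // the request itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the caller's interrupt status
            return null;
        }
    }
}
```

The caller is released after `timeoutMillis` even if the remote side never answers, which is the property a ping to a dead or blocked peer needs.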
as BASIC where the pwd is transmitted in near clear text (B64enc).
This has some implications, as RFC 2617 requires and recommends a password hash MD5(user:realm:pwd) for DIGEST.
!!! before activating DIGEST you have to reassign all passwords !!! to allow new calculation of the hash
- default authentication is still BASIC
- configuration at this time only manually in (DATA/settings) or defaults/web.xml (<auth-method>
- the realmname is in defaults/yacy.init adminRealm=YaCy-AdminUI
- fyi: the realmname is shown on login screen
- changing the realm name invalidates all passwords - but for security you are encouraged to do so (as localhostadmin)
- implemented to support both, old hashes for BASIC and new hashes for BASIC and DIGEST
- to differentiate old / new hashes, the hash-prefix "MD5:" used in Jetty is used for new pwd-hashes ( "MD5:hash" )
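
The new hash format from the list above, MD5(user:realm:pwd) with an "MD5:" prefix, can be sketched like this. The class `AdminHash` and its method names are hypothetical, and the use of UTF-8 for the input bytes is an assumption; the actual YaCy code may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of the hash format described above; not YaCy's actual code. */
final class AdminHash {
    /** Hex-encoded MD5 of the UTF-8 bytes of the input (charset is an assumption). */
    static String md5Hex(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(d.length * 2);
            for (byte b : d) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is a required algorithm on the Java platform
        }
    }

    /** New-style hash: "MD5:" followed by MD5(user:realm:pwd). */
    static String newHash(String user, String realm, String pwd) {
        return "MD5:" + md5Hex(user + ":" + realm + ":" + pwd);
    }

    /** Only new-style hashes carry the "MD5:" prefix. */
    static boolean isNewHash(String stored) {
        return stored.startsWith("MD5:");
    }
}
```

Because the realm name is part of the hashed string, `newHash("admin", "YaCy-AdminUI", pwd)` and the same call with another realm give different values. That is why changing the realm name invalidates all stored passwords.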
- all non-dht targets (previously separated into 'robinson' for dht-like
queries and 'node' for solr queries) are now 'extra' peers, which are
queried using solr
- these extra-peers are now selected using a ranking on last-seen,
peer-tag-matches, node-peer flags, peer age, and link count. The ranking
is done using a weight and a random factor.
- the number of extra peers is 50% of the dht peers
- the dht peers now exclude too-young peers, to prevent bad results
during strong growth of the network
- the number of dht peers (and therefore extra-peers) is reduced when
the memory of the peer is low and/or some documents still appear in the
indexing-queue. This should protect a peer from deadlocks when p2p
queries are made in a fast sequence on weak hardware.
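
A ranking with a weight and a random factor, as described in the list above, could look roughly like this sketch. The class names, the `Peer` fields, and all numeric weights are invented for illustration; only the criteria and the 50% rule come from the text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of ranked extra-peer selection; all weights are invented. */
final class ExtraPeerSelection {
    static final class Peer {
        final String name;
        final long minutesSinceLastSeen;
        final int tagMatches;
        final boolean nodePeer;
        final int ageDays;
        final int linkCount;
        double score; // filled in by select()

        Peer(String name, long minutesSinceLastSeen, int tagMatches,
             boolean nodePeer, int ageDays, int linkCount) {
            this.name = name;
            this.minutesSinceLastSeen = minutesSinceLastSeen;
            this.tagMatches = tagMatches;
            this.nodePeer = nodePeer;
            this.ageDays = ageDays;
            this.linkCount = linkCount;
        }
    }

    /** The number of extra peers is 50% of the dht peers. */
    static int extraCount(int dhtPeers) {
        return dhtPeers / 2;
    }

    /** Weighted sum of the ranking criteria plus a random factor (weights assumed). */
    static double score(Peer p, Random rnd) {
        double s = 0.0;
        s -= 0.1 * p.minutesSinceLastSeen; // recently seen peers rank higher
        s += 10.0 * p.tagMatches;
        s += p.nodePeer ? 5.0 : 0.0;
        s += 0.01 * p.ageDays;
        s += 0.001 * p.linkCount;
        return s + 2.0 * rnd.nextDouble();
    }

    /** Returns the best-ranked candidates, at most half as many as there are dht peers. */
    static List<Peer> select(List<Peer> candidates, int dhtPeers, Random rnd) {
        List<Peer> ranked = new ArrayList<Peer>(candidates);
        for (Peer p : ranked) {
            p.score = score(p, rnd); // score once, so the sort order stays consistent
        }
        Collections.sort(ranked, new Comparator<Peer>() {
            public int compare(Peer a, Peer b) {
                return Double.compare(b.score, a.score);
            }
        });
        int n = Math.min(extraCount(dhtPeers), ranked.size());
        return new ArrayList<Peer>(ranked.subList(0, n));
    }
}
```

The random factor keeps the same few peers from being chosen on every query, while the weights still favour peers that match on tags or were seen recently.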
causes Solr error (and wordindex likely finds suggestion)
org.apache.solr.core.SolrCore org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Cannot parse 'text_t:""d"': Lexical error at line 1, column 12. Encountered: <EOF> after : ""
at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:171)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:187)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector.query(EmbeddedSolrConnector.java:179)
at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector$DocListSearcher.<init>(EmbeddedSolrConnector.java:345)
at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector.getCountByQuery(EmbeddedSolrConnector.java:364)
at net.yacy.cora.federate.solr.connector.MirrorSolrConnector.getCountByQuery(MirrorSolrConnector.java:326)
at net.yacy.cora.federate.solr.connector.ConcurrentUpdateSolrConnector.getCountByQuery(ConcurrentUpdateSolrConnector.java:440)
at net.yacy.search.index.Segment.getWordCountGuess(Segment.java:464)
at net.yacy.data.DidYouMean.getSuggestions(DidYouMean.java:181)
at suggest.respond(suggest.java:73)
- the admin user name can be configured; in apiExec calls the default "admin" username is used.
TODO: the bin/apicall.sh script should likely take that into account.
as path for solr index dumps (instead of the SEGMENTS path). This will
make maintenance of index backups easier. It will also provide a tool
to migrate from a freeworld index to a webportal index.
execAPIActions require http to be up. The 10s sleep was sufficient to allow Jetty to start,
but it's more robust to place the call after http is assigned to switchboard/serverSwitch.
- added default filename filter to select field (as only an addition to a *.black list is permanent)
- modified Blacklist_p header/legend to show all active blacklists
(to support understanding that all configured lists are active)
- removed obsolete code in Blacklist_p servlet