on the jaudiotagger library. The parser is disabled by default as it
needs to store temporary files for non file:// protocols, which might be
disliked. For your local MP3-collection it loads nicely Artist,
Title, Album etc. from the audio files meta data.
This can be used to add another stemming to solr using stemming files
that are expressed as synonyms for grammatical alternatives. The
synonym/stemming files must have the following form:
- each line is a comma-separated list of synonyms
- the list of synonyms may be enclosed with {} (like the GSA synonyms
file)
- the file may contain comments which are lines starting with a '#'
The synonym file(s) must be placed in DATA/DICTIONARIES/synonyms/ and
are activated by default whenever a synonym file is in place.
Then, for each word that is found in a document all synonyms are added
to a long text field which is stored into synonyms_t. Processes using
the synonyms must query with that field as optional matcher.
this new search interface is something completely new for search, but
completely common on desktops: browser a web space like one would browse
a file system in a file browser. The file listing is created using the
search index and a faceted restriction to specific domains.
solr and the gsa search interface. That means that all JavaScript in
browsers now can Cross-Origin access all YaCy search interfaces, which
opens the option of 'YaCy Client in Browser' and 'End-Point Fail-over'
concepts.
Therefore a property like solrConnected() must be true all the time.
Removal of this method causes removal of all write operations to the old
metadata index.
create an document entry. This makes remote search much slower.
- removed synchronization of add method if ip_s is activated to prevent
that a user configuration causes bad behavior. The disadvantage of that
is, that a index dump can cause data loss if an indexing is running
during index dump
- catched more exceptions and more NPE
- better abstraction in MirrorSolrConnector
- slight performance enhancement when only the index count is requested
(rows=0 is sufficient to get a total count)
dump commands
- adjusted the apicall.sh script to get the downloaded text as output to
stdout which is necessary to parse the content out of it
- added indexdump.sh script which creates a solr dump and prints out the
storage path for the index dump
- added synchronization to the Fulltext class to prevent that data is
stored to a non-existing solr index while this index is disabled during
the storage of the dump
structures in cora:
- added the Peer object, which is a fresh version of Seed
- added the Peers object, which is a fresh version of Network
- added the Network api access class to retrieve a list of peers based
on the Network.xml servlet in all YaCy peers.
- indexUrlMustMatch and indexUrlMustNotMatch which can be used to select
loaded pages for indexing. Default patterns are in such a way that all
loaded pages are also indexed (as before) but when doing an expert crawl
start, then the user may select only specific urls to be indexed.
- crawlerNoDepthLimitMatch is a new pattern that can be used to remove
the crawl depth limitation. This filter a never-match by default (which
causes that the depth is used) but the user can select paths which will
be loaded completely even if a crawl depth is reached.
- added indexing to Tables columns to support larger bookmark
collections
- added RDF output (HTTP) for public bookmarks at /YMarks.rdf
- YMarkRDF also provides a Jena RDF Model as "internal" API
- various other changes/fixes for YMarks (mainly backend)
- a solr field collection_sxt can be used to store a set of crawl tags
- when this field is activated, a crawl tag can be assigned when crawls
are started
- the content of the collection field can be comma-separated, all of
them are assigned to the documents when they are indexed as result of
such a crawl start
- a search result can be drilled down to a specific collection; this is
currently only available in the solr interface and also in the gsa
interface using the 'site' option
- this adds a mandatory field for gsa queries (the google api demands
that field all the time)
- reduced danger that a non-existing RWI database causes NPEs
- added Solr queries to did-you-mean: this makes it possible that our
did-you-mean algorithm works together with only Solr and without RWIs
title_count_i, title_chars_val, title_words_val
description_count_i, description_chars_val, description_words_val
- added many asserts to ensure data type correctness from YaCy to Solr
and vice versa
- made many fixes according to new findings from these asserts (!)
- fixed type definition found by the verifier
- added multivalue-string fields for solr with extension 'sxt'
- added multivalue-integer fields for solr with extension 'val'
- renamed some solr attributes from txt to sxt
- changed solr query line to an explicit AND/OR structure
- added a country code second level domain list to Domains class; with
parser
- added a host string parser to get domain class name, country-code
second-level domain and subdomain out of it
- removed old coordinate attributes