- Fixes issue #160 : handle properly syntax exceptions with a user
friendly message
- Fixes loss of information on multiple blacklist entries editions
- Fixes loss of entries when moving entries from one list to another
Previously, when clicking a selected facet in the search results page to
unselect it, all other eventually selected modifiers/facets were also
removed.
With the appropriate vocabulary settings in Vocabulary_p.html page, this
can produce Vocabulary search facets displaying item types referenced in
html documents by microdata annotation.
Tested notably, but not limited to, vocabulary classes/types defined by
Schema.org and Dublin Core.
This adds the possibility for the HTML parser to gather typed items URLs
annotated in HTML tags with itemscope and itemtype attributes (see
microdata specification https://www.w3.org/TR/microdata/ ), notably
Types from the schema.org vocabulary, but also Types/Classes from any
other vocabulary, such as the common ones listed in the RDFa core
context ( https://www.w3.org/2011/rdfa-context/rdfa-1.1.html ).
Recrawl default profile was previously effectively used for crawl
stacker acceptance check, but request entries were indeed still created
with the "snippetGlobalText" profile.
Associate cached content to the last redirection location, instead of
the first URL of a redirection(s) chain :
- for proper base URL processing in parsers (fixes mantis 636 -
http://mantis.tokeek.de/view.php?id=636)
- to prevent duplicated content in Solr index when recrawling a
redirected URL
- with only light constraint on known indexed documents load date, as it
can already been controlled by the selection query, and the goal of the
job is indeed to recrawl selected documents now
- using the iffresh cache strategy
check case insensitive.
As keywords are compared lower case, make sure user input keyword:Key
or keyword:key will be shown as active in facet entry key.
Thus allowing to choose at configuration or per search request, whether
extending or not results beyond strict content domain filter (image,
video, audio or application).
Related graphical controls to be added to user interface.
Required for proper operation when the default system locale is Turkish,
as dottless and dotted i characters have specific case conversion rules
in this language.
Introduced through the new configurable setting
network.unit.protocol.https.preferred, defaulting to false for now.
Let choose to prefer using https when available on remote peers to
perform YaCy protocol operations including notably hello or transferRWI.
Not yet implemented for every YaCy protocol operations.
When a crawl is started, a new field to exclude content from scraping is
available. The field can be identified with the class name of div tags.
All text contained in such a div tag where the configured class name(s)
match are not indexed, while the remaining page is indexed.
Upgraded to InetAccessHandler.
Added InetPathAccessHandler extension to InetAccessHandler to maintain
path patterns capability previously available in IPAccessHandler but
lost in InetAccessHandler.
Filtering on IPv6 addresses is now supported.
Support for deprecated pattern formats such as "192.168." and
"192.168.1.1/path" has been removed, but startup automated migration
should convert such patterns eventually present in serverClient.
Allow typing directly internationalized domain names including non ASCII
characters in the search field.
Search is done using the ASCII Compatible Encoding (ACE) representation.
Default is still http to prevent any regressions, but a new setting is
available to choose https as the preferred protocol to perform remote
searches.
New configuration setting 'remotesearch.https.preferred' is manually
editable in yacy.conf file or in Advanced Properties page
(/ConfigProperties_p.html).
Should be enabled as default in the future for improved privacy.
Https could also eventually be used for other peers communications.
Required to properly run on systems with default locale set to Turkish
language, as with this locale the 'i' character has different upper and
lower case flavors than with other locales.
Required to properly run on systems with default locale set to Turkish
language, as with this locale the 'i' character has different upper and
lower case flavors than with other locales.
For any relevant URL parts : host name, URL scheme, session ids or
technical parts (see https://url.spec.whatwg.org/#url-writing and
https://tools.ietf.org/html/rfc3986 for current standard references).
Remaining locale sensitive conversion used for detection of URL word
components in urlComps() makes sense but using detected language would
be preferable than using the default system locale.
Required for people using Turkish language as their default system
locale, as with this locale the 'i' character has different upper and
lower case flavors than with other locales.
Using current IANA reference list at
https://www.iana.org/domains/root/db .
As for previous update on known generic TLDs list, the generated URL
hashes on these domains stay the same but it improves performance of URL
hash computation for URLs on these domains.
Using current IANA reference list at
https://www.iana.org/domains/root/db
The generated URL hashes on these domains stay the same but performance
is greatly improved as a DNS resolve request is required on URL hash
computation when the TLD part of the host name is unknown.
Hash computation mean time measured on 1541 sample URLs (one on each
TLD) and a computer with a DSL connection : about 230ms before change,
then only 20ms.
Previously named with their ISO 3166-1 country code : this way, when
setting language to "Browser" in ConfigBasic.html, it didn't work
properly when browser preferred language was Chinese or Greek as their
respective language codes are "zh" and "el" (not "cn" and "gr" which are
their country codes)
Also ensure authentication is not lost by Digest timeout when navigating
between index.html and search results page.
This way, running searches with extended features on a remote peer or a
password protected peer works with a regular user (with "Extended
search" rights).
When authenticating on the search page with a user without "Extended
search" rights, it appears as authenticated, but has just its usual
access to the public search features.
Thus allowing a more convenient way (wihout the need to go to the admin
section) to login when searching on your remote or password protected
peer and benefit from extended search features such as Heuristics,
Bookmarking or JavasScript resorting.
Can be disabled using the ConfigSearchPage_p.html.
Resizing JPEG snapshot images through /api/snapshot.jpg failed when
running on OpenJDK, but rendered successfully with a Oracle JDK.
Details in mantis 772 ( http://mantis.tokeek.de/view.php?id=772 ).
Removing any alpha component (useless in snapshot images) from the
rendered resized image solves the issue.
The target image format (jpeg) doesn't support transparency, so the
Html2ImageTest produced unusable black images when ran on a linux
machine having imagemagick package installed.
The SearchEvent listen to changes on each of its navigators, and the
information about their overall state is sent with each fetched search
item (as a "data-nav-generation" attribute). Then the browser can
regularly fetch a fresh version of yacysearchtrailer.html only if
necessary (when that nav-generation value change).