Previously, search navigator (facet) elements were sorted only by counts.
Now, from the ConfigSearchPage_p.html admin page, the sort direction
(ascending/descending) and the sort type (by count or by label) can be
customized independently for each navigator.
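To illustrate what "sort type" and "sort direction" mean for a navigator, here is a minimal sketch that orders facet entries (label mapped to count) with a java.util.Comparator. The names NavigatorSortSketch, SortType, SortDirection and sortedLabels are hypothetical and are not YaCy's actual classes or configuration values.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: order facet entries by count or by label, ascending or descending. */
public class NavigatorSortSketch {

    /** What to sort on (hypothetical, not YaCy's actual configuration values). */
    enum SortType { COUNT, LABEL }

    /** Which way to sort (hypothetical). */
    enum SortDirection { ASC, DESC }

    /** Returns the facet labels ordered according to the given type and direction. */
    static List<String> sortedLabels(Map<String, Integer> facets, SortType type, SortDirection direction) {
        // Pick the sort key: the facet count (map value) or the facet label (map key)
        Comparator<Map.Entry<String, Integer>> cmp = (type == SortType.COUNT)
                ? Map.Entry.<String, Integer>comparingByValue()
                : Map.Entry.<String, Integer>comparingByKey();
        // The direction is applied independently of the sort key
        if (direction == SortDirection.DESC) {
            cmp = cmp.reversed();
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(facets.entrySet());
        entries.sort(cmp);
        List<String> labels = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            labels.add(e.getKey());
        }
        return labels;
    }
}
```

With counts {b=5, a=3, c=1}, COUNT/DESC yields [b, a, c] (the former fixed behavior), while LABEL/ASC yields [a, b, c].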
- set the chunk size to 100 to match the maximum of the embedded Solr
- re-enable sorting (the case where we switched it off should be gone)
- enable recrawling on remote Solr
Under some conditions (especially when reaching a timeout), concurrent Solr
query tasks used by /HostBrowser.html and /api/linkstructure.json
never terminated, thus leaking resources, as reported by @Vort in issue
#246
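The general pattern for bounding such concurrent tasks can be sketched with plain java.util.concurrent. This is not the actual fix from the commit: BoundedQueriesSketch and runAll are hypothetical names, and the Callable tasks stand in for the Solr queries. The idea is that ExecutorService.invokeAll with a timeout cancels unfinished tasks, and shutdownNow in a finally block releases the worker threads on every exit path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: run query tasks concurrently with a global timeout, never leaking threads. */
public class BoundedQueriesSketch {

    static List<String> runAll(List<Callable<String>> tasks, long timeoutMillis) {
        ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        List<String> results = new ArrayList<>();
        try {
            // invokeAll waits until all tasks complete or the timeout expires;
            // tasks still unfinished at that point are cancelled.
            List<Future<String>> futures = executor.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS);
            for (Future<String> future : futures) {
                try {
                    results.add(future.get());
                } catch (CancellationException | ExecutionException e) {
                    // timed out or failed: this task contributes no result
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        } finally {
            // Always release the worker threads, even on timeout or interruption
            executor.shutdownNow();
        }
        return results;
    }
}
```

With one task returning immediately and another sleeping for 10 seconds, runAll(tasks, 200) returns only the fast result after about 200 ms, and no thread is left running.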
A new "Media Type detection" section in the advanced crawl start page
allows choosing between:
- not loading URLs with an unknown or unsupported file extension, without
checking the actual Media Type (relying on the Content-Type header for now).
This was the old default behavior: faster, but not really accurate.
- always cross-checking the URL file extension against the actual Media
Type. This allows properly parsing URLs that end with an apparently odd
file extension but actually have a supported Media Type, such as
text/html.
Sample URLs with misleading file extensions added as documentation in
the crawl start page.
fixes issue #244
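The decision between the two options can be sketched as follows. This is only a rough illustration, not YaCy's crawler code. MediaTypeCheckSketch, Mode and shouldParse are hypothetical names. The two supported sets are placeholders, since the real lists depend on the available parsers. contentType is assumed to be the Content-Type response header value, or null when it is not known.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of the two Media Type detection strategies. */
public class MediaTypeCheckSketch {

    enum Mode { EXTENSION_ONLY, CROSS_CHECK }

    // Hypothetical sets; the real supported lists depend on the available parsers
    static final Set<String> SUPPORTED_EXTENSIONS =
            new HashSet<>(Arrays.asList("html", "htm", "txt", "pdf"));
    static final Set<String> SUPPORTED_MEDIA_TYPES =
            new HashSet<>(Arrays.asList("text/html", "text/plain", "application/pdf"));

    /** Lower-cased extension of the last path segment, or "" when there is none. */
    static String extension(String path) {
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        return dot > slash ? path.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
    }

    /** contentType: value of the Content-Type response header, or null when not known. */
    static boolean shouldParse(String path, String contentType, Mode mode) {
        String ext = extension(path);
        // Assumption: paths without any extension are not rejected on that basis
        if (ext.isEmpty() || SUPPORTED_EXTENSIONS.contains(ext)) {
            return true;
        }
        if (mode == Mode.EXTENSION_ONLY || contentType == null) {
            return false; // fast path: trust the extension alone
        }
        // Cross-check: drop parameters such as "; charset=UTF-8" before comparing
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return SUPPORTED_MEDIA_TYPES.contains(mediaType);
    }
}
```

For example, /report.2017 served as "text/html; charset=UTF-8" is skipped in EXTENSION_ONLY mode but parsed in CROSS_CHECK mode.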
Stop using the JDK URLDecoder.decode() function, as it turns '+'
characters into spaces, including when they occur after '?' (both
characters have regular expression semantics when used in blacklist path
patterns).
Normalize blacklist path patterns using percent-encoding, both when a
pattern is edited in the web interface and when patterns are loaded from
configuration files.
Fixes issue #237
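The difference can be shown in a short sketch. URLDecoder.decode turns the regex quantifier in "a?+b" into a space. A decoder that handles only %XX escapes keeps it intact. The normalize step below is one possible percent-encoding normalization and is an assumption, not necessarily YaCy's exact rule. It re-encodes non-ASCII bytes, spaces and '%' as uppercase %XX and leaves regex characters unchanged. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: percent-encoding normalization that keeps regex '+' intact. */
public class BlacklistPatternSketch {

    /** JDK form decoding: '+' is turned into a space, breaking regex quantifiers. */
    static String jdkDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    /** Decodes only %XX escapes; every other character, including '+', is kept. */
    static String percentDecode(String s) {
        byte[] in = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            if (in[i] == '%' && i + 2 < in.length) {
                int hi = Character.digit(in[i + 1], 16);
                int lo = Character.digit(in[i + 2], 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo); // one decoded byte
                    i += 2;
                    continue;
                }
            }
            out.write(in[i]); // not a valid escape: copy the byte unchanged
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Normalized form: non-ASCII bytes, spaces and '%' as uppercase %XX, everything else as-is. */
    static String normalize(String pattern) {
        StringBuilder sb = new StringBuilder();
        for (byte b : percentDecode(pattern).getBytes(StandardCharsets.UTF_8)) {
            int u = b & 0xFF;
            if (u > 0x7F || u == ' ' || u == '%') {
                sb.append('%').append(String.format("%02X", u));
            } else {
                sb.append((char) u);
            }
        }
        return sb.toString();
    }
}
```

With this sketch, "caf\u00e9.*" and "caf%c3%a9.*" both normalize to "caf%C3%A9.*", whether typed in the web interface or read from a file. "/path/a?+b" stays unchanged, whereas jdkDecode("a?+b") gives "a? b".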
- To comply with current browser security rules, which prevent selecting a
full file path with an HTML input field of type 'file'
- Because it does not make sense to select a local file path when the
administered YaCy server is remote (not on the same computer as the
browser)
Considering that the slider usage on that page is very basic, using
standard HTML5 inputs of type "range" has the following advantages here:
- better keyboard accessibility
- removes unnecessary additional jQuery dependencies
Browser support for range inputs is good today, and even on older
browsers without support, such as IE < 10, they gracefully fall back to
text inputs.
When searching images, thumbnails that could not be rendered (because of
a load error such as HTTP 404, a networking issue or an internal error in
the rendering servlet) are now hidden by default, but can be revealed
with a button if desired.
Fix for issue #217
Manually replacing '+' characters or "%20" with a space character in the
search query parameter was necessary in YaCy a long time ago to properly
decode the application/x-www-form-urlencoded format (commit
9842fab6e4 in 2010).
Since the introduction of Jetty as the embedded HTTP server (commit
4b77733e59 in 2013), this is no longer necessary, as Jetty internals
already do this for us in
org.eclipse.jetty.util.UrlEncoded.decodeUtf8To().
This duplicated decoding can now be removed, as it prevents proper use of
the '+' character in search requests, as reported in issue #216.
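The effect of decoding twice can be shown with a small sketch. It assumes the JDK URLDecoder as a stand-in for the single form-decoding pass Jetty performs. The legacyReplace helper is a simplified, hypothetical reconstruction of the removed step, not the original code. A browser submitting "c++ tutorial" in a form sends c%2B%2B+tutorial.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/** Hypothetical sketch: why decoding the query parameter a second time breaks '+'. */
public class QueryDecodingSketch {

    /** One form decoding pass, as the HTTP server already performs (JDK stand-in for Jetty). */
    static String formDecode(String raw) {
        try {
            return URLDecoder.decode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    /** The removed extra step, applied to an already decoded value (simplified reconstruction). */
    static String legacyReplace(String decoded) {
        return decoded.replace("+", " ").replace("%20", " ");
    }
}
```

formDecode("c%2B%2B+tutorial") already yields "c++ tutorial". Applying legacyReplace on top turns the query into "c   tutorial", which loses the user's '+' characters.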
The status of the library in the DictionaryLoader_p.html page now also
informs the user that an upgrade can be applied when an older dump is
already loaded.
Upgrade applied as suggested by Niklas Andrus @fapth_gitlab on Gitter
chat.
SimpleDateFormat must not be used by concurrent threads without
synchronization for parsing or formatting dates, as it is not thread-safe
(it internally holds a Calendar instance that is not synchronized).
DateTimeFormatter is now preferred when possible, as it is thread-safe
without a concurrent access performance bottleneck (it is immutable and
does not internally use synchronization locks).
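A minimal sketch of the preferred approach follows. A single static DateTimeFormatter is shared by several threads without any locking, and each thread round-trips instants through it. The pattern, the UTC zone and the class and method names are illustrative assumptions. A SimpleDateFormat shared the same way would need synchronization or a ThreadLocal.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch: one shared DateTimeFormatter used safely from many threads. */
public class SharedFormatterSketch {

    // Immutable and thread-safe, so a single static instance can be shared
    // (a shared SimpleDateFormat would need synchronization or a ThreadLocal)
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static String format(Instant instant) {
        return instant.atZone(ZoneOffset.UTC).format(FORMAT);
    }

    static Instant parse(String text) {
        return LocalDateTime.parse(text, FORMAT).toInstant(ZoneOffset.UTC);
    }

    /** Formats and parses back many instants concurrently; returns the number of mismatches. */
    static int concurrentRoundTripErrors(int count) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            List<Future<Boolean>> checks = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                Instant t = Instant.ofEpochSecond(1_500_000_000L + i * 3_601L);
                checks.add(pool.submit(() -> parse(format(t)).equals(t)));
            }
            int errors = 0;
            for (Future<Boolean> check : checks) {
                if (!check.get()) {
                    errors++;
                }
            }
            return errors;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

format(Instant.EPOCH) gives "1970-01-01 00:00:00", and concurrentRoundTripErrors reports no mismatches because no formatter state is shared between calls.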