parameters on SSI (server side includes).
Query parameters are already merged by dispatcher.include, making the copy
of the parameter (RequestDispatcher.INCLUDE_QUERY_STRING) obsolete.
All other parameters are not used as YaCy servlet arguments.
Exclude the servletPath option, as resources are always relative to htroot
or htdocs; the change reflects this.
Theoretically, this and the recent adjustments regarding relative URLs
allow the instance to be configured under a path other than
root (/).
The default redirection strategy when using HTTPClient directly is
incorrect when the redirection is cross-host (the original Host header is
still sent when requesting the redirected location).
YaCy LoaderDispatcher handles redirections properly, so release
archive files served through redirected URLs (such as the URLs on a GitHub
Release page) are downloaded successfully.
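
To illustrate the Host header problem described above, here is a minimal sketch of how request headers could be rebuilt for a redirect target. `RedirectHeaders` and `forRedirect` are hypothetical names used only for illustration; they are not part of YaCy or of Apache HttpClient, and this is not the code used by LoaderDispatcher.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, for illustration only. */
final class RedirectHeaders {

    private RedirectHeaders() {
    }

    /**
     * Copies the original request headers for use against the redirect
     * target. The Host header is never copied: it is recomputed from the
     * target URI, so a cross-host redirect does not carry the old host.
     */
    static Map<String, String> forRedirect(Map<String, String> original, URI target) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : original.entrySet()) {
            if (!"Host".equalsIgnoreCase(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        int port = target.getPort(); // -1 when the URI has no explicit port
        String host = target.getHost();
        result.put("Host", port == -1 ? host : host + ":" + port);
        return result;
    }
}
```

With an original `Host` of one server and a redirect target on another, the returned map holds the target's host while the other headers are preserved.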
When a downloaded release archive is corrupted, empty, or cannot be
opened for any reason, the update script must not be launched because it
erases the existing lib/*.jar libraries.
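
A check of this kind could look roughly like the following sketch. It assumes the release is a gzip-compressed archive and only verifies that the file is non-empty and can be decompressed to the end. `ReleaseArchiveCheck` and `isReadableGzip` are hypothetical names, not YaCy's actual implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

/** Hypothetical pre-update check, for illustration only. */
final class ReleaseArchiveCheck {

    private ReleaseArchiveCheck() {
    }

    /**
     * Returns true only when the file exists, is not empty and can be
     * decompressed completely. Any I/O or format error yields false, so
     * the caller can refuse to launch the update script.
     */
    static boolean isReadableGzip(Path archive) {
        try {
            if (!Files.isRegularFile(archive) || Files.size(archive) == 0) {
                return false;
            }
            try (InputStream in = new GZIPInputStream(Files.newInputStream(archive))) {
                byte[] buffer = new byte[8192];
                long total = 0;
                int read;
                while ((read = in.read(buffer)) != -1) {
                    total += read;
                }
                return total > 0;
            }
        } catch (IOException e) {
            // Covers missing gzip magic bytes, truncated streams and read failures
            return false;
        }
    }
}
```

The update script would then be started only when the check returns true, leaving the existing lib/*.jar files untouched otherwise.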
to the client (not cookies only). This is used by some servlets, mainly to
set the "Access-Control-Allow-Origin" header. Added a contains check to be
sure no header set by DefaultServlet is overwritten.
A NullPointerException occurred when using an Identificator instance
which had encountered an error in its constructor.
This error could be caused by a missing "langdetect" folder in the
current folder of the main process, or by simultaneous first calls to
the constructor, initializing concurrently the DetectorFactory.langlist.
Fixes Mantis issue 714 (http://mantis.tokeek.de/view.php?id=714)
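
The concurrent first-call scenario can be guarded against in several ways. The sketch below shows one generic option, a lock around one-time initialization of shared static state. It is an assumption about a possible approach, not the fix actually applied: `LanguageProfiles` and its members are hypothetical, and no langdetect API is used.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical shared state initialized at most once, for illustration only. */
final class LanguageProfiles {

    private static final Object LOCK = new Object();
    private static List<String> languages; // guarded by LOCK
    private static int loadCount;          // guarded by LOCK

    private LanguageProfiles() {
    }

    /** Returns the shared list, loading it on the first call only. */
    static List<String> get() {
        synchronized (LOCK) {
            if (languages == null) {
                // Stand-in for reading profile files from disk
                List<String> loaded = new ArrayList<>();
                loaded.add("en");
                loaded.add("de");
                loadCount++;
                // Publish only after loading has fully succeeded
                languages = Collections.unmodifiableList(loaded);
            }
            return languages;
        }
    }

    /** Number of times the load step actually ran. */
    static int loadCount() {
        synchronized (LOCK) {
            return loadCount;
        }
    }
}
```

Because the shared field is assigned only after loading completes, a failed load leaves it null and a later call can retry instead of observing a half-initialized state.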
Reduced the memory usage of this vocabulary:
- by using only one map term2entries instead of two maps having the
  same key set
- by generating the location object links on the fly from the
  GeoLocation data instead of storing many duplicates of the string prefix
  "http://www.openstreetmap.org/?lat="
Measurements with VisualVM and GeoNames 0 enabled (cities with a
population > 1000):
- AutotaggingLibrary retained size:
  - initial: 309 718 763 bytes
  - after refactoring: 159 224 641 bytes
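
The second point above can be sketched as follows: the link is built on demand from the coordinates, so only one shared copy of the prefix exists. `GeoPoint` is a hypothetical stand-in for the GeoLocation data, and the `&lon=` part of the URL is an assumption, since only the prefix is quoted above.

```java
/** Hypothetical stand-in for the location data, for illustration only. */
final class GeoPoint {

    // A single shared constant instead of one prefix copy per stored link
    private static final String OSM_PREFIX = "http://www.openstreetmap.org/?lat=";

    private final double lat;
    private final double lon;

    GeoPoint(double lat, double lon) {
        this.lat = lat;
        this.lon = lon;
    }

    /** Builds the link when requested; nothing is kept in memory per entry. */
    String link() {
        // The "&lon=" parameter name is an assumption made for this sketch
        return OSM_PREFIX + lat + "&lon=" + lon;
    }
}
```

The trade-off is a small string concatenation on each access in exchange for not retaining one full URL string per location.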
Using String instead of StringBuilder instances in GeonamesLocation
allows the same immutable objects to be reused in the Tagging class.
Measurements with VisualVM and GeoNames 0 enabled (cities with a
population > 1000):
- OverarchingLocation retained size:
  - initial: 164 666 830 bytes
  - after refactoring: 97 736 804 bytes
- AutotaggingLibrary retained size:
  - initial: 354 713 633 bytes
  - after refactoring: 309 718 763 bytes
Could occur when a search request was performed just after peer startup,
before the Switchboard Thread "LibraryProvider.initialize" had completed,
thus requesting a ProbabilisticClassifier not completely initialized
(and having a null contexts property).
Reusing the same geonameid Integer instance between the `id2loc` and
`name2ids` maps slightly reduces the memory footprint.
Measured OverarchingLocation class retained memory with VisualVM on
OpenJDK 8:
- initial: 183 439 490 bytes
- after refactoring: 164 666 830 bytes
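
The idea can be sketched as boxing the id once and storing that same object in both maps. The map value types below are simplified assumptions and `GeoIndex` is a hypothetical class; only the names `id2loc` and `name2ids` come from the text above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical index with simplified value types, for illustration only. */
final class GeoIndex {

    final Map<Integer, String> id2loc = new HashMap<>();
    final Map<String, List<Integer>> name2ids = new HashMap<>();

    void add(int geonameId, String name) {
        // Box once: values outside the small Integer cache would otherwise
        // typically produce a separate Integer object for each map
        Integer id = Integer.valueOf(geonameId);
        id2loc.put(id, name);
        name2ids.computeIfAbsent(name, key -> new ArrayList<>()).add(id);
    }
}
```

With one boxed object per id, the second map only holds an extra reference rather than a second Integer instance.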
As reported by @reger24, image and favicon viewing was broken with
unauthenticated requests on peers configured to require authentication
even from localhost.
So I unified the viewing rights check into a single new function in the
ImageViewer class.
to work directly with javax.servlet.http.Cookie (rename headerProps to
cookieStore as it is only used for this).
(Re)implement set-cookie in DefaultServlet to make cookieAuthentication
work as designed.
When starting a crawl from a file containing thousands of links, the
configuration setting "crawler.MaxActiveThreads" is effective in preventing
the crawler from saturating the system with too many outgoing HTTP
connection threads.
But robots.txt loading was not affected by this setting and kept
increasing the number of concurrently loading threads without limit until
most of the connections timed out.
To improve performance control, added a thread pool for Robots.txt,
used consistently in its ensureExist() and massCrawlCheck() methods.
The maximum size of the Robots.txt thread pool can now be configured on the
/PerformanceQueues_p.html page, or with the new
"robots.txt.MaxActiveThreads" setting, initialized with the same default
value as the crawler.
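
A bounded pool of this kind can be sketched with the standard `java.util.concurrent` classes. `RobotsLoaderPool` and `create` are hypothetical names, and the 30 second keep-alive is an arbitrary choice for this example; the sketch does not reproduce YaCy's actual pool.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical factory for a size-limited loader pool, for illustration only. */
final class RobotsLoaderPool {

    private RobotsLoaderPool() {
    }

    /**
     * Creates a pool that never runs more than maxActiveThreads tasks at
     * once. Additional tasks wait in an unbounded queue instead of each
     * starting a new thread.
     */
    static ThreadPoolExecutor create(int maxActiveThreads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxActiveThreads, maxActiveThreads,
                30L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());
        // Let idle threads terminate so an unused pool holds no threads
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }
}
```

Submitting thousands of load tasks to such a pool keeps the number of simultaneous connections at the configured maximum, at the cost of queued tasks starting later.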
It can take any Date field of the index and displays a list of year strings
in reverse order by year (not by score/count).
To define the index field to use, the field name (and a title) can be
appended to the navigator's name "year", e.g. year:load_date_dt:LoadDate
It also works with the dates_in_content_dts field (from the graphical date
navigator). In that case the from: and to: query parameters are used on
selection as query modifiers (for other date fields no query parameter is
currently available, so selection won't filter search results).
It is not yet included in the UI search page layout config (to experiment
with it, a manual change to the conf is needed).
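
The navigator name format described above (name, then optional field name, then optional title, separated by colons) could be parsed along these lines. `NavigatorSpec` is a hypothetical class; how YaCy actually parses the name and which default field it uses are not stated here, so absent parts are simply left as null.

```java
/** Hypothetical parsed form of a navigator name, for illustration only. */
final class NavigatorSpec {

    final String type;  // e.g. "year"
    final String field; // index field name, or null when not given
    final String title; // display title, or null when not given

    private NavigatorSpec(String type, String field, String title) {
        this.type = type;
        this.field = field;
        this.title = title;
    }

    /** Splits "type[:field[:title]]" into its parts. */
    static NavigatorSpec parse(String name) {
        // Limit of 3 keeps any further colons inside the title
        String[] parts = name.split(":", 3);
        String field = parts.length > 1 && !parts[1].isEmpty() ? parts[1] : null;
        String title = parts.length > 2 && !parts[2].isEmpty() ? parts[2] : null;
        return new NavigatorSpec(parts[0], field, title);
    }
}
```

For the example name `year:load_date_dt:LoadDate`, this yields the type `year`, the field `load_date_dt` and the title `LoadDate`; a plain `year` leaves both optional parts unset.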