This makes YaCy easier to configure when running behind a reverse proxy.
The check on status avoids trying to update the page with error text
content, for example when the server returned a 404 or 500 error message.
to work directly with javax.servlet.http.Cookie (rename headerProps to
cookieStore as it is only used for this).
(Re)implement set-cookie in DefaultServlet to make cookieAuthentication
work as designed.
When starting a crawl from a file containing thousands of links, the
configuration setting "crawler.MaxActiveThreads" effectively prevents
saturating the system with too many outgoing HTTP connection threads
launched by the crawler.
But robots.txt loading was not affected by this setting, and the number of
concurrently loading threads kept increasing until most of the
connections timed out.
To improve performance control, added a thread pool for Robots.txt,
consistently used in its ensureExist() and massCrawlCheck() methods.
The Robots.txt thread pool max size can now be configured in the
/PerformanceQueues_p.html page, or with the new
"robots.txt.MaxActiveThreads" setting, initialized with the same default
value as the crawler.
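The effect of such a bounded pool can be sketched as follows. This is only an illustration of the technique, not the code from this commit: the class RobotsLoaderPool, its methods and the loader callback are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical sketch: caps the number of concurrent robots.txt loads. */
class RobotsLoaderPool {
    private final ExecutorService pool;
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    /** maxActiveThreads plays the role of a setting such as "robots.txt.MaxActiveThreads". */
    RobotsLoaderPool(int maxActiveThreads) {
        this.pool = Executors.newFixedThreadPool(maxActiveThreads);
    }

    /** Runs the loader once per host, blocks until all tasks are done, returns the number of successful loads. */
    int loadAll(List<String> hosts, Consumer<String> loader) {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (String host : hosts) {
            tasks.add(() -> {
                // Track how many loads run at the same time (for illustration only).
                int now = active.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    loader.accept(host);
                } finally {
                    active.decrementAndGet();
                }
                return null;
            });
        }
        int ok = 0;
        try {
            for (Future<Void> f : pool.invokeAll(tasks)) {
                try {
                    f.get();
                    ok++;
                } catch (ExecutionException e) {
                    // A failed load is simply not counted.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ok;
    }

    /** Highest number of loads observed running concurrently. */
    int peakConcurrency() {
        return peak.get();
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

With a pool size of 2, tasks beyond the second wait in the executor queue instead of each opening a new connection thread.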
It can take any Date field of the index and displays a list of year strings
in reverse order by the year (not the score/count).
To define the index field to use, the field name (and a title) can be
appended to the navigator's name "year", e.g. year:load_date_dt:LoadDate
It also works with the dates_in_content_dts field (from the graphical date
navigator). In that case the from: and to: query parameters are used on
selection as query modifiers (for other date fields no query parameter is
currently available, so selection won't filter search results).
Not included in the UI search page layout config so far (to experiment with
it, a manual change to the configuration is needed).
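As an illustration of the naming scheme and ordering described above, here is a small sketch. The class YearNaviSketch and its methods are hypothetical and are not the navigator code itself; missing parts of the name are returned as null because the defaults are not stated here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the "year:field:Title" naming scheme and the year ordering. */
class YearNaviSketch {

    /** Splits e.g. "year:load_date_dt:LoadDate" into {name, field, title}; absent parts are null. */
    static String[] parseNaviName(String navi) {
        String[] parts = navi.split(":");
        String name = parts[0];
        String field = parts.length > 1 ? parts[1] : null;
        String title = parts.length > 2 ? parts[2] : null;
        return new String[] { name, field, title };
    }

    /** Orders four-digit year strings newest first, ignoring the per-year counts. */
    static List<String> yearsDescending(Map<String, Integer> counts) {
        List<String> years = new ArrayList<>(counts.keySet());
        years.sort(Collections.reverseOrder());
        return years;
    }
}
```

Sorting the strings directly is sufficient as long as all years have the same number of digits.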
Upgraded the following JavaScript library dependencies:
- bootstrap-switch to 3.3.2
- html5shiv to 3.7.3 and switched to minified version
- typeahead to 0.10.5
- jQuery to 1.12.4
Removed unused bootstrap-rtl.css and bootstrap-rtl.min.css.
Tested for regressions on the following systems:
- Debian Jessie:
  - Firefox 45.4.0
- MS Windows 10:
  - Chrome 54.0.2840.99
  - Firefox 50.0
  - Edge
  - Emulated IE 11, 10 and 9
to make all readily available information from the original ServletRequest
available to YaCy servlets (without converting data to internal structures).
The implementation of the common interface allows easier integration of
YaCy servlets with the servlet standard (e.g. shared login service with
the servlet container etc.)
3bcd9d622b
crawler servlet log warning line on failure in one of multiple urls (instead of exception msg)
indexcontrolrwi skip not needed type conversion on ranking
When WikiCode inserted in a peer hosted Blog, Wiki, Messages or Profile
contains relative links (images or any content, hosted in DATA/HTDOCS),
it is more reliable to keep these links relative, especially when the
peer is behind any kind of reverse proxy.
This is more reliable when YaCy is behind a reverse proxy.
Also updated integration examples to keep the current protocol part
(http or https) in the example address.
the navigator to include counts all matches (rwi+fulltext).
Also fixed unresolved_pattern in the navigator's title (of the counter).
The use of the inurl: query modifier as a filter has not been changed,
keeping it as a soft (unsharp) filter facet.
Updated StringNavigator to prevent empty strings from multivalued Solr fields;
removed date value conversion (better handled elsewhere, not needed here).
Prepared the first basic navigators (for authors and collections) for the
list of SearchEvent.navigatorPlugins and adjusted servlet to use these.
- this allows to configure display order of these navigators (by ordering config string)
- eventually allows for additional and/or custom navigators using any
available index field without need for changing servlets
- the Collection navigation has been adjusted to exclude the internal,
default robot_* and dht collections from displaying
- rwi results are now also checked for the navigator by the refactored navigators
So far no config options were added to customize or add navigators (may
come later once the route of the upcoming modularization/plugin system is defined).
used for rwi ranking.
Main changes:
- introduce a posintext() to access the stored value. This also reduces memory allocation of the position array for WordReferenceRow (index access)
- use the positions() array for joined references on multi-word queries if needed (otherwise allow positions() to be null)
- adjust assignments and the min(), max() and distance() calculations accordingly
Applied strategy : when there is no restriction on domains or
sub-path(s), stack anchor links once discovered by the content scraper
instead of waiting for the complete parsing of the file.
This makes it possible to handle a crawling start file with thousands of
links in a reasonable amount of time.
Performance limitation: even if the crawl starts faster with a large
file, the content of the parsed file is still fully loaded in memory.
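The "stack links once discovered" strategy amounts to handing each anchor to a callback as soon as it is found, instead of collecting all anchors first. A minimal sketch of that callback pattern follows; StreamingAnchorScraper, its regex-based href extraction and the onAnchor callback are hypothetical simplifications, not the content scraper used by the crawler.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: reports each anchor link to a callback as soon as it is seen. */
class StreamingAnchorScraper {
    // Simplified href matcher for illustration; a real HTML scraper is more tolerant.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    /** Reads the source line by line, calls onAnchor for every link found, returns the link count. */
    static int scrape(Reader source, Consumer<String> onAnchor) {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = HREF.matcher(line);
                while (m.find()) {
                    // The link is handed over immediately, before the rest of the file is read.
                    onAnchor.accept(m.group(1));
                    count++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

In a crawler, the callback would push the URL onto the crawl stack; here any Consumer<String> can be used, such as a list's add method.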
These pages were already no longer XHTML 1.0 because they made use of
HTML5 syntax and elements.
Applied current (2016) HTML standard recommended Doctype declaration
(see https://www.w3.org/TR/html/syntax.html#the-doctype ).
was using servlet for network access and missing network.unit.name
fix for http://mantis.tokeek.de/view.php?id=694
+ prevent unresolved_pattern in yacy/list servlet
It had incorrect "-UNRESOLVED_PATTERN-" value (see second part of
mantis 691 http://mantis.tokeek.de/view.php?id=691 )
Note : crawlingDomFilterDepth is apparently unused in current (2016)
YaCy code-base. It was also unnecessary because crawlingDomFilterCheck
hidden field is set to "off".
Removed scheme, host and port from URL to avoid dealing with http/https,
external host and port retrieving issues.
What's more, this is consistent with how URLs are displayed in
/Tables_p.html?table=api&count=100&reverse=on&search= or
Tables_p.xml?table=api&count=100&search=
This fixes the first part of mantis 691
(http://mantis.tokeek.de/view.php?id=691)
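Reducing a URL to its path and query can be sketched with java.net.URI. The helper below is a hypothetical illustration (ApiUrlSketch and stripOrigin are not names from the code base), and the host in the usage example is made up.

```java
import java.net.URI;

/** Hypothetical sketch: keeps only the path and query of a URL. */
class ApiUrlSketch {

    /** Drops scheme, host and port (and any fragment); an empty path becomes "/". */
    static String stripOrigin(String url) {
        URI uri = URI.create(url);
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        String query = uri.getRawQuery();
        return query == null ? path : path + "?" + query;
    }
}
```

For example, stripOrigin("https://peer.example:8443/Tables_p.html?table=api&count=100") yields "/Tables_p.html?table=api&count=100", so the result no longer depends on how the peer is reached.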
This page was already no longer XHTML 1.0 as it makes use of the HTML5
<progress> element.
Applied current HTML standard recommended Doctype declaration (see
https://www.w3.org/TR/html/syntax.html#the-doctype ).
These are not valid link relationships, and do not appear to be used in
scripting or styling.
If necessary, a valid alternative could be to add an attribute such as
data-count="[count]"
This file is used by Bootstrap documentation website
(http://getbootstrap.com/) but is not part of the Bootstrap distribution
and does not need to be included in a Bootstrap based application.
This ensures consistency between the index link and the
opensearchdescription, even when switching language after having added
your YaCy peer to the browser engines list.
- move the maxcount limit restriction completely to getTopicNavigator (as it is not used in getTopics)
- let the search servlet use getTopics by default (without the RWI connected check; as of now, Topics are available without any additional index interaction)
New or modified translations (via /Translator_p.html) can be shared/distributed
via the YaCy internal news service. Remote peers can see and vote on the
translation via the new http://localhost:8090/TransNews_p.html servlet.
A positive vote will add the received translation to the local translation
list and post a voting message to the news service.
(at this time, no processing of received votes is implemented)
+ fixed the msg service retention time check (NewsPool.automaticProcessP)
If language is set to "browser", the client/user browser language is used to choose from
the available translations.
Simply put: one user's browser speaks English -> YaCy responds in English; another user's browser speaks French -> YaCy responds in French.
! To make a translation/language available you have to activate the language once !
(or manually use the utility class TranslateAll)
In ConfigBasic.html, available translations are marked green when setting language=Browser.
The client language is determined by the HTTP header Accept-Language (checked in DefaultServlet).
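Choosing a translation from an Accept-Language header can be sketched with the JDK's own language-range matching. This is an assumption-laden illustration, not the DefaultServlet code: BrowserLanguageSketch, choose and the fallback parameter are hypothetical.

```java
import java.util.Collection;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: picks an available translation from an Accept-Language header. */
class BrowserLanguageSketch {

    /**
     * Returns the best-matching tag from the available translations,
     * or the fallback if the header is missing, malformed or matches nothing.
     */
    static String choose(String acceptLanguage, Collection<String> available, String fallback) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return fallback;
        }
        List<Locale.LanguageRange> ranges;
        try {
            // Parses the header and orders the ranges by their q weights.
            ranges = Locale.LanguageRange.parse(acceptLanguage);
        } catch (IllegalArgumentException e) {
            return fallback;
        }
        // RFC 4647 lookup: "fr-FR" falls back to "fr" if only "fr" is available.
        String match = Locale.lookupTag(ranges, available);
        return match == null ? fallback : match;
    }
}
```

With the translations "en", "fr" and "de" activated, a header of "fr-FR,fr;q=0.8,en;q=0.5" selects "fr", while a browser asking only for a language that is not activated gets the fallback.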
use directly HttpServletRequest. This is used to get the http protocol version
in HTTPDProxyHandler.fulfillRequestFromWeb() for error response to client.
- adjust YaCyProxyServlet and UrlProxyServlet accordingly
- use more http_version constants in headerframework and httpdeamon
- equalize servlets (3) use of HeaderFramework.CONNECTION_PROP_HOST to HeaderFramework.HOST
We now use the same protocol as the one used to display the config page,
so when using https the content is not blocked by the browser
detecting mixed content.
process 1. load default from locales/*.*
        2. load and merge (overwrite) from DATA/LOCALE/*.* (can be a partial translation as it is merged)
- include all entries from DATA/LOCALE to be edited in Translator servlet
and save just modifications (instead of full list) to DATA/LOCALE
This shall make it easy to share modifications.
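The two-step load and the "save just modifications" behaviour can be sketched with plain maps. The class and method names below are hypothetical, and the sketch deliberately ignores how the translation files are actually read and written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of merging default and local translations. */
class TranslationMergeSketch {

    /** Step 1 and 2: start from the defaults, then let local entries overwrite them. */
    static Map<String, String> merge(Map<String, String> defaults, Map<String, String> local) {
        Map<String, String> merged = new LinkedHashMap<>(defaults);
        merged.putAll(local);
        return merged;
    }

    /** Keeps only the entries of the edited list that are new or differ from the defaults. */
    static Map<String, String> modifications(Map<String, String> defaults, Map<String, String> edited) {
        Map<String, String> diff = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : edited.entrySet()) {
            if (!entry.getValue().equals(defaults.get(entry.getKey()))) {
                diff.put(entry.getKey(), entry.getValue());
            }
        }
        return diff;
    }
}
```

Because only the differences are stored, a partial local translation stays small and merges cleanly with whatever defaults a later release ships.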