Upgraded the following JavaScript libraries dependencies :
- bootstrap-switch to 3.3.2
- html5shiv to 3.7.3 and switched to minified version
- typeahead to 0.10.5
- jQuery to 1.12.4
Removed unused bootstratp-rtl.css and bootstrap-rtl.min.css.
Tested non regressions on the following systems :
- Debian Jessie :
- Firefox 45.4.0
- MS Windows 10 :
- Chrome 54.0.2840.99
- Firefox 50.0
- Edge
- Emulated IE 11, 10 and 9
to make all readily available information from the original ServletRequest
available to YaCy servlets (without converting data to internal structures).
The implementation of the common interface allows easier integration of
YaCy servlets with the servlet standard (e.g. shared login service with
the servlet container etc.)
3bcd9d622b
crawler servlet log warning line on failure in one of multiple urls (instead of exception msg)
indexcontrolrwi skip not needed type conversion on ranking
When WikiCode inserted in a peer hosted Blog, Wiki, Messages or Profile
contains relative links (images or any content, hosted in DATA/HTDOCS),
it is more reliable to keep these links relative, especially when the
peer is behind any kind of reverse Proxy.
This is more reliable when YaCy is behind a reverse proxy.
Also updated integration examples to keep the current protocol part
(http or https) in the example address.
the navigator to include counts all matches (rwi+fulltext).
Fixing also unresolved_pattern in navigators title (of the counter)
The use of inurl: query modifier as filter has not been changed keeping
it as soft (unsharp) filter facet.
Upd StringNavigator to prevent empty string form multivalued solr fields,
removed date value conversion (better handled elsewhere, not need here).
Prepared the first basic navigators (for authors and collections) for the
list of SearchEvent.navigatorPlugins and adjusted servlet to use these.
- this allows to configure display order of these navigators (by ordering config string)
- eventually allows for additional and/or custom navigators using any
available index field without need for changing servlets
- the Collection navigation has been adjusted to exclude the internal,
default robot_* and dht collections from displaying
- rwi results are now also checked for navigatior by the refactored navi's
So far no config options were added to customize or add navigators (may
come later if route of upcoming modularization/plugin system is defined).
used for rwi ranking.
Main changes:
- introduce a posintext() to access the stored value. This reduces also mem alloc of position array for WordReferenceRow (index access)
- use the positions() array for joined references on multi-word queries if needed (otherwise allow positions() to be null
- adjust assignments and the min() max() and distance() calculation accordingly
Applied strategy : when there is no restriction on domains or
sub-path(s), stack anchor links once discovered by the content scraper
instead of waiting the complete parsing of the file.
This makes it possible to handle a crawling start file with thousands of
links in a reasonable amount of time.
Performance limitation : even if the crawl start faster with a large
file, the content of the parsed file still is fully loaded in memory.
This pages were already no more XHTML 1.0 because made use of the HTML5
syntax and elements.
Applied current (2016) HTML standard recommended Doctype declaration
(see https://www.w3.org/TR/html/syntax.html#the-doctype ).
was using servlet for network access and missing network.unit.name
fix for http://mantis.tokeek.de/view.php?id=694
+ prevent unresoved_pattern in yacy/list servlet
It had incorrect "-UNRESOLVED_PATTERN-" value (see second part of
mantis 691 http://mantis.tokeek.de/view.php?id=691 )
Note : crawlingDomFilterDepth is apparently unused in current (2016)
YaCy code-base. It was also unnecessary because crawlingDomFilterCheck
hidden field is set to "off".
Removed scheme, host and port from URL to avoid dealing with http/https,
external host and port retrieving issues.
What's more, this is consistent with how URL are displayed in
/Tables_p.html?table=api&count=100&reverse=on&search= or
Tables_p.xml?table=api&count=100&search=
This fixes mantis 691 first part
(http://mantis.tokeek.de/view.php?id=691)
This page was already no more XHTML 1.0 as it makes use of the HTML5
<progress> element.
Applied current HTML standard recommended Doctype declaration (see
https://www.w3.org/TR/html/syntax.html#the-doctype ).
These are no valid link relationships, and do not appear to be used in
scripting or styling.
If necessary, a valid alternative could be to add an attribute such as
data-count="[count]"
This file is used by Bootstrap documentation website
(http://getbootstrap.com/) but is not part of the Bootstrap distribution
and has not be included in a Bootstrap based application.
This ensure consistency between the index link and the
opensearchdescription, even when switching language after having added
your YaCy peer to the browser engines list.
- move the maxcount limit restriction completely to getTopicNavigator (as there not used in getTopics)
- let search servlet use getTopics by default (w/o RWI connected check, as of now, Topics are available w/o any additional index interaction)
New or modified translation (via /Translator_p.html) can be shared/distributed
via the YaCy internal news service. Remote peers can see and vote on the
translation via the new http://localhost:8090/TransNews_p.html servlet.
A positive vote will add the received translation to the local translation
list and post a voting message to the news service.
(at this no processing of received votings is implemented)
+ fixed the msg service retention time check (NewsPool.automaticProcessP)
If language is set to "browser" the client/user browser language is used to choose from
available translation.
simply: one users browser speaks English -> YaCy responds in English, other users browser speaks French -> YaCy responds in French.
! To make a translation/language available you have to activate the language once !
(or manually use the utility class TranslateAll)
In ConfigBasic.html availabel translations are marked green on setting language=Browser
The client language is determined by http header Accept-Language (checked in DefaultServlet)
use directly HttpServletRequest. This is used to get the http protocol version
in HTTPDProxyHandler.fulfillRequestFromWeb() for error response to client.
- adjust YaCyProxyServlet and UrlProxyServlet accordingly
- use more http_version constants in headerframework and httpdeamon
- equalize servlets (3) use of HeaderFramework.CONNECTION_PROP_HOST to HeaderFramework.HOST
We now use the same protocol as the one used to display the config page
: so when using https, the content is not blocked by the browser
detecting mixed-content.
process 1. load default from locales/*.*
2. load and merge(overwrite) from DATA/LOCALE/*.* (can be partial translation as it is merged)
- include all entries from DATA/LOCAL to be edited in Translator servlet
and save just modifications (instead of full list) to DATA/LOCALE
This shall make it easy to share modifications.
This is the 1st rudimentary approach to support the translatio utilities.
It allows currently to edit untranslated text and save it in a local translation file
in the DATA/LOCALE directory.
+ refactor Translator (less static's) to leverage on class overrides and support garbage collection for this 1 time routine
+ adjust TranslatorXliff to check for local translations in DATA/LOCALE,
this includes storing manually downloaded translation files in DATA as well
(to keep default untouched)
+ on 1st call of Translator_p a master tanslation file is generated, checking
the supported languages for missing translation text (later this masterfile is planned to part of the distribution, to harmonize translation key text between the languages)
Outlook: the local modifications (possibly as translation fragments instead of complete file) to be shared with maintainer using xlif features.
- in intranet mode getip returns null causing a NPE
- adjust starturl (which was set to http://localip/repository) which is never the start url for the Mediawiki
+ correct javadoc for seed.getIP()
to save the resources and keep handler chain small if the feature is not used.
+add a warning message on settingsack_p page to restart on first activation
been differently - and wrong for several files. also: base64-encoding
for gzipped push files because our data structures currently only
supports ASCII POST pushes..
of minutes in the past and reverted latest change. The export file dump
will now contain four data elements: f - first date of index entry write
date, l - last date of index write date, n - now-date of index dump
time, c - count of numbers inside the dump. '0N' denotes a series of
changes which will lead to the opportunity to exchange index data dumps
in a way that is needed to integrate ZeroNet index data. This will be
based on index dump sharing; that causes this commit.
- Above brought up that parser start url parameter, declared as AnchorURL uses only methodes of parent object DigestURL (changed parameter declaration accordingly).
Main image processing is now in ImageViewer, used by both ViewImage and
ViewFavicon.
Fixed URIMetadataNode.getFavicon to use non-standard icons with no size
ass fallback.
taking the existing limits into account but make it consistent with search option screen for admin and public user
changes:
- configured default number of items per page (ConfigPortal_p.html) is used as is (no hardcoded limit)
- otherwise requests are limited to 100 results per page ( = search option, index.html)
(this basically is the major change, inc. limit from 20 to 100 for public user)
P.S. - the older grant of more (1000), if no online snippet calculation, is kept (for the time being)
see http://mantis.tokeek.de/view.php?id=627
! until better solution is found !.
Reason: in IE-Browser no individual favicon is displayed with <object> tag
(always the default) and only few individual fav's with Firefox (randomly)
hint: to be able to use <img src=ViewImage/> return of default icon was
added back to it.
relates to http://mantis.tokeek.de/view.php?id=629
(limit is applied in yacysearch & max. total results by sum result-stack size)
- remove obsolete search.navigation prop (has moved to ConfigSearchPage_p)
element "totalResults" is included twice (at begin & end),
only the element after performing the search holds number > 0
see http://mantis.tokeek.de/view.php?id=608
- moved default favicon processing from ViewImage to
yacysearchitem.html : when previewing ico image search results we don't
want a default favicon be displayed
- throw an IOException ending in a HTTP 500 error when image processing
fails, rather than returning a null result : behavior is more consistent
accross browsers (for exempla Chrome and Firefox), especially with new
default favicon display system
add url parameter to make sure no parameter are included in refresh url
which could cause unwanted restart of import job
see http://mantis.tokeek.de/view.php?id=591 comments
use getAttributes() to get query parameters as clear text (w/o url encoding)
use getSearchpartMap() to get in internal format (url encoded)
fix for http://mantis.tokeek.de/view.php?id=606
as simple string and no more as regular expressions.
Updated all locale files to adapt to refectored Translator : removed
useless escaped characters and did minor corrections.
Performed minor syntax corrections on some html source files.
Added an util to translate all html source files with all locales
without launching full YaCy application.
Corrected main arguments parsing on other translation utils.
moved and was not cleared anymore. This results in an huge fieldcache.
(http://lucene.apache.org/#highlights-of-the-lucene-release-includehttps://issues.apache.org/jira/browse/LUCENE-5666)
Here I try to use DovValues where it is possible.
For this I used the Api-Scheme as new basis für the Solr-Schema.
This needs at least a complete optimization of the Solr-Index to get a
smaller FieldCache.
Everything that is indexed with these setting will not use the
Fieldcache at all.
bayesian filters. This can be used to classify documents during
indexing-time using a pre-definied bayesian filter.
New wordings:
- a context is a class where different categories are possible. The
context name is equal to a facet name.
- a category is a facet type within a facet navigation. Each context
must have several categories, at least one custom name (things you want
to discover) and one with the exact name "negative".
To use this, you must do:
- for each context, you must create a directory within
DATA/CLASSIFICATION with the name of the context (the facet name)
- within each context directory, you must create text files with one
document each per line for every categroy. One of these categories MUST
have the name 'negative.txt'.
Then, each new document is classified to match within one of the given
categories for each context.
When the sidebar on search page becomes scrollable, the scrollbar shrinks the sidebar and makes the search results weirdly scrollable on X axis by several pixels. Now the sidebar always have a scrollbar, and results are never X-scrollable.
index has more than two million documents. This protects the index from
beeing flooded with search requests that cannot be resolved before the
real search query has to be computet.
This is a very complex migration: many classes had been renamed or
removed, dependencies changed and the solr index type is now aligned to
be a solr cloud repository.
Together with the Solr 5.2 library update, one other dependent library
had been updated as well: httpclient 4.4->4.4.1
Older indexes are migrated from 4_10 to 5_2. However, the new index
structure is more efficient and we recommend to re-index everything.
Please use the index export before you do the update to a large
surrogate xml file. After the update, start with an empty index and then
initialize this with your dump.
reason: experimental implementatin of RDFa parser not executed (limited to special urls) but may cause error on normal html parsing due to a inputstream.reset
servlet and moved it to a new /IndexExport_p.html servlet. This servlet
is now more prominent linked in the main menu under Production -> Index
Export/Import
export function is also now the default export option. The export file
format for a full solr export is very similar to a solr search result
xml, only the <lst name="responseHeader"> tag is missing.
The exported xml has a special line termination feature: all documents
will be exported into a single line without any CR in between. That
means that every document is completely inside a single line. While this
is not readable at all for humans, it is very useful for linux line
processing scripts, like grep. Using grep it will be easy to select
single documents which match for a given pattern.
Such dumps shall be importable with the DATA/SURROGATE/in import
function, but that import is not yet adopted to the new file format.
by instatiation from URIMetadataNode, by eleminating differentiation of ResultEntry/URIMetadataNode.
- moved remaining ResultEntry functionallity to URIMetadataNode
- for 1:1 functionallity added a function makeResultEntry()
- removed ResultEntry
- refactored related code
Main difference is after makeResultEntry the text_t content is removed and alternative title/url strings for display are calculated.
Main difference left is, that
If remote crawl option is not activated, skip init of remoteCrawlJob to save the resources of queue and ideling thread.
Deploy of the remoteCrawlJob deferred on activation of the option.
added to IndexReIndexMonitor_p.html
Selects existing documents from index and feeds it to the crawler.
currently only the field fresh_date_dt is used determine documents for recrawl (fresh_date_dt:[* TO NOW-1DAY]
Documents are added in small chunks (200) to the crawler, only if no other crawl is running.