handler.
UrlProxyServlet splits url in parts to pass it on as parameter and
HeaderFramework constructs a url from param parts. This is obsolete if
already created url is used (makes HeaderFramework.getRequestURL obsolete
= removed)
to make all readily available information from the original ServletRequest
available to YaCy servlets (without converting data to internal structures).
The implementation of the common interface allows easier integration of
YaCy servlets with the servlet standard (e.g. shared login service with
the servlet container etc.)
When the peer is behind a reverse proxy providing SSL/TLS encryption,
the rendered absolute URLs should start with https when the user browser
requested https : added limited support to the X-Forwarded-Proto HTTP
header notably provided on Heroku platform.
Also added some unit tests.
If language is set to "browser" the client/user browser language is used to choose from
available translation.
simply: one users browser speaks English -> YaCy responds in English, other users browser speaks French -> YaCy responds in French.
! To make a translation/language available you have to activate the language once !
(or manually use the utility class TranslateAll)
In ConfigBasic.html availabel translations are marked green on setting language=Browser
The client language is determined by http header Accept-Language (checked in DefaultServlet)
use directly HttpServletRequest. This is used to get the http protocol version
in HTTPDProxyHandler.fulfillRequestFromWeb() for error response to client.
- adjust YaCyProxyServlet and UrlProxyServlet accordingly
- use more http_version constants in headerframework and httpdeamon
- equalize servlets (3) use of HeaderFramework.CONNECTION_PROP_HOST to HeaderFramework.HOST
to split query into multiple parameter on line separator in input query.
e.g. split "crawldepth_i_0^10.0 \n crawldepth_i:1^5.0"
but allow "url_file_ext_s:jpg OR url_file_ext_s:png" to be unsplitted
to save the resources and keep handler chain small if the feature is not used.
+add a warning message on settingsack_p page to restart on first activation
been differently - and wrong for several files. also: base64-encoding
for gzipped push files because our data structures currently only
supports ASCII POST pushes..
- add filename to parameter fieldname
- add filecontent to special parameter fieldname$file
(some servlets use this $file parameter)
fix for http://mantis.tokeek.de/view.php?id=542
This was caused by reading via GZIP from close-wait connection an caused
high cpu- and system-loads.
Instat of implementing handling of the RedListener now I found a
timelimeted 'get' "realy" solving this problem.
This was caused by reading via GZIP from close-wait connection an caused
high cpu- and system-loads.
Solved by implementing handling of the RedListener.
This assigns priorities to incoming requests. Higher priority numbers are served before lower.
(disabled by default in defaults/web.xml,
uncomment or copy entry to DATA/Settings/web.xml)
to support the new time parser and search functions in YaCy a high
precision detection of date and time on the day is necessary. That
requires that the time zone of the document content and the time zone of
the user, doing a search, is detected. The time zone of the search
request is done automatically using the browsers time zone offset which
is delivered to the search request automatically and invisible to the
user. The time zone for the content of web pages cannot be detected
automatically and must be an attribute of crawl starts. The advanced
crawl start now provides an input field to set the time zone in minutes
as an offset number. All parsers must get a time zone offset passed, so
this required the change of the parser java api. A lot of other changes
had been made which corrects the wrong handling of dates in YaCy which
was to add a correction based on the time zone of the server. Now no
correction is added and all dates in YaCy are UTC/GMT time zone, a
normalized time zone for all peers.
- if less or equal of 8 facet options are present, they are shown by
default
- if more facet options are present, they are hidden
To view or hide all facets, just click on the facet header bar
calculated from boostfields config, making sure title, description, keywords and content is always searched.
- apply change to solrServlet makes sure every remote query uses at least all locally defined boost fields for search
- apply to local solr search
- simplify select query by using QF defaults
given css class and extends a given vocabulary with a term consisting
with the text content of the html class tag. Additionally, the term is
included into the semantic facet of the document. This allows the
creation of faceted search to documents without the pre-creation of
vocabularies; instead, the vocabulary is created on-the-fly, possibly
for use in other crawls. If any of the term scraping for a specific
vocabulary is successful on a document, this vocabulary is excluded for
auto-annotation on the page.
To use this feature, do the following:
- create a vocabulary on /Vocabulary_p.html (if not existent)
- in /CrawlStartExpert.html you will now see the vocabularies as column
in a table. The second column provides text fields where you can name
the class of html entities where the literal of the corresponding
vocabulary shall be scraped out
- when doing a search, you will see the content of the scraped fields in
a navigation facet for the given vocabulary
new users. Will be activated maybe again if there is an optional
tutorial mode which can be switched on for this special purpose of
running a tutorial.
This is intended for peers who want to participate in the P2P network but don't wish to load/fill-up their index with metadata of every received search result.
The DHT transfer is not effected by this option (and will work as usual, so that a peer disabling the new store to index switch still receives and holds the metadata according to DHT rules).
Downside for the local peer is that search speed will not improve if search terms are only avail. remote or by quick hits in local index.
To be able to improve the local index a Click-Servlet option was added additionally.
If switched on, all search result links point to this servlet, which forwards the users browser (by html header) to the desired page and feeds the page to the fulltext-index.
The servlet accepts a parameter defining the action to perform (see defaults/web.xml, index, crawl, crawllinks)
The option check-boxes are placed in ConfigPortal.html
solr to the YaCy built-in solr search servlet. Its not complete and not
fully correct (there is still a utf8 encoding problem) but it is a way
to get easily requests forwarded through YaCy to an external Solr.
according to the ranking profiles boost fields defined by the peer (if df/qf is not specified in query).
This allows for pretty simple queries ( q=word) without the need to know about the specific index configuration.
Making sure all relevant fields (as determined by the index owner) are searched, still maintaining the option to query specific fields
and does not relay on the duplication of text to text_t.
- add author to reset-default boost fields (support results for author nav)
Filter queries are not actually related to ranking, but user requests
have pointed out that specific boost queries to move results to the end
of the result list are not sufficient. Such boost filters may be better
executed as actual filter and therefore such a filter can now be
statically applied to every search request. A typical use could be the
expression "http_unique_b:true AND www_unique_b:true" which uses the
recently introduced fields http_unique_b and www_unique_b which are true
only for one of the alternatives with/without http(s) and with/without
prefix 'www.' in host names.
The collection field (can be filled i.e. in Crawl Start) can be used to
add categories to YaCy index entries. The usage of that field was
restricted to solr searches and post argument filters as implemented in
commit f7571386a3.
This commit extends collections to a full navigation option in the
standard YaCy search interface. The field is not active by default but
can be activated easily in the /ConfigSearchPage_p.html servlet (just
check the 'Collection' facet field). Collections can now be used for (at
least) two purposes:
- to provide search tenants (through post argument collection)
- to provide self-made category navigation
Search requests may now have (independently from switched on or off
collection facet) a "collection:<collection-name>" modifier attached;
firthermore collection names may use disjunctions using the '|' pipe
symbol. For example, this is a valid search request:
www collection:user|proxy
requests. The internal representation of post-arguments is String and
therefore not appropriate for byte[] object as submitted by file pushes.
Therefore all pushed files are encoded to base64 _after_ uploading with
an http form (you do not need to do that encoding yourself) to hand-over
the byte[] as string in the post argument.
Servlets which read such files must decode the base64 data to get the
original byte[] array.
This is considered as a temporary solution for file uploads and a proper
implementations would need to consider all attributes as handed over as
Objects with either String or byte[] Object instances. This would be a
major code change and is not done at this time here now. The feature was
submitted to realize a feature as pushed with the next commit.
besides adjustments in code it makes the servlet settings in web.xml significant.
This applies to solr, gsa and proxy servlet. There is no longer a default setup in code during init (as jetty 9 checks for double definition).
on proxied pages a link to exit proxy is added to top of page.
Link text can be configured in web.xml init-parameter (see default/web.xml). If missing no link is displayed.