This is a major step because Solr removed support for embedded Solr
instances in 9.0, and we want to keep it because we want to ship
YaCy with an embedded Solr. It was necessary to add parts of the Solr
code into YaCy to make this migration possible. Solr 9.1 removed even
more parts that are required for embedded operation, so we cannot
migrate any further without major changes.
If you are running a YaCy instance with Solr 8.x, the migration should
be done automatically. If not, you must first migrate to YaCy version
1.93 (with Solr 8.x) so that your data is converted to the Solr 8 format.
RAG (Retrieval Augmented Generation) is a method that combines a search
engine with an LLM (Large Language Model). When a new prompt is
submitted, knowledge retrieved by a search engine is injected into the
prompt context. This is done using a reverse proxy between the chat
client and the LLM. In this case, we used the following software:
LLM Backend - Ollama:
https://github.com/ollama/ollama
Install Ollama and then pull the two required LLM models
with the following commands:
ollama pull phi3:3.8b
ollama pull llama3:8b
Chat Client - susi_chat:
https://github.com/susiai/susi_chat
Just clone the repository and then open the file
susi_chat/chat_terminal/index.html
in your browser. This displays a chat terminal.
In this terminal, run the following command:
host http://localhost:8090
This sets the LLM backend to your YaCy peer.
Then start YaCy. It provides the LLM endpoint to the client while
using Ollama as the backend, and injects search results only from the
local Solr index, not (yet) from the p2p network.
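The injection step performed by the proxy can be pictured with a small Java sketch. It assumes the search results have already been fetched from the local Solr index as plain-text snippets; the class and method names (`RagPromptSketch`, `augment`) are hypothetical and not YaCy's actual API, and the real proxy must additionally forward the augmented request to Ollama and stream the answer back.

```java
import java.util.List;

public class RagPromptSketch {

    /**
     * Hypothetical helper: prepends retrieved snippets to the user prompt
     * so the LLM can ground its answer on them. Not part of YaCy.
     */
    static String augment(String userPrompt, List<String> snippets) {
        StringBuilder sb = new StringBuilder();
        if (!snippets.isEmpty()) {
            sb.append("Use the following search results to answer the question.\n");
            for (int i = 0; i < snippets.size(); i++) {
                // Number each snippet so the model can refer to its sources
                sb.append('[').append(i + 1).append("] ").append(snippets.get(i)).append('\n');
            }
            sb.append('\n');
        }
        sb.append(userPrompt);
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> results = List.of(
                "YaCy is a decentralized search engine.",
                "YaCy ships with an embedded Solr index.");
        System.out.println(augment("What is YaCy?", results));
    }
}
```

With no search results the prompt passes through unchanged, so the LLM still answers when the local index has nothing relevant.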
variables
To use that feature, set an environment variable with the prefix "yacy."
followed by the name of the YaCy configuration attribute.
Additionally, we implemented a way to set a peer name using the setting
"network.unit.agent". This can now be used to set a peer name with the
Java call parameter
-Dyacy.network.unit.agent=anonymous
The purpose of this feature is to be able to give all peers in
mass-deployed Kubernetes clusters the same name, so that we do not
flood the peer name statistics with auto-generated deployment names.
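A minimal sketch of how such an override could be resolved, assuming the lookup order: Java system property (`-Dyacy.<key>`), then an environment variable with the same name, then the value from the configuration file. The class `ConfigOverrideSketch`, its `lookup` method and this precedence are hypothetical; YaCy's real implementation may differ.

```java
import java.util.Map;

public class ConfigOverrideSketch {

    private static final String PREFIX = "yacy.";

    /**
     * Hypothetical lookup: -Dyacy.<key> system property first, then an
     * environment variable named yacy.<key>, then the file-based value.
     */
    static String lookup(String key, Map<String, String> fileConfig) {
        String name = PREFIX + key;
        String value = System.getProperty(name);
        if (value == null) {
            value = System.getenv(name);
        }
        if (value == null) {
            value = fileConfig.get(key);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> fileConfig = Map.of("network.unit.agent", "my-peer-1234");
        // When started with: java -Dyacy.network.unit.agent=anonymous ...
        // this prints "anonymous" instead of the file value.
        System.out.println(lookup("network.unit.agent", fileConfig));
    }
}
```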
Processing of gzip-encoded incoming requests (on /yacy/transferRWI.html
and /yacy/transferURL.html) no longer worked after the upgrade to Jetty
9.4.12 (see commit 51f4be1).
To prevent any conflicting behavior with Jetty internals, we now use the
GzipHandler provided by Jetty to decompress incoming gzip-encoded
requests rather than the previously used custom GZIPRequestWrapper.
Fixes issue #249
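For illustration only: Jetty's GzipHandler now does this inside the server, but the underlying operation is simply inflating the request body when the client declared `Content-Encoding: gzip`. The following standard-library sketch mimics that step; `decodeBody` and `gzip` are hypothetical helpers, not Jetty or YaCy code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRequestSketch {

    /** Inflates the body if the request declared gzip content encoding. */
    static byte[] decodeBody(String contentEncoding, byte[] body) throws IOException {
        if (contentEncoding == null || !contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return body; // not compressed: pass through unchanged
        }
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            return in.readAllBytes();
        }
    }

    /** Compresses a payload the way a sending peer would. */
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] sent = gzip("example payload".getBytes(StandardCharsets.UTF_8));
        byte[] received = decodeBody("gzip", sent);
        System.out.println(new String(received, StandardCharsets.UTF_8));
    }
}
```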
Relative URLs to CSS stylesheets were not rendered properly when using
the Solr HTML response writer with the "/solr/collection1/select" entry
point instead of "/solr/select".
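Why the entry point matters: a browser resolves a relative href against the page URL, and "/solr/collection1/select" is one directory level deeper than "/solr/select". This sketch uses `java.net.URI.resolve` to show the effect; the stylesheet path `env/style.css` is hypothetical, and adding `../` for the deeper entry point is just one possible fix, not necessarily what the commit does.

```java
import java.net.URI;

public class RelativeCssSketch {

    /** Resolves a stylesheet href against the page URL, as a browser does. */
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String href = "env/style.css"; // hypothetical stylesheet path
        System.out.println(resolve("http://localhost:8090/solr/select", href));
        // -> http://localhost:8090/solr/env/style.css
        System.out.println(resolve("http://localhost:8090/solr/collection1/select", href));
        // -> http://localhost:8090/solr/collection1/env/style.css (wrong directory)
        System.out.println(resolve("http://localhost:8090/solr/collection1/select", "../" + href));
        // -> http://localhost:8090/solr/env/style.css
    }
}
```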
SimpleDateFormat must not be used by concurrent threads without
synchronization for parsing or formatting dates, as it is not thread-safe
(it internally holds a calendar instance that is not synchronized).
Now prefer DateTimeFormatter when possible, as it is thread-safe without
a concurrent access performance bottleneck (it does not internally use
synchronization locks).
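A minimal sketch of the recommended pattern: one shared, immutable `DateTimeFormatter` used by many threads at once, with no locking. The pattern string and the class name `DateFormatSketch` are illustrative only, not YaCy's actual formatters.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DateFormatSketch {

    // DateTimeFormatter is immutable and thread-safe: a single shared instance
    // can be used by all threads without locking. A shared SimpleDateFormat
    // in the same place would need external synchronization.
    static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss", Locale.ROOT);

    static String format(LocalDateTime t) {
        return FORMATTER.format(t);
    }

    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text, FORMATTER);
    }

    /** Formats and re-parses many dates concurrently; true if every round trip matches. */
    static boolean concurrentRoundTrips(int tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                LocalDateTime t = LocalDateTime.of(2020, 1, 1, 0, 0).plusMinutes(37L * i);
                results.add(pool.submit(() -> parse(format(t)).equals(t)));
            }
            boolean allOk = true;
            for (Future<Boolean> f : results) {
                allOk &= f.get();
            }
            return allOk;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(format(LocalDateTime.of(2021, 3, 4, 5, 6, 7))); // 2021-03-04 05:06:07
        System.out.println("all round trips correct: " + concurrentRoundTrips(1000));
    }
}
```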
Solr can provide partial results, for example when a processing time
limit (specified with the parameter `timeAllowed`) is exceeded.
Before this fix, getting partial results from an embedded Solr index
resulted in a ClassCastException:
"org.apache.solr.common.SolrDocumentList cannot be cast to
org.apache.solr.response.ResultContext".
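The general remedy for this kind of error is to check the runtime type of the response value before casting. The sketch below shows that defensive pattern; to run without Solr on the classpath, `DocumentList` and `ResultContextLike` are hypothetical stand-ins for `SolrDocumentList` and `ResultContext`, and this is not the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

public class PartialResultSketch {

    // Hypothetical stand-ins for org.apache.solr.common.SolrDocumentList
    // and org.apache.solr.response.ResultContext.
    static class DocumentList extends ArrayList<String> { }

    interface ResultContextLike {
        List<String> docs();
    }

    /** Extracts documents whatever concrete type the response value holds. */
    static List<String> documents(Object responseValue) {
        if (responseValue instanceof ResultContextLike) {
            return ((ResultContextLike) responseValue).docs();
        }
        if (responseValue instanceof DocumentList) {
            // Per the error above, partial results arrive as a document list
            return (DocumentList) responseValue;
        }
        throw new IllegalArgumentException("Unexpected response type: "
                + (responseValue == null ? "null" : responseValue.getClass().getName()));
    }

    public static void main(String[] args) {
        DocumentList partial = new DocumentList();
        partial.add("doc1");
        ResultContextLike full = () -> List.of("doc1", "doc2");
        System.out.println(documents(partial)); // [doc1]
        System.out.println(documents(full));    // [doc1, doc2]
    }
}
```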
- Use the EnhancedXMLResponseWriter only when the requested output is "exml"
- Use the standard Solr writers when possible, for example for the json,
xml or javabin output formats
- Return an error when the requested format cannot be rendered with
an external Solr server only
Important: this modification is necessary for peers using exclusively
an external Solr server to be reachable as robinson targets in p2p
search, as the binary format ("javabin") is the default Solr exchange
format for peers.
Before this, when a peer queried a remote peer attached only to an
external Solr (no embedded one), the request failed with an "Invalid
type" error, as the remote peer answered with xml although the binary
format was requested.
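The three selection rules listed above can be sketched as a simple dispatch on the requested `wt` format. The enum `OutputWriter`, the `choose` method and the treatment of other formats when an embedded Solr is present are hypothetical simplifications, not YaCy's actual code.

```java
import java.util.Locale;
import java.util.Objects;

public class ResponseWriterChoiceSketch {

    /** Hypothetical labels for the writer families mentioned above. */
    enum OutputWriter { ENHANCED_XML, STANDARD_SOLR, YACY_EMBEDDED_ONLY }

    static OutputWriter choose(String wt, boolean embeddedSolrAvailable) {
        String format = Objects.requireNonNull(wt, "wt").toLowerCase(Locale.ROOT);
        switch (format) {
            case "exml":
                return OutputWriter.ENHANCED_XML;
            case "json":
            case "xml":
            case "javabin":
                // Standard Solr writers also work when only an external Solr is attached
                return OutputWriter.STANDARD_SOLR;
            default:
                if (!embeddedSolrAvailable) {
                    throw new UnsupportedOperationException("Format '" + format
                            + "' cannot be rendered with an external Solr server only");
                }
                // Assumption: other formats (e.g. html) need writers tied to the embedded core
                return OutputWriter.YACY_EMBEDDED_ONLY;
        }
    }

    public static void main(String[] args) {
        System.out.println(choose("javabin", false)); // STANDARD_SOLR
        System.out.println(choose("exml", true));     // ENHANCED_XML
    }
}
```

The key point for robinson search is the middle branch: "javabin" is served by a standard writer even on a peer without an embedded Solr, instead of silently falling back to xml.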
This matters especially for Turkish-speaking users whose system default
locale is "tr": strings for technical purposes (URLs, tag names,
constants...) must not be lowercased with the default locale, because
'I' does not become 'i' as in other locales such as "en", but 'ı'.
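A short demonstration with the standard `String.toLowerCase(Locale)` API: under the Turkish locale a technical identifier no longer matches its expected lowercase form, while `Locale.ROOT` gives a locale-independent result. The tag name "TITLE" is just an example.

```java
import java.util.Locale;

public class LocaleLowerCaseSketch {

    /** What an implicit toLowerCase() does on a system whose default locale is "tr". */
    static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr"));
    }

    /** Locale-independent lowercasing, suitable for URLs, tag names and constants. */
    static String lowerTechnical(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String tag = "TITLE";
        System.out.println(lowerTurkish(tag));                  // tıtle (dotless ı, U+0131)
        System.out.println(lowerTechnical(tag));                // title
        System.out.println(lowerTurkish(tag).equals("title"));  // false
    }
}
```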
- ensure use of the HTTP POST method: HTTP GET should only be used for
information retrieval and not to perform operations with server-side
effects (see the HTTP standard https://tools.ietf.org/html/rfc7231#section-4.2.1)
- a transaction token is now required for these administrative form
submissions to ensure the request cannot be embedded in an external
site and performed silently or by mistake by the user's browser
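The two rules above can be combined in one check, sketched here under stated assumptions: the token is random, issued to the user with the form (for example stored in the session), and compared in constant time on submission. The class `TransactionTokenSketch` and its methods are hypothetical and do not reflect YaCy's actual token mechanism.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;

public class TransactionTokenSketch {

    private static final SecureRandom RANDOM = new SecureRandom();

    /** Issues an unguessable token to embed as a hidden field in the admin form. */
    static String newToken() {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    /**
     * Accepts a state-changing submission only if it uses POST and carries the
     * token previously issued to this user.
     */
    static boolean accept(String httpMethod, String issuedToken, String submittedToken) {
        if (!"POST".equals(httpMethod)) {
            return false; // GET must not trigger server-side effects
        }
        if (issuedToken == null || submittedToken == null) {
            return false;
        }
        // Constant-time comparison to avoid leaking token prefixes through timing
        return MessageDigest.isEqual(
                issuedToken.getBytes(StandardCharsets.UTF_8),
                submittedToken.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String token = newToken();
        System.out.println(accept("POST", token, token));    // true
        System.out.println(accept("GET", token, token));     // false
        System.out.println(accept("POST", token, "forged")); // false
    }
}
```

An external site can make the browser send a POST, but it cannot read the token embedded in YaCy's own form, so the forged submission is rejected.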
Following the comment "use of properties as header values is discouraged",
for cases where the (proxy) HTTPClient overwrites values with the supplied URL:
use the defined request.referer procedure in the response class.
The HTTP "Referer" header sent by the browser when using YaCy can now be
controlled either with the referrer meta tag as a global policy, or only
for search result links by adding the attribute rel="noreferrer".
To improve privacy with as few regressions as possible, the default is
set as a meta tag with the value "origin-when-cross-origin": internal YaCy
link behavior is not affected, but when visiting external websites the
referrer URL is still sent, stripped of its path and query parameters.
Older browsers, Safari, MS IE and Edge do not support the referrer meta
tag, so the standard but less flexible noreferrer link type can also be
enabled as an alternative.
A user-friendly settings page is yet to be implemented.
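To make the "origin-when-cross-origin" behavior concrete, this sketch computes the Referer value a browser would send under that policy: the full page URL for same-origin links, only the origin for external links. It is a simplified model, not browser or YaCy code: default ports are not normalized and user info is not stripped. The URLs are examples only. With rel="noreferrer" on a link, no Referer is sent at all.

```java
import java.net.URI;

public class ReferrerPolicySketch {

    /** Serialized origin of a URL, e.g. "https://example.org/". */
    static String origin(URI u) {
        String port = u.getPort() == -1 ? "" : ":" + u.getPort();
        return u.getScheme() + "://" + u.getHost() + port + "/";
    }

    /** Simplified same-origin check: scheme, host and explicit port must match. */
    static boolean sameOrigin(URI a, URI b) {
        return a.getScheme().equalsIgnoreCase(b.getScheme())
                && a.getHost().equalsIgnoreCase(b.getHost())
                && a.getPort() == b.getPort();
    }

    /** Referer value under "origin-when-cross-origin" (simplified). */
    static String referrer(String fromPage, String toLink) {
        URI from = URI.create(fromPage);
        URI to = URI.create(toLink);
        if (sameOrigin(from, to)) {
            // Same origin: full URL, fragment removed
            int hash = fromPage.indexOf('#');
            return hash < 0 ? fromPage : fromPage.substring(0, hash);
        }
        // Cross origin: path and query (e.g. the search terms) are not disclosed
        return origin(from);
    }

    public static void main(String[] args) {
        String page = "http://localhost:8090/yacysearch.html?query=private+topic";
        System.out.println(referrer(page, "http://localhost:8090/ViewFile.html"));
        // -> http://localhost:8090/yacysearch.html?query=private+topic
        System.out.println(referrer(page, "https://example.org/article"));
        // -> http://localhost:8090/
    }
}
```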
(expected scheme e.g. http, was protocol version).
Deprecate obsolete custom X-...-Scheme header constant.
Use existing FORMAT_ANSIC date formatter in HeaderFramework.
Correct htmlParserTest (delete one unintended println).