#(reload)#::#(/reload)# YaCy '#[clientname]#': URL Database Administration #%env/templates/metas.template%# #%env/templates/header.template%# #%env/templates/submenuIndexImport.template%#

Index Export

The local index currently contains #[ucount]# documents; only #[ucount200]# of these are exportable with HTTP status code 200 - the remaining entries are error documents.

#(lurlexport)#::
Loaded URL Export
Export Path
URL Filter
 .*.* (default) is a catch-all; format: java regex
query
 *:* (default) is a catch-all; format: field:value (Solr query syntax)
maximum age (seconds)
 -1 = unlimited -> no document is too old
maximum number of records per chunk
 if exceeded, several chunks are stored; -1 = unlimited (produces a single chunk)
Export Size
full size, all fields; or minified: only fields sku, date, title, description, text_t
Export Format
Full Data Records:
JSON (rich and full-text Elasticsearch data, one document per line in one flat JSON file; can be bulk-imported into Elasticsearch). Here is an example for OpenSearch, using Docker:
Start docker container of opensearch:
docker run --name opensearch -p 9200:9200 -d -e OPENSEARCH_JAVA_OPTS="-Xms2G -Xmx2G" -e discovery.type=single-node -e DISABLE_SECURITY_PLUGIN=true -v $(pwd)/opensearch_data:/usr/share/opensearch/data opensearchproject/opensearch:latest
Unblock index creation:
curl -X PUT "http://localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d' { "persistent": { "cluster.blocks.create_index": null } }'
Create the search index:
curl -X PUT "http://localhost:9200/collection1/yacy"
Bulk-upload the index file:
curl -XPOST "http://localhost:9200/collection1/yacy/_bulk?filter_path=took,errors" -H "Content-Type: application/x-ndjson" --data-binary @yacy_dump_XXX.flatjson
Run a search returning 10 results over the fields text_t, title, and description, with boosts:
curl -X POST "http://localhost:9200/collection1/yacy/_search" -H 'Content-Type: application/json' -d' {"size": 10, "query": {"multi_match": { "query": "one two three", "fields": ["text_t", "title^10", "description^3"], "fuzziness": "AUTO" }}}'
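Before bulk-uploading, it can help to sanity-check the dump file locally; a minimal sketch, where the file name and contents below are stand-ins for a real yacy_dump_*.flatjson export:

```shell
# Stand-in dump file; replace with your actual yacy_dump_*.flatjson export.
DUMP=yacy_dump_example.flatjson
printf '%s\n' \
  '{"sku":"http://example.com/","title":"Example","text_t":"one two three"}' \
  '{"sku":"http://example.com/page","title":"Page","text_t":"four five"}' \
  > "$DUMP"

# The _bulk endpoint expects exactly one JSON object per line (NDJSON),
# so the line count equals the document count.
DOCS=$(grep -c '' "$DUMP")
echo "documents in dump: $DOCS"

# A quick structural check: every line must start with '{' and end with '}'.
BAD=$(grep -cv '^{.*}$' "$DUMP")
echo "malformed lines: $BAD"
```

If the malformed-line count is non-zero, the bulk upload will report errors for those lines in its `errors` response field.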
XML (rich and full-text Solr data, one document per line in one large XML file; can be processed with shell tools and imported via DATA/SURROGATES/in/)
XML (RSS)
Full URL List:
Plain Text List (URLs only)
HTML (URLs with title)
Only Domain:
Plain Text List (domains only)
HTML (domains as URLs, no title)
Only Text:
Fulltext of Search Index Text
 
::
Export to file #[exportfile]# is running ... #[urlcount]# documents so far
:: #(/lurlexport)# #(lurlexportfinished)#::
Finished export of #[urlcount]# Documents to file #[exportfile]#
Import this file by moving it to DATA/SURROGATES/in
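The import step above can be sketched as follows; the YaCy home path and the export file name are assumptions for illustration:

```shell
# Assumed path for illustration; point YACY_HOME at your actual YaCy installation.
YACY_HOME=./yacy
mkdir -p "$YACY_HOME/DATA/SURROGATES/in"

# Stand-in for the exported dump file reported on this page.
touch yacy_dump_example.xml

# YaCy's surrogate importer watches DATA/SURROGATES/in and processes
# files placed there; processed files are typically moved to DATA/SURROGATES/out.
mv yacy_dump_example.xml "$YACY_HOME/DATA/SURROGATES/in/"
ls "$YACY_HOME/DATA/SURROGATES/in"
```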
:: #(/lurlexportfinished)# #(lurlexporterror)#::
Export to file #[exportfile]# failed: #[exportfailmsg]#
:: #(/lurlexporterror)# #(dumprestore)#::
Dump and Restore of Solr Index #(dumpRestoreEnabled)#
This feature is available only when a local embedded Solr is active.
::#(/dumpRestoreEnabled)#
 
Dump File
 
:: #(/dumprestore)# #(indexdump)#:: :: :: #(/indexdump)# #(indexRestore)#:: :: :: #(/indexRestore)# #%env/templates/footer.template%#