From 655d8db80218312593f37b6026fd6dc0db01d23f Mon Sep 17 00:00:00 2001
From: Michael Peter Christen
Date: Sun, 12 Nov 2023 15:26:18 +0100
Subject: [PATCH] detailed directions in index export to explain how the
 export can be imported again using elasticsearch/opensearch

---
 htroot/IndexExport_p.html | 51 +++++++++++++++++++++++++++++----------
 1 file changed, 38 insertions(+), 13 deletions(-)

diff --git a/htroot/IndexExport_p.html b/htroot/IndexExport_p.html
index 1aa992716..eb3e1f188 100644
--- a/htroot/IndexExport_p.html
+++ b/htroot/IndexExport_p.html
@@ -9,11 +9,10 @@
 #%env/templates/header.template%#
 #%env/templates/submenuIndexImport.template%#
-
-
+

Index Export

The local index currently contains #[ucount]# documents, only #[ucount200]# exportable with status code 200 - the remaining are error documents.

- + #(lurlexport)#::
      Loaded URL Export
@@ -34,19 +33,45 @@
Full Data Records:
-
XML (Rich and full-text Solr data, one document per line in one large xml file, can be processed with shell tools, can be imported with DATA/SURROGATE/in/)
- JSON (Rich and full-text Elasticsearch data, one document per line in one flat JSON file, can be bulk-imported to elasticsearch with the command "curl -XPOST localhost:9200/collection1/yacy/_bulk --data-binary @yacy_dump_XXX.flatjson")
- XML (RSS)
+
+
+          JSON (Rich and full-text Elasticsearch data, one document per line in one flat JSON file,
+          can be bulk-imported to Elasticsearch or OpenSearch. Here is an example for OpenSearch, using Docker:
+Start a Docker container of OpenSearch:
+docker run --name opensearch -p 9200:9200 -d -e OPENSEARCH_JAVA_OPTS="-Xms2G -Xmx2G" -e discovery.type=single-node -e DISABLE_SECURITY_PLUGIN=true -v $(pwd)/opensearch_data:/usr/share/opensearch/data opensearchproject/opensearch:latest
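+Optionally verify that the node is reachable before importing (this assumes the port mapping and the disabled security plugin from the command above):
+curl "http://localhost:9200"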
+Unblock index creation:
+curl -X PUT "http://localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d' +{ + "persistent": { + "cluster.blocks.create_index": null + } +}'
+Create the search index:
+curl -X PUT "http://localhost:9200/collection1/yacy"
+Bulk-upload the index file:
+curl -XPOST "http://localhost:9200/collection1/yacy/_bulk?filter_path=took,errors" -H "Content-Type: application/x-ndjson" --data-binary @yacy_dump_XXX.flatjson
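+After the upload, the number of imported documents can be verified with the count API (collection1 is the index created above):
+curl "http://localhost:9200/collection1/_count?pretty"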
+Run a search that returns 10 results, querying the fields text_t, title and description with boosts:
+curl -X POST "http://localhost:9200/collection1/yacy/_search" -H 'Content-Type: application/json' -d' +{"size": 10, "query": {"multi_match": { + "query": "one two three", + "fields": ["text_t", "title^10", "description^3"], "fuzziness": "AUTO" +}}}'
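+To simply inspect a few of the imported documents, a match_all query against the same index works as well; appending ?pretty makes the response readable:
+curl -X POST "http://localhost:9200/collection1/_search?pretty" -H 'Content-Type: application/json' -d'
+{"size": 3, "query": {"match_all": {}}}'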
+
+          XML (Rich and full-text Solr data, one document per line in one large xml file,
+          can be processed with shell tools, can be imported with DATA/SURROGATE/in/)
+
+          XML (RSS)
+
Full URL List:
Plain Text List (URLs only)
HTML (URLs with title)
Only Domain:
Plain Text List (domains only)
HTML (domains as URLs, no title)
-      Only Text:
+      Only Text:
       Fulltext of Search Index Text
-
-
+
+
 
@@ -55,16 +80,16 @@ ::
Export to file #[exportfile]# is running .. #[urlcount]# Documents so far
      ::
      #(/lurlexport)#
-
-      #(lurlexportfinished)#::
+
+      #(lurlexportfinished)#::
Finished export of #[urlcount]# Documents to file #[exportfile]#
Import this file by moving it to DATA/SURROGATES/in
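      For example, assuming YaCy runs from its installation directory and &lt;exportfile&gt; stands for the export file path shown above, a single move is enough:
      mv &lt;exportfile&gt; DATA/SURROGATES/in/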
      ::
      #(/lurlexportfinished)#
-
+
      #(lurlexporterror)#::
Export to file #[exportfile]# failed: #[exportfailmsg]#
      ::
      #(/lurlexporterror)#
-
+
      #(dumprestore)#::
Dump and Restore of Solr Index