detailed directions in index export to explain how the export can be imported again using elasticsearch/opensearch
pull/612/head
Michael Peter Christen 1 year ago
parent 24011dcbcc
commit 655d8db802

@@ -10,7 +10,6 @@
#%env/templates/header.template%#
#%env/templates/submenuIndexImport.template%#
<h2>Index Export</h2>
<p>The local index currently contains #[ucount]# documents, only #[ucount200]# exportable with status code 200 - the remaining are error documents.</p>
@@ -34,9 +33,35 @@
<dd>
<dl>
<dt>Full Data Records:</dt>
<dd><input type="radio" name="format" value="full-solr" /> XML (Rich and full-text Solr data, one document per line in one large xml file, can be processed with shell tools, can be imported with DATA/SURROGATE/in/)<br />
<input type="radio" name="format" value="full-elasticsearch" checked="checked" /> JSON (Rich and full-text Elasticsearch data, one document per line in one flat JSON file, can be bulk-imported to elasticsearch with the command "curl -XPOST localhost:9200/collection1/yacy/_bulk --data-binary @yacy_dump_XXX.flatjson")<br />
<input type="radio" name="format" value="full-rss" /> XML (RSS)</dd>
<dd><input type="radio" name="format" value="full-elasticsearch" checked="checked" />
JSON (Rich and full-text Elasticsearch data, one document per line in one flat JSON file,
can be bulk-imported to Elasticsearch or OpenSearch). Here is an example for OpenSearch, using Docker:<br />
Start an OpenSearch Docker container:<br />
<code>docker run --name opensearch -p 9200:9200 -d -e OPENSEARCH_JAVA_OPTS="-Xms2G -Xmx2G" -e discovery.type=single-node -e DISABLE_SECURITY_PLUGIN=true -v $(pwd)/opensearch_data:/usr/share/opensearch/data opensearchproject/opensearch:latest</code><br />
Unblock index creation:<br />
<code>curl -X PUT "http://localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
"persistent": {
"cluster.blocks.create_index": null
}
}'</code><br />
Create the search index:<br />
<code>curl -X PUT "http://localhost:9200/collection1/yacy"</code><br />
Bulk-upload the index file:<br />
<code>curl -XPOST "http://localhost:9200/collection1/yacy/_bulk?filter_path=took,errors" -H "Content-Type: application/x-ndjson" --data-binary @yacy_dump_XXX.flatjson</code><br />
Run a search that returns 10 results and queries the fields text_t, title and description with boosts:<br />
<code>curl -X POST "http://localhost:9200/collection1/yacy/_search" -H 'Content-Type: application/json' -d'
{"size": 10, "query": {"multi_match": {
"query": "one two three",
"fields": ["text_t", "title^10", "description^3"], "fuzziness": "AUTO"
}}}'</code><br />
<input type="radio" name="format" value="full-solr" />
XML (Rich and full-text Solr data, one document per line in one large xml file,
can be processed with shell tools, can be imported with DATA/SURROGATE/in/)
<br />
<input type="radio" name="format" value="full-rss" />
XML (RSS)
</dd>
<dt>Full URL List:</dt>
<dd><input type="radio" name="format" value="url-text" /> Plain Text List (URLs only)<br />
<input type="radio" name="format" value="url-html" /> HTML (URLs with title)</dd>
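
The search request in the added directions can also be built programmatically instead of being typed into a curl command. The following is a minimal Python sketch, assuming the same fields and boosts as the curl example (`text_t`, `title^10`, `description^3`). The function name `build_search_body` is hypothetical and not part of YaCy.

```python
import json


def build_search_body(query, size=10):
    """Build a multi_match request body mirroring the curl search example."""
    return {
        "size": size,  # number of results to return
        "query": {
            "multi_match": {
                "query": query,
                # Boosts copied from the example: title is boosted by 10,
                # description by 3, text_t is unboosted.
                "fields": ["text_t", "title^10", "description^3"],
                "fuzziness": "AUTO",
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body; it could be passed to curl with -d or sent
    # with any HTTP client to the _search endpoint shown in the diff.
    print(json.dumps(build_search_body("one two three"), indent=2))
```

Keeping the body in one function makes it easy to change the result size or the query string without editing a quoted JSON literal on the command line.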
