removed the new index export method from the IndexControlURLs_p.html
servlet and moved it to a new /IndexExport_p.html servlet. This servlet
is now linked more prominently in the main menu under Production -> Index
Export/Import
Michael Peter Christen 10 years ago
parent d0aff91f23
commit 5e2d23b7a0
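
The export handler that this commit moves out of IndexControlURLs_p turns the `format` form value into a numeric format code and appends a file extension when the export file name contains no dot. A minimal standalone sketch of that mapping follows. The class name `ExportNaming` and its method names are hypothetical and not part of YaCy; only the suffix rules and extensions are taken from the removed code in the diff.

```java
// Hypothetical helper for illustration only; not a YaCy class.
final class ExportNaming {

    /** Maps a form value such as "url-text" or "full-solr" to the numeric format code. */
    static int formatCode(String fname) {
        int format = 0; // plain text is the fallback, as in the diff
        if (fname.endsWith("text")) format = 0;
        if (fname.endsWith("html")) format = 1;
        if (fname.endsWith("rss")) format = 2;
        if (fname.endsWith("solr")) format = 3;
        return format;
    }

    /** True when only domains are exported instead of complete URLs. */
    static boolean domainsOnly(String fname) {
        return fname.startsWith("dom");
    }

    /** Appends the format-specific extension unless the name already contains a dot. */
    static String extend(String path, int format) {
        if (path.indexOf('.') >= 0) return path; // same check as the diff: any dot in the path
        switch (format) {
            case 1:  return path + ".html";
            case 2:  return path + "_rss.xml";
            case 3:  return path + "_full.xml";
            default: return path + ".txt";
        }
    }

    public static void main(String[] args) {
        String fname = "full-solr"; // the radio button that is checked by default in the form
        System.out.println(extend("DATA/EXPORT/dump", formatCode(fname)));
        System.out.println("domains only: " + domainsOnly(fname));
    }
}
```

Because the dot check covers the whole path rather than just the file name, a dot in any directory component suppresses the extension. The sketch reproduces that behaviour on purpose.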

@ -114,6 +114,28 @@ function updatepage(str) {
</fieldset>
</form>
#(dumprestore)#::
<form action="IndexControlURLs_p.html" method="post" enctype="multipart/form-data" accept-charset="UTF-8">
<fieldset><legend>Optimize Solr</legend>
<dl>
<dt>&nbsp;</dt>
<dd>merge to max. <input type="text" name="optimizemax" value="#[optimizemax]#" size="6" maxlength="6" /> segments
<input type="submit" name="optimizesolr" value="Optimize Solr" class="btn btn-primary" style="width:240px;"/>
</dd>
</dl>
</fieldset>
</form>
<form action="IndexControlURLs_p.html" method="post" enctype="multipart/form-data" accept-charset="UTF-8">
<fieldset><legend>Reboot Solr Core</legend>
<dl>
<dt>&nbsp;</dt>
<dd><input type="submit" name="rebootsolr" value="Shut Down and Re-Start Solr" class="btn btn-primary" style="width:240px;"/>
</dd>
</dl>
</fieldset>
</form>::
#(/dumprestore)#
#(statistics)#::
<form action="IndexControlURLs_p.html" method="post" enctype="multipart/form-data" accept-charset="UTF-8">
<fieldset><legend>Statistics about top-domains in URL Database</legend>
@ -153,92 +175,6 @@ function updatepage(str) {
</table>
#(/statisticslines)#
#(dumprestore)#::
<form action="IndexControlURLs_p.html" method="post" enctype="multipart/form-data" accept-charset="UTF-8">
<fieldset><legend>Dump and Restore of Solr Index</legend>
<dl>
<dt>&nbsp;</dt>
<dd><input type="submit" name="indexdump" value="Create Dump" class="btn btn-primary" style="width:240px;"/>
</dd>
</dl>
<dl>
<dt class="TableCellDark">Dump File</dt>
<dd><input type="text" name="dumpfile" value="#[dumpfile]#" size="80" maxlength="250" />
</dd>
<dt>&nbsp;</dt>
<dd><input type="submit" name="indexrestore" value="Restore Dump" class="btn btn-primary" style="width:240px;"/>
</dd>
</dl>
</fieldset>
</form>
<form action="IndexControlURLs_p.html" method="post" enctype="multipart/form-data" accept-charset="UTF-8">
<fieldset><legend>Optimize Solr</legend>
<dl>
<dt>&nbsp;</dt>
<dd>merge to max. <input type="text" name="optimizemax" value="#[optimizemax]#" size="6" maxlength="6" /> segments
<input type="submit" name="optimizesolr" value="Optimize Solr" class="btn btn-primary" style="width:240px;"/>
</dd>
</dl>
</fieldset>
</form>
<form action="IndexControlURLs_p.html" method="post" enctype="multipart/form-data" accept-charset="UTF-8">
<fieldset><legend>Reboot Solr Core</legend>
<dl>
<dt>&nbsp;</dt>
<dd><input type="submit" name="rebootsolr" value="Shut Down and Re-Start Solr" class="btn btn-primary" style="width:240px;"/>
</dd>
</dl>
</fieldset>
</form>::
#(/dumprestore)#
#(lurlexport)#::
<form action="IndexControlURLs_p.html" method="post" enctype="multipart/form-data" accept-charset="UTF-8">
<fieldset><legend>Loaded URL Export</legend>
<dl>
<dt class="TableCellDark">Export File</dt>
<dd><input type="text" name="exportfile" value="#[exportfile]#" size="80" maxlength="250" />
</dd>
<dt class="TableCellDark">URL Filter</dt>
<dd><input type="text" name="exportfilter" value=".*.*" size="20" maxlength="250" />
</dd>
<dt class="TableCellDark">query</dt>
<dd><input type="text" name="exportquery" value="*:*" size="20" maxlength="250" />
</dd>
<dt class="TableCellDark">Export Format</dt>
<dd>
<dl>
<dt>Only Domain:</dt>
<dd><input type="radio" name="format" value="dom-text" /> Plain Text List (domains only)<br />
<input type="radio" name="format" value="dom-html" /> HTML (domains as URLs, no title)</dd>
<dt>Full URL List:</dt>
<dd><input type="radio" name="format" value="url-text" /> Plain Text List (URLs only)<br />
<input type="radio" name="format" value="url-html" /> HTML (URLs with title)</dd>
<dt>Full Data Records:</dt>
<dd><input type="radio" name="format" value="full-rss" /> XML (RSS)<br />
<input type="radio" name="format" value="full-solr" checked="checked" /> XML (Rich and full Solr data using Solr Schema, can be imported with DATA/SURROGATE/in/)</dd>
</dl></dd>
<dt>&nbsp;</dt>
<dd><input type="submit" name="lurlexport" value="Export URLs" class="btn btn-primary" style="width:240px;"/>
</dd>
</dl>
</fieldset>
</form>::
<div class="commit" style="text-decoration:blink">Export to file #[exportfile]# is running .. #[urlcount]# URLs so far</div>::
#(/lurlexport)#
#(lurlexportfinished)#::
<div class="commit">Finished export of #[urlcount]# URLs to file <a href="file://#[exportfile]#" target="_">#[exportfile]#</a></div>::
#(/lurlexportfinished)#
#(lurlexporterror)#::
<div class="error">Export to file #[exportfile]# failed: #[exportfailmsg]#</div>::
#(/lurlexporterror)#
#(indexdump)#::
<div class="commit">Stored a solr dump to file #[dumpfile]#</div>::
#(/indexdump)#
#(genUrlProfile)#
::No entry found for URL-hash #[urlhash]#
::<iframe src="solr/select?defType=edismax&start=0&rows=3&core=collection1&wt=html&q=id:%22#[urlhash]#%22" width="100%" height="420" frameborder="0" scrolling="no"></iframe><br />

@ -1,14 +1,10 @@
// IndexControlRWIs_p.java
// IndexControlURLs_p.java
// -----------------------
// (C) 2004-2007 by Michael Peter Christen; mc@yacy.net, Frankfurt a. M., Germany
// first published 2004 on http://yacy.net
//
// This is a part of YaCy, a peer-to-peer based web search engine
//
// $LastChangedDate$
// $LastChangedRevision$
// $LastChangedBy$
//
// LICENSE
//
// This program is free software; you can redistribute it and/or modify
@ -36,7 +32,6 @@ import java.util.Set;
import org.apache.lucene.search.FieldCache;
import net.yacy.cora.date.GenericFormatter;
import net.yacy.cora.document.encoding.ASCII;
import net.yacy.cora.document.id.DigestURL;
import net.yacy.cora.federate.yacy.CacheStrategy;
@ -77,8 +72,6 @@ public class IndexControlURLs_p {
prop.put("statistics_lines", 100);
prop.put("statisticslines", 0);
prop.put("reload", 0);
prop.put("indexdump", 0);
prop.put("lurlexport", 0);
prop.put("reload", 0);
prop.put("dumprestore", 1);
List<File> dumpFiles = segment.fulltext().dumpFiles();
@ -89,38 +82,6 @@ public class IndexControlURLs_p {
prop.put("cleanuprwi", segment.termIndex() != null && !segment.termIndex().isEmpty() ? 1 : 0);
prop.put("cleanupcitation", segment.connectedCitation() && !segment.urlCitation().isEmpty() ? 1 : 0);
// show export messages
final Fulltext.Export export = segment.fulltext().export();
if ((export != null) && (export.isAlive())) {
// there is currently a running export
prop.put("lurlexport", 2);
prop.put("lurlexportfinished", 0);
prop.put("lurlexporterror", 0);
prop.put("lurlexport_exportfile", export.file().toString());
prop.put("lurlexport_urlcount", export.count());
prop.put("reload", 1);
} else {
prop.put("lurlexport", 1);
prop.put("lurlexport_exportfile", sb.getDataPath() + "/DATA/EXPORT/" + GenericFormatter.SHORT_SECOND_FORMATTER.format());
if (export == null) {
// there has never been an export
prop.put("lurlexportfinished", 0);
prop.put("lurlexporterror", 0);
} else {
// an export was running but has finished
prop.put("lurlexportfinished", 1);
prop.put("lurlexportfinished_exportfile", export.file().toString());
prop.put("lurlexportfinished_urlcount", export.count());
if (export.failed() == null) {
prop.put("lurlexporterror", 0);
} else {
prop.put("lurlexporterror", 1);
prop.put("lurlexporterror_exportfile", export.file().toString());
prop.put("lurlexporterror_exportfailmsg", export.failed());
}
}
}
if (post == null || env == null) {
prop.putNum("ucount", ucount);
return prop; // nothing to do
@ -247,52 +208,6 @@ public class IndexControlURLs_p {
prop.put("statistics", 0);
}
}
if (post.containsKey("lurlexport")) {
// parse format
int format = 0;
final String fname = post.get("format", "url-text");
final boolean dom = fname.startsWith("dom"); // if dom== false complete urls are exported, otherwise only the domain
if (fname.endsWith("text")) format = 0;
if (fname.endsWith("html")) format = 1;
if (fname.endsWith("rss")) format = 2;
if (fname.endsWith("solr")) format = 3;
// extend export file name
String s = post.get("exportfile", "");
if (s.indexOf('.',0) < 0) {
if (format == 0) s = s + ".txt";
if (format == 1) s = s + ".html";
if (format == 2 ) s = s + "_rss.xml";
if (format == 3) s = s + "_full.xml";
}
final File f = new File(s);
f.getParentFile().mkdirs();
final String filter = post.get("exportfilter", ".*");
final String query = post.get("exportquery", "*:*");
final Fulltext.Export running = segment.fulltext().export(f, filter, query, format, dom);
prop.put("lurlexport_exportfile", s);
prop.put("lurlexport_urlcount", running.count());
if ((running != null) && (running.failed() == null)) {
prop.put("lurlexport", 2);
}
prop.put("reload", 1);
}
if (post.containsKey("indexdump")) {
final File dump = segment.fulltext().dumpSolr();
prop.put("indexdump", 1);
prop.put("indexdump_dumpfile", dump.getAbsolutePath());
dumpFiles = segment.fulltext().dumpFiles();
prop.put("dumprestore_dumpfile", dumpFiles.size() == 0 ? "" : dumpFiles.get(dumpFiles.size() - 1).getAbsolutePath());
//sb.tables.recordAPICall(post, "IndexControlURLs_p.html", WorkTables.TABLE_API_TYPE_STEERING, "solr dump generation");
}
if (post.containsKey("indexrestore")) {
final File dump = new File(post.get("dumpfile", ""));
segment.fulltext().restoreSolr(dump);
}
if (post.containsKey("optimizesolr")) {
final int size = post.getInt("optimizemax", 10);

@ -94,7 +94,7 @@
<ul class="nav nav-sidebar menugroup">
<li><h3>Production</h3></li>
<li><a href="CrawlStartExpert.html" class="MenuItemLink">Advanced Crawler</a></li>
<li><a href="Load_RSS_p.html" class="MenuItemLink #(authorized)#lock::unlock#(/authorized)#">Content Importer</a></li>
<li><a href="IndexExport_p.html" class="MenuItemLink #(authorized)#lock::unlock#(/authorized)#">Index Export/Import</a></li>
<li><a href="Vocabulary_p.html" class="MenuItemLink #(authorized)#lock::unlock#(/authorized)#">Content Semantic</a></li>
<li><a href="CrawlCheck_p.html" class="MenuItemLink #(authorized)#lock::unlock#(/authorized)#">Target Analysis</a></li>
</ul>

@ -1,9 +1,15 @@
<div class="SubMenu">
<h3>Content Importer</h3>
<h3>Content Export / Import</h3>
</div>
<div class="SubMenu">
<div class="SubMenugroup">
<h3>External Datasets</h3>
<h3>Export</h3>
<ul class="SubMenu">
<li><a href="IndexExport_p.html" class="MenuItemLink #(authorized)#lock::unlock#(/authorized)#">Internal Index Export</a></li>
</ul>
</div>
<div class="SubMenugroup">
<h3>Import</h3>
<ul class="SubMenu">
<li><a href="Load_RSS_p.html" class="MenuItemLink #(authorized)#lock::unlock#(/authorized)#">RSS Feed Importer</a></li>
<li><a href="IndexImportOAIPMH_p.html" class="MenuItemLink #(authorized)#lock::unlock#(/authorized)#">OAI-PMH Importer</a></li>

@ -3882,7 +3882,7 @@ Crawler Monitor==Crawler Überwachung
### PRODUCTION ###
Production==Produktion
Advanced Crawler==Experten Crawl Start
Content Importer==Importfunktionen
Index Export/Import==Daten Export / Import
Target Analysis==Ziel Analyse
Process Scheduler==Prozess Planer
### ADMINISTRATION ###
@ -4058,7 +4058,7 @@ Network<br/>Scanner==Netzwerk<br/>Scanner
Crawling of==Crawlen von
#MediaWikis==MediaWikis
>phpBB3 Forums<==>phpBB3 Foren<
Content Import<==Content Importer<
Index Export/Import<==Content Importer<
Network Harvesting<==Netzwerk Harvesting<
#Remote<br/>Crawling==Remote<br/>Crawling
#Scraping<br/>Proxy==Scraping<br/>Proxy

@ -2149,7 +2149,7 @@ Web Visualization==Visualisation du web
Crawler Monitor==Surveillance du balayeur
### PRODUCTION ###
Advanced Crawler==Balayeur avanc&eacute;
Content Importer==Importer du contenu
Index Export/Import==Importer du contenu
Target Analysis==Analyse de cible
Process Scheduler==Planificateur de processus
### ADMINISTRATION ###

@ -4325,7 +4325,7 @@ RAM/Disk Usage &amp; Updates==Использование памяти и обн
System Status==Монитор производительности
Peer-to-Peer Network==Сеть YaCy
Advanced Crawler==Расширенная индексация
Content Importer==Импорт контента
Index Export/Import==Импорт контента
System Administration==Настройки системы
Configuration==Конфигурация
Production==Индексирование
@ -4378,7 +4378,7 @@ Outgoing&nbsp;Cookies==Исходящие куки
#File: env/templates/submenuIndexImport.template
#-----------------------------
Content Importer==Импорт контента
Index Export/Import==Импорт контента
External Datasets==Импорт из внешних источников
>Database Reader<==>Обозреватель баз данных<
RSS Feed Importer==Импорт RSS-лент
