Added a new way to browse content in search results:

- date navigation

The date is taken from the CONTENT of the documents / web pages, NOT
from a date submitted as metadata (e.g. in the HTTP header or the
HTML head). This makes it possible to search for documents that refer
to the future, e.g. when documents contain descriptions of future
events.

The date is written to an index field which is now enabled by default.
All documents are scanned for the date mentions they contain.
To visualize the dates for a specific search result, a histogram
showing the number of documents for each day is displayed. These
histograms are rendered with the morris.js library. Morris.js also
requires raphael.js, which is now also integrated in YaCy.
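
The page scripts in this commit feed the chart from a flat list that
alternates date values and counts. A minimal sketch of that conversion,
assuming this flat layout; the helper name facetPairsToRows is
hypothetical and not part of the commit:

```javascript
// Hypothetical helper: turn a flat facet list
// [date1, count1, date2, count2, ...] into rows for a bar chart.
function facetPairsToRows(flat) {
  const rows = [];
  for (let i = 0; i + 1 < flat.length; i += 2) {
    const date = flat[i];
    // Counts may arrive as numbers or as strings, so coerce them.
    const count = Number(flat[i + 1]);
    // Skip entries without a date or with a non-numeric count.
    if (date && Number.isFinite(count)) rows.push({ x: date, y: count });
  }
  return rows;
}

console.log(facetPairsToRows(["2015-03-06", 4, "2015-03-07", "2", "", 9]));
```

The resulting rows use the same x and y keys that the chart
configuration in this commit expects (xkey 'x', ykeys ['y']).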

The histogram is now also displayed in the index browser by default.

To select a specific range from a search result, the following modifiers
have been introduced:
from:<date>
to:<date>
These modifiers can be used separately (i.e. only 'from' or only 'to')
to describe an open-ended interval, or combined to describe a closed
interval. Both dates are inclusive. To select a single specific date
only, use the 'on:' modifier.
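
The interval semantics can be illustrated with a small sketch. It
assumes dates in the form YYYY-MM-DD (the form the histogram clicks
produce); the functions parseDateRange and inRange are hypothetical and
do not mirror YaCy's actual query parser:

```javascript
// Hypothetical sketch: extract from:/to: modifiers from a query string.
function parseDateRange(query) {
  const from = /(?:^|\s)from:(\d{4}-\d{2}-\d{2})/.exec(query);
  const to = /(?:^|\s)to:(\d{4}-\d{2}-\d{2})/.exec(query);
  return { from: from ? from[1] : null, to: to ? to[1] : null };
}

// YYYY-MM-DD strings sort lexicographically in date order,
// so plain string comparison is enough here.
function inRange(date, range) {
  if (range.from !== null && date < range.from) return false;
  if (range.to !== null && date > range.to) return false;
  return true; // both bounds are inclusive
}

const closed = parseDateRange("yacy from:2015-03-01 to:2015-03-31");
console.log(closed, inRange("2015-03-31", closed));
```

A missing bound is treated as unbounded, which gives the open-ended
interval described above.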

The histogram shows blue and green bars; the green bars denote weekend
days (Saturday and Sunday).
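
A sketch of the colour choice, using the two colour values from the page
scripts in this commit. The function name barColor is hypothetical.
Unlike the committed script, which calls getDay(), this sketch reads the
weekday in UTC, because a date-only string such as "2015-03-07" is
parsed as UTC midnight and the local weekday can differ in time zones
west of UTC:

```javascript
// Hypothetical helper: pick the bar colour for a YYYY-MM-DD date.
function barColor(isoDate) {
  // Date-only ISO strings are parsed as UTC, so use the UTC weekday.
  const day = new Date(isoDate).getUTCDay();
  // 0 = Sunday, 6 = Saturday
  return (day === 0 || day === 6) ? '#4aaf46' : '#3574c0';
}

console.log(barColor("2015-03-07"), barColor("2015-03-09"));
```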

Clicking on bars in the histogram has the following effect:
1st click: add a from:<date> modifier for the date of the bar
2nd click: add a to:<date> modifier for the date of the bar
3rd click: remove the from and to modifiers and set an on:<date>
modifier for the date of the bar
When the on:<date> modifier is used, the histogram shows an unlimited
time period. This makes it possible to click again (4th click), which is
then interpreted as a 1st click again (sets a from modifier).
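
The click cycle can be sketched as a pure function from the current
query and the clicked date to the next query. The function nextQuery is
hypothetical; the committed handler works on the search input directly
and uses indexOf/substring, while this sketch swaps in regular
expressions for brevity:

```javascript
// Hypothetical sketch of the click cycle: from: -> to: -> on: -> from: ...
function nextQuery(query, date) {
  const hasFrom = /(^|\s)from:\S+/.test(query);
  const hasTo = /(^|\s)to:\S+/.test(query);
  // Drop any on: modifier first; the cycle then restarts with from:.
  const base = query.replace(/(^|\s)on:\S+/g, "").trim();
  if (!hasFrom) return (base + " from:" + date).trim(); // 1st click
  if (!hasTo) return base + " to:" + date;              // 2nd click
  // 3rd click: strip from: and to:, then pin the query to a single day.
  const bare = base.replace(/(^|\s)(from|to):\S+/g, "").trim();
  return (bare + " on:" + date).trim();
}

let q = "yacy";
for (const d of ["2015-03-07", "2015-03-09", "2015-03-08", "2015-03-10"]) {
  q = nextQuery(q, d);
  console.log(q);
}
```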

The display feature is NOT switched on by default; to switch it on use
the /ConfigSearchPage_p.html servlet.
Michael Peter Christen 10 years ago
parent c3aadcf899
commit 535f1ebe3b

@ -18,17 +18,11 @@ sku
## last-modified from http header, date (mandatory field) ## last-modified from http header, date (mandatory field)
last_modified last_modified
## if date expressions can be found in the content, these dates are listed here in order of the appearances ## if date expressions can be found in the content, these dates are listed here as date objects in order of the appearances
#dates_in_content_sxt dates_in_content_dts
## the number of entries in dates_in_content_sxt ## the number of entries in dates_in_content_sxt
#dates_in_content_count_i dates_in_content_count_i
## if dates_in_content_sxt is filled, this contains the oldest date from the list of available dates
#date_in_content_min_dt
## if dates_in_content_sxt is filled, this contains the youngest date from the list of available dates, that may also be possibly in the future
#date_in_content_max_dt
## mime-type of document, string (mandatory field) ## mime-type of document, string (mandatory field)
content_type content_type

@ -837,7 +837,7 @@ search.result.show.vocabulary.omit =
# can be temporary different if search string is given with differen navigation values # can be temporary different if search string is given with differen navigation values
# assigning no value(s) means that no navigation is shown # assigning no value(s) means that no navigation is shown
search.navigation=location,hosts,authors,namespace,topics,filetype,protocol,language search.navigation=location,hosts,authors,namespace,topics,filetype,protocol,language
#search.navigation=location,hosts,authors,namespace,topics,filetype,protocol,language,collections #search.navigation=location,hosts,authors,namespace,topics,filetype,protocol,language,collections,date
# search result verification and snippet fetch caching rules # search result verification and snippet fetch caching rules
# each search result can be verified byloading the link from the web # each search result can be verified byloading the link from the web

@ -174,9 +174,46 @@
</div> </div>
</fieldset> </fieldset>
</div> </div>
<!-- the search result --> <!-- the search result -->
<div style="float: left;"> <div style="float: left;">
<input type="checkbox" name="search.navigation.date" value="true" #(search.navigation.date)#::checked="checked" #(/search.navigation.date)# /> Date Navigation
<link rel="stylesheet" href="/env/morris.css">
<script src="/js/raphael-min.js"></script>
<script src="/js/morris.js"></script>
<div id="graph" style="height:200px"></div>
<script>
var solr= $.getJSON("http://localhost:8090/solr/collection1/select?q=*:*&defType=edismax&start=0&rows=0&wt=json&facet=true&facet.field=dates_in_content_dts&facet.sort=index", function(data) {
dates_in_content_dts = data.facet_counts.facet_fields.dates_in_content_dts;
var parsed = [];
for (var i = 0; i < dates_in_content_dts.length; i = i + 2) {
var date = dates_in_content_dts[i];
var count = dates_in_content_dts[i + 1];
if (date && count) {parsed[parsed.length] = {x: date,y: count};};
};
if (parsed.length > 0) Morris.Bar({
element: 'graph',
data: parsed,
xkey: 'x',
ykeys: ['y'],
labels: ['number of documents about this date'],
yLabelFormat: function (y) { return y.toString() + ' docs'; },
barColors: function (row, series, type) {
var d = new Date(row.label);
if (d.getDay() === 6) return '#4aaf46'; //saturday
if (d.getDay() === 0) return '#4aaf46'; //sunday
return '#3574c0';
},
hideHover: 'false'
}).on('click', function(i, row) {
console.log(i, row);
});
});
</script>
<fieldset> <fieldset>
<div class="searchresults"> <div class="searchresults">
<h4 class="linktitle"> <h4 class="linktitle">

@ -88,6 +88,7 @@ public class ConfigSearchPage_p {
if (post.getBoolean("search.navigation.collections")) nav += "collections,"; if (post.getBoolean("search.navigation.collections")) nav += "collections,";
if (post.getBoolean("search.navigation.namespace")) nav += "namespace,"; if (post.getBoolean("search.navigation.namespace")) nav += "namespace,";
if (post.getBoolean("search.navigation.topics")) nav += "topics,"; if (post.getBoolean("search.navigation.topics")) nav += "topics,";
if (post.getBoolean("search.navigation.date")) nav += "date,";
if (nav.endsWith(",")) nav = nav.substring(0, nav.length() - 1); if (nav.endsWith(",")) nav = nav.substring(0, nav.length() - 1);
sb.setConfig("search.navigation", nav); sb.setConfig("search.navigation", nav);
} }
@ -166,6 +167,7 @@ public class ConfigSearchPage_p {
prop.put("search.navigation.collections", sb.getConfig("search.navigation", "").indexOf("collections",0) >= 0 ? 1 : 0); prop.put("search.navigation.collections", sb.getConfig("search.navigation", "").indexOf("collections",0) >= 0 ? 1 : 0);
prop.put("search.navigation.namespace", sb.getConfig("search.navigation", "").indexOf("namespace",0) >= 0 ? 1 : 0); prop.put("search.navigation.namespace", sb.getConfig("search.navigation", "").indexOf("namespace",0) >= 0 ? 1 : 0);
prop.put("search.navigation.topics", sb.getConfig("search.navigation", "").indexOf("topics",0) >= 0 ? 1 : 0); prop.put("search.navigation.topics", sb.getConfig("search.navigation", "").indexOf("topics",0) >= 0 ? 1 : 0);
prop.put("search.navigation.date", sb.getConfig("search.navigation", "").indexOf("date",0) >= 0 ? 1 : 0);
prop.put("about.headline", sb.getConfig("about.headline", "About")); prop.put("about.headline", sb.getConfig("about.headline", "About"));
prop.put("about.body", sb.getConfig("about.body", "")); prop.put("about.body", sb.getConfig("about.body", ""));

@ -32,8 +32,6 @@ import net.yacy.cora.lod.vocabulary.Tagging;
import net.yacy.cora.protocol.ClientIdentification; import net.yacy.cora.protocol.ClientIdentification;
import net.yacy.cora.protocol.RequestHeader; import net.yacy.cora.protocol.RequestHeader;
import net.yacy.cora.util.Html2Image; import net.yacy.cora.util.Html2Image;
import net.yacy.cora.util.JSONException;
import net.yacy.cora.util.JSONObject;
import net.yacy.crawler.data.CrawlProfile; import net.yacy.crawler.data.CrawlProfile;
import net.yacy.document.LibraryProvider; import net.yacy.document.LibraryProvider;
import net.yacy.search.Switchboard; import net.yacy.search.Switchboard;

@ -113,6 +113,39 @@ function updatepage(str) {
<div class="error" style="float:left;">&nbsp;&nbsp;&nbsp;Load Errors</div> <div class="error" style="float:left;">&nbsp;&nbsp;&nbsp;Load Errors</div>
</div> </div>
</fieldset> </fieldset>
<link rel="stylesheet" href="/env/morris.css">
<script src="/js/raphael-min.js"></script>
<script src="/js/morris.js"></script>
<div id="graph" style="height:200px"></div>
<script>
var solr= $.getJSON("http://localhost:8090/solr/collection1/select?q=*:*&defType=edismax&start=0&rows=0&wt=json&facet=true&facet.field=dates_in_content_dts&facet.sort=index", function(data) {
dates_in_content_dts = data.facet_counts.facet_fields.dates_in_content_dts;
var parsed = [];
for (var i = 0; i < dates_in_content_dts.length; i = i + 2) {
var date = dates_in_content_dts[i];
var count = dates_in_content_dts[i + 1];
if (date && count) {parsed[parsed.length] = {x: date,y: count};};
};
if (parsed.length > 0) Morris.Bar({
element: 'graph',
data: parsed,
xkey: 'x',
ykeys: ['y'],
labels: ['number of documents about this date'],
yLabelFormat: function (y) { return y.toString() + ' docs'; },
barColors: function (row, series, type) {
var d = new Date(row.label);
if (d.getDay() === 6) return '#4aaf46'; //saturday
if (d.getDay() === 0) return '#4aaf46'; //sunday
return '#3574c0';
},
hideHover: 'false'
}).on('click', function(i, row) {
console.log(i, row);
});
});
</script>
#(/hosts)# #(/hosts)#
#(hostanalysis)#:: #(hostanalysis)#::

@ -204,16 +204,16 @@ public class HostBrowser {
// collect hosts from index // collect hosts from index
ReversibleScoreMap<String> hostscore = fulltext.getDefaultConnector().getFacets(AbstractSolrConnector.CATCHALL_QUERY, maxcount, CollectionSchema.host_s.getSolrFieldName()).get(CollectionSchema.host_s.getSolrFieldName()); ReversibleScoreMap<String> hostscore = fulltext.getDefaultConnector().getFacets(AbstractSolrConnector.CATCHALL_QUERY, maxcount, CollectionSchema.host_s.getSolrFieldName()).get(CollectionSchema.host_s.getSolrFieldName());
if (hostscore == null) hostscore = new ClusteredScoreMap<String>(); if (hostscore == null) hostscore = new ClusteredScoreMap<String>(true);
// collect hosts from crawler // collect hosts from crawler
final Map<String, Integer[]> crawler = (authorized) ? sb.crawlQueues.noticeURL.getDomainStackHosts(StackType.LOCAL, sb.robots) : new HashMap<String, Integer[]>(); final Map<String, Integer[]> crawler = (authorized) ? sb.crawlQueues.noticeURL.getDomainStackHosts(StackType.LOCAL, sb.robots) : new HashMap<String, Integer[]>();
// collect the errorurls // collect the errorurls
Map<String, ReversibleScoreMap<String>> exclfacets = authorized ? fulltext.getDefaultConnector().getFacets(CollectionSchema.failtype_s.getSolrFieldName() + ":" + FailType.excl.name(), maxcount, CollectionSchema.host_s.getSolrFieldName()) : null; Map<String, ReversibleScoreMap<String>> exclfacets = authorized ? fulltext.getDefaultConnector().getFacets(CollectionSchema.failtype_s.getSolrFieldName() + ":" + FailType.excl.name(), maxcount, CollectionSchema.host_s.getSolrFieldName()) : null;
ReversibleScoreMap<String> exclscore = exclfacets == null ? new ClusteredScoreMap<String>() : exclfacets.get(CollectionSchema.host_s.getSolrFieldName()); ReversibleScoreMap<String> exclscore = exclfacets == null ? new ClusteredScoreMap<String>(true) : exclfacets.get(CollectionSchema.host_s.getSolrFieldName());
Map<String, ReversibleScoreMap<String>> failfacets = authorized ? fulltext.getDefaultConnector().getFacets(CollectionSchema.failtype_s.getSolrFieldName() + ":" + FailType.fail.name(), maxcount, CollectionSchema.host_s.getSolrFieldName()) : null; Map<String, ReversibleScoreMap<String>> failfacets = authorized ? fulltext.getDefaultConnector().getFacets(CollectionSchema.failtype_s.getSolrFieldName() + ":" + FailType.fail.name(), maxcount, CollectionSchema.host_s.getSolrFieldName()) : null;
ReversibleScoreMap<String> failscore = failfacets == null ? new ClusteredScoreMap<String>() : failfacets.get(CollectionSchema.host_s.getSolrFieldName()); ReversibleScoreMap<String> failscore = failfacets == null ? new ClusteredScoreMap<String>(true) : failfacets.get(CollectionSchema.host_s.getSolrFieldName());
int c = 0; int c = 0;
Iterator<String> i = hostscore.keys(false); Iterator<String> i = hostscore.keys(false);

@ -158,7 +158,7 @@ public class WebStructurePicture_p {
final double radius = 1.0 / (1 << nextlayer); final double radius = 1.0 / (1 << nextlayer);
final WebStructureGraph.StructureEntry sr = structure.outgoingReferences(pivotnode.getKey()); final WebStructureGraph.StructureEntry sr = structure.outgoingReferences(pivotnode.getKey());
final Map<String, Integer> next = (sr == null) ? new HashMap<String, Integer>() : sr.references; final Map<String, Integer> next = (sr == null) ? new HashMap<String, Integer>() : sr.references;
ClusteredScoreMap<String> next0 = new ClusteredScoreMap<String>(); ClusteredScoreMap<String> next0 = new ClusteredScoreMap<String>(false);
for (Map.Entry<String, Integer> entry: next.entrySet()) next0.set(entry.getKey(), entry.getValue()); for (Map.Entry<String, Integer> entry: next.entrySet()) next0.set(entry.getKey(), entry.getValue());
// first set points to next hosts // first set points to next hosts
final List<Map.Entry<String, String>> targets = new ArrayList<Map.Entry<String, String>>(); final List<Map.Entry<String, String>> targets = new ArrayList<Map.Entry<String, String>>();

@ -0,0 +1,2 @@
.morris-hover{position:absolute;z-index:1000}.morris-hover.morris-default-style{border-radius:10px;padding:6px;color:#666;background:rgba(255,255,255,0.8);border:solid 2px rgba(230,230,230,0.8);font-family:sans-serif;font-size:12px;text-align:center}.morris-hover.morris-default-style .morris-hover-row-label{font-weight:bold;margin:0.25em 0}
.morris-hover.morris-default-style .morris-hover-point{white-space:nowrap;margin:0.1em 0}

@ -58,7 +58,7 @@
<div class="input-group"> <div class="input-group">
<input name="query" id="search" type="text" size="40" maxlength="80" value="#[former]#" #(focus)#::autofocus="autofocus"#(/focus)# onFocus="this.select()" class="form-control searchinput typeahead" /> <input name="query" id="search" type="text" size="40" maxlength="80" value="#[former]#" #(focus)#::autofocus="autofocus"#(/focus)# onFocus="this.select()" class="form-control searchinput typeahead" />
<div class="input-group-btn"> <div class="input-group-btn">
<button type="submit" id="Enter" class="btn btn-primary">Search</button> <button id="Enter" name="Enter" class="btn btn-primary" type="submit">Search</button>
</div> </div>
</div> </div>
<input type="hidden" name="verify" value="#[search.verify]#" /> <input type="hidden" name="verify" value="#[search.verify]#" />

@ -28,7 +28,6 @@
// javac -classpath .:../classes index.java // javac -classpath .:../classes index.java
// if the shell's current path is HTROOT // if the shell's current path is HTROOT
import java.io.IOException;
import net.yacy.cora.document.analysis.Classification; import net.yacy.cora.document.analysis.Classification;
import net.yacy.cora.document.analysis.Classification.ContentDomain; import net.yacy.cora.document.analysis.Classification.ContentDomain;
import net.yacy.cora.protocol.RequestHeader; import net.yacy.cora.protocol.RequestHeader;

File diff suppressed because it is too large

File diff suppressed because one or more lines are too long

@ -92,9 +92,9 @@ Use the RSS search result format to add static searches to your RSS reader, if y
<form class="search small" name="searchform" action="" method="get" accept-charset="UTF-8" style="position:fixed;top:8px;z-index:1052;max-width:500px;"> <form class="search small" name="searchform" action="" method="get" accept-charset="UTF-8" style="position:fixed;top:8px;z-index:1052;max-width:500px;">
<div class="input-group"> <div class="input-group">
<input type="text" class="form-control searchinput typeahead" size="40" maxlength="200" placeholder="#[promoteSearchPageGreeting]#" name="query" value="#[former]#" #(focus)#::autofocus="autofocus"#(/focus)# onFocus="this.select()" id="search" onclick="document.getElementById('Enter').innerHTML = 'search'"/> <input name="query" id="search" type="text" class="form-control searchinput typeahead" size="40" maxlength="200" placeholder="#[promoteSearchPageGreeting]#" value="#[former]#" #(focus)#::autofocus="autofocus"#(/focus)# onFocus="this.select()" onclick="document.getElementById('Enter').innerHTML = 'search'"/>
<div class="input-group-btn"> <div class="input-group-btn">
<button id="Enter" class="btn btn-default" type="submit">search</button> <button id="Enter" name="Enter" class="btn btn-default" type="submit">search</button>
</div> </div>
</div> </div>
<input type="hidden" name="contentdom" id="contentdom" value="#[contentdom]#" /> <input type="hidden" name="contentdom" id="contentdom" value="#[contentdom]#" />
@ -175,6 +175,9 @@ document.getElementById("Enter").innerHTML = "search again";
</div> </div>
#(/geoinfo)# #(/geoinfo)#
<!-- show date histogram if date navigation is active -->
<div id="datehistogram"></div>
<!-- linklist begin --> <!-- linklist begin -->
#(resultTable)#::<table width="100%"><tr class="TableHeader"><td width="30%">Media</td><td width="70%">URL</td></tr>#(/resultTable)# #(resultTable)#::<table width="100%"><tr class="TableHeader"><td width="30%">Media</td><td width="70%">URL</td></tr>#(/resultTable)#
#{results}# #{results}#

@ -91,6 +91,58 @@ show search results for "#[query]#" on map</a>
</ul> </ul>
#(/cat-location)# #(/cat-location)#
#(nav-dates)#::
<link rel="stylesheet" href="/env/morris.css">
<script src="/js/raphael-min.js"></script>
<script src="/js/morris.js"></script>
<script>
document.getElementById("datehistogram").style = "height:200px";
dates_in_content_dts = [#{element}#"#[name]#","#[count]#"#(nl)#::,#(/nl)##{/element}#];
var parsed = [];
for (var i = 0; i < dates_in_content_dts.length; i = i + 2) {
var date = dates_in_content_dts[i];
var count = dates_in_content_dts[i + 1];
if (date && count) {parsed[parsed.length] = {x: date,y: count};};
};
if (parsed.length > 0) Morris.Bar({
element: 'datehistogram',
data: parsed,
xkey: 'x',
ykeys: ['y'],
labels: ['number of documents about this date'],
yLabelFormat: function (y) { return y.toString() + ' docs'; },
barColors: function (row, series, type) {
var d = new Date(row.label);
if (d.getDay() === 6) return '#4aaf46'; //saturday
if (d.getDay() === 0) return '#4aaf46'; //sunday
return '#3574c0';
},
hideHover: 'false'
}).on('click', function(i, row) {
var query = document.getElementsByClassName('searchinput')[0].getAttribute("value");
var onp = -1, fromp = -1, top = -1;
if ((onp = query.indexOf("on:")) >= 0) {
query = query.substring(0, onp - 1);
}
if ((fromp = query.indexOf("from:")) < 0) {
query = query + " from:" + row.x;
document.getElementsByClassName('searchinput')[0].value = query;
document.getElementById('Enter').click();
} else if ((top = query.indexOf("to:")) < 0) {
query = query + " to:" + row.x;
document.getElementsByClassName('searchinput')[0].value = query;
document.getElementById('Enter').click();
} else {
query = query.substring(0, fromp) + " on:" + row.x;
document.getElementsByClassName('searchinput')[0].value = query;
document.getElementById('Enter').click();
}
var date = row.x;
console.log(i, row, query);
});
</script>
#(/nav-dates)#
#(nav-domains)#:: #(nav-domains)#::
<ul class="nav nav-sidebar menugroup"> <ul class="nav nav-sidebar menugroup">
<li><h3>Provider</h3></li> <li><h3>Provider</h3></li>

@ -1,13 +1,8 @@
// yacysearchitem.java
// (C) 2007 by Michael Peter Christen; mc@yacy.net, Frankfurt a. M., Germany // (C) 2007 by Michael Peter Christen; mc@yacy.net, Frankfurt a. M., Germany
// first published 28.08.2007 on http://yacy.net // first published 28.08.2007 on http://yacy.net
// //
// This is a part of YaCy, a peer-to-peer based web search engine // This is a part of YaCy, a peer-to-peer based web search engine
// //
// $LastChangedDate$
// $LastChangedRevision$
// $LastChangedBy$
//
// LICENSE // LICENSE
// //
// This program is free software; you can redistribute it and/or modify // This program is free software; you can redistribute it and/or modify
@ -24,17 +19,22 @@
// along with this program; if not, write to the Free Software // along with this program; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
import java.text.ParseException;
import java.util.AbstractMap; import java.util.AbstractMap;
import java.util.Date;
import java.util.Iterator; import java.util.Iterator;
import java.util.LinkedList; import java.util.LinkedList;
import java.util.Map; import java.util.Map;
import org.apache.solr.schema.TrieDateField;
import net.yacy.cora.document.analysis.Classification; import net.yacy.cora.document.analysis.Classification;
import net.yacy.cora.document.analysis.Classification.ContentDomain; import net.yacy.cora.document.analysis.Classification.ContentDomain;
import net.yacy.cora.document.id.MultiProtocolURL; import net.yacy.cora.document.id.MultiProtocolURL;
import net.yacy.cora.lod.vocabulary.Tagging; import net.yacy.cora.lod.vocabulary.Tagging;
import net.yacy.cora.protocol.RequestHeader; import net.yacy.cora.protocol.RequestHeader;
import net.yacy.cora.sorting.ScoreMap; import net.yacy.cora.sorting.ScoreMap;
import net.yacy.document.DateDetection;
import net.yacy.document.LibraryProvider; import net.yacy.document.LibraryProvider;
import net.yacy.kelondro.util.ISO639; import net.yacy.kelondro.util.ISO639;
import net.yacy.peers.graphics.ProfilingGraph; import net.yacy.peers.graphics.ProfilingGraph;
@ -55,6 +55,7 @@ public class yacysearchtrailer {
private static final int TOPWORDS_MINSIZE = 8; private static final int TOPWORDS_MINSIZE = 8;
private static final int TOPWORDS_MAXSIZE = 22; private static final int TOPWORDS_MAXSIZE = 22;
@SuppressWarnings({ "deprecation", "static-access" })
public static serverObjects respond(final RequestHeader header, final serverObjects post, final serverSwitch env) { public static serverObjects respond(final RequestHeader header, final serverObjects post, final serverSwitch env) {
final serverObjects prop = new serverObjects(); final serverObjects prop = new serverObjects();
final Switchboard sb = (Switchboard) env; final Switchboard sb = (Switchboard) env;
@ -105,12 +106,10 @@ public class yacysearchtrailer {
navigatorIterator = theSearch.namespaceNavigator.keys(false); navigatorIterator = theSearch.namespaceNavigator.keys(false);
int i = 0, pos = 0, neg = 0; int i = 0, pos = 0, neg = 0;
String nav; String nav;
while (i < 10 && navigatorIterator.hasNext()) { while (i < QueryParams.FACETS_STANDARD_MAXCOUNT && navigatorIterator.hasNext()) {
name = navigatorIterator.next(); name = navigatorIterator.next();
count = theSearch.namespaceNavigator.get(name); count = theSearch.namespaceNavigator.get(name);
if (count == 0) { if (count == 0) break;
break;
}
nav = "inurl%3A" + name; nav = "inurl%3A" + name;
if (!theSearch.query.modifier.toString().contains("inurl:"+name)) { if (!theSearch.query.modifier.toString().contains("inurl:"+name)) {
pos++; pos++;
@ -131,10 +130,7 @@ public class yacysearchtrailer {
prop.put("nav-namespace_element", i); prop.put("nav-namespace_element", i);
i--; i--;
prop.put("nav-namespace_element_" + i + "_nl", 0); prop.put("nav-namespace_element_" + i + "_nl", 0);
if (pos == 1 && neg == 0) if (pos == 1 && neg == 0) prop.put("nav-namespace", 0); // this navigation is not useful
{
prop.put("nav-namespace", 0); // this navigation is not useful
}
} }
// host navigators // host navigators
@ -146,12 +142,10 @@ public class yacysearchtrailer {
navigatorIterator = hostNavigator.keys(false); navigatorIterator = hostNavigator.keys(false);
int i = 0, pos = 0, neg = 0; int i = 0, pos = 0, neg = 0;
String nav; String nav;
while (i < 10 && navigatorIterator.hasNext()) { while (i < QueryParams.FACETS_STANDARD_MAXCOUNT && navigatorIterator.hasNext()) {
name = navigatorIterator.next(); name = navigatorIterator.next();
count = hostNavigator.get(name); count = hostNavigator.get(name);
if (count == 0) { if (count == 0) break;
break;
}
nav = "site%3A" + name; nav = "site%3A" + name;
if (theSearch.query.modifier.sitehost == null || !theSearch.query.modifier.sitehost.contains(name)) { if (theSearch.query.modifier.sitehost == null || !theSearch.query.modifier.sitehost.contains(name)) {
pos++; pos++;
@ -172,10 +166,7 @@ public class yacysearchtrailer {
prop.put("nav-domains_element", i); prop.put("nav-domains_element", i);
i--; i--;
prop.put("nav-domains_element_" + i + "_nl", 0); prop.put("nav-domains_element_" + i + "_nl", 0);
if (pos == 1 && neg == 0) if (pos == 1 && neg == 0) prop.put("nav-domains", 0); // this navigation is not useful
{
prop.put("nav-domains", 0); // this navigation is not useful
}
} }
// language navigators // language navigators
@ -187,12 +178,10 @@ public class yacysearchtrailer {
navigatorIterator = languageNavigator.keys(false); navigatorIterator = languageNavigator.keys(false);
int i = 0, pos = 0, neg = 0; int i = 0, pos = 0, neg = 0;
String nav; String nav;
while (i < 10 && navigatorIterator.hasNext()) { while (i < QueryParams.FACETS_STANDARD_MAXCOUNT && navigatorIterator.hasNext()) {
name = navigatorIterator.next(); name = navigatorIterator.next();
count = languageNavigator.get(name); count = languageNavigator.get(name);
if (count == 0) { if (count == 0) break;
break;
}
nav = "%2Flanguage%2F" + name; nav = "%2Flanguage%2F" + name;
if (theSearch.query.modifier.language == null || !theSearch.query.modifier.language.contains(name)) { if (theSearch.query.modifier.language == null || !theSearch.query.modifier.language.contains(name)) {
pos++; pos++;
@ -214,10 +203,7 @@ public class yacysearchtrailer {
prop.put("nav-languages_element", i); prop.put("nav-languages_element", i);
i--; i--;
prop.put("nav-languages_element_" + i + "_nl", 0); prop.put("nav-languages_element_" + i + "_nl", 0);
if (pos == 1 && neg == 0) if (pos == 1 && neg == 0) prop.put("nav-languages", 0); // this navigation is not useful
{
prop.put("nav-languages", 0); // this navigation is not useful
}
} }
// author navigators // author navigators
@ -228,12 +214,10 @@ public class yacysearchtrailer {
navigatorIterator = theSearch.authorNavigator.keys(false); navigatorIterator = theSearch.authorNavigator.keys(false);
int i = 0, pos = 0, neg = 0; int i = 0, pos = 0, neg = 0;
String nav; String nav;
while (i < 10 && navigatorIterator.hasNext()) { while (i < QueryParams.FACETS_STANDARD_MAXCOUNT && navigatorIterator.hasNext()) {
name = navigatorIterator.next().trim(); name = navigatorIterator.next().trim();
count = theSearch.authorNavigator.get(name); count = theSearch.authorNavigator.get(name);
if (count == 0) { if (count == 0) break;
break;
}
nav = (name.indexOf(' ', 0) < 0) ? "author%3A" + name : "author%3A%28" + name.replace(" ", "+") + "%29"; nav = (name.indexOf(' ', 0) < 0) ? "author%3A" + name : "author%3A%28" + name.replace(" ", "+") + "%29";
if (theSearch.query.modifier.author == null || !theSearch.query.modifier.author.contains(name)) { if (theSearch.query.modifier.author == null || !theSearch.query.modifier.author.contains(name)) {
pos++; pos++;
@ -254,8 +238,7 @@ public class yacysearchtrailer {
prop.put("nav-authors_element", i); prop.put("nav-authors_element", i);
i--; i--;
prop.put("nav-authors_element_" + i + "_nl", 0); prop.put("nav-authors_element_" + i + "_nl", 0);
if (pos == 1 && neg == 0) if (pos == 1 && neg == 0) {
{
prop.put("nav-authors", 0); // this navigation is not useful prop.put("nav-authors", 0); // this navigation is not useful
} }
} }
@ -268,12 +251,10 @@ public class yacysearchtrailer {
navigatorIterator = theSearch.collectionNavigator.keys(false); navigatorIterator = theSearch.collectionNavigator.keys(false);
int i = 0, pos = 0, neg = 0; int i = 0, pos = 0, neg = 0;
String nav; String nav;
while (i < 10 && navigatorIterator.hasNext()) { while (i < QueryParams.FACETS_STANDARD_MAXCOUNT && navigatorIterator.hasNext()) {
name = navigatorIterator.next().trim(); name = navigatorIterator.next().trim();
count = theSearch.collectionNavigator.get(name); count = theSearch.collectionNavigator.get(name);
if (count == 0) { if (count == 0) break;
break;
}
nav = (name.indexOf(' ', 0) < 0) ? "collection%3A" + name : "collection%3A%28" + name.replace(" ", "+") + "%29"; nav = (name.indexOf(' ', 0) < 0) ? "collection%3A" + name : "collection%3A%28" + name.replace(" ", "+") + "%29";
if (theSearch.query.modifier.collection == null || !theSearch.query.modifier.collection.contains(name)) { if (theSearch.query.modifier.collection == null || !theSearch.query.modifier.collection.contains(name)) {
pos++; pos++;
@ -294,10 +275,7 @@ public class yacysearchtrailer {
prop.put("nav-collections_element", i); prop.put("nav-collections_element", i);
i--; i--;
prop.put("nav-collections_element_" + i + "_nl", 0); prop.put("nav-collections_element_" + i + "_nl", 0);
if (pos == 1 && neg == 0) if (pos == 1 && neg == 0) prop.put("nav-collections", 0); // this navigation is not useful
{
prop.put("nav-collections", 0); // this navigation is not useful
}
} }
// topics navigator // topics navigator
@ -360,12 +338,10 @@ public class yacysearchtrailer {
if (oldProtocolModifier != null && oldProtocolModifier.length() > 0) {theSearch.query.modifier.remove("/" + oldProtocolModifier); theSearch.query.modifier.remove(oldProtocolModifier);} if (oldProtocolModifier != null && oldProtocolModifier.length() > 0) {theSearch.query.modifier.remove("/" + oldProtocolModifier); theSearch.query.modifier.remove(oldProtocolModifier);}
theSearch.query.modifier.protocol = ""; theSearch.query.modifier.protocol = "";
theSearch.query.getQueryGoal().query_original = oldQuery.replaceAll(" /https", "").replaceAll(" /http", "").replaceAll(" /ftp", "").replaceAll(" /smb", "").replaceAll(" /file", ""); theSearch.query.getQueryGoal().query_original = oldQuery.replaceAll(" /https", "").replaceAll(" /http", "").replaceAll(" /ftp", "").replaceAll(" /smb", "").replaceAll(" /file", "");
while (i < 10 && navigatorIterator.hasNext()) { while (i < QueryParams.FACETS_STANDARD_MAXCOUNT && navigatorIterator.hasNext()) {
name = navigatorIterator.next().trim(); name = navigatorIterator.next().trim();
count = theSearch.protocolNavigator.get(name); count = theSearch.protocolNavigator.get(name);
if (count == 0) { if (count == 0) break;
break;
}
visible = visible || "ftp,smb".indexOf(name) >= 0; visible = visible || "ftp,smb".indexOf(name) >= 0;
nav = "%2F" + name; nav = "%2F" + name;
if (oldProtocolModifier == null || !oldProtocolModifier.equals(name)) { if (oldProtocolModifier == null || !oldProtocolModifier.equals(name)) {
@ -391,10 +367,60 @@ public class yacysearchtrailer {
prop.put("nav-protocols_element", i); prop.put("nav-protocols_element", i);
i--; i--;
prop.put("nav-protocols_element_" + i + "_nl", 0); prop.put("nav-protocols_element_" + i + "_nl", 0);
if (pos == 1 && neg == 0) if (pos == 1 && neg == 0) prop.put("nav-protocols", 0); // this navigation is not useful
{ }
prop.put("nav-protocols", 0); // this navigation is not useful
// date navigators
if (theSearch.dateNavigator == null || theSearch.dateNavigator.isEmpty()) {
prop.put("nav-dates", 0);
} else {
prop.put("nav-dates", 1);
navigatorIterator = theSearch.dateNavigator.iterator(); // this iterator is different as it iterates by the key order (which is a date order)
int i = 0, pos = 0, neg = 0;
long dx = -1;
long dayms = 1000L * 60L * 60L * 24L;
Date fromconstraint = theSearch.getQuery().modifier.from == null ? null : DateDetection.parseLine(theSearch.getQuery().modifier.from);
if (fromconstraint == null) fromconstraint = new Date(System.currentTimeMillis() - 365 * dayms);
Date toconstraint = theSearch.getQuery().modifier.to == null ? null : DateDetection.parseLine(theSearch.getQuery().modifier.to);
if (toconstraint == null) toconstraint = new Date(System.currentTimeMillis() + 365 * dayms);
while (i < QueryParams.FACETS_DATE_MAXCOUNT && navigatorIterator.hasNext()) {
name = navigatorIterator.next().trim();
if (name.length() < 10) continue;
count = theSearch.dateNavigator.get(name);
String shortname = name.substring(0, 10);
long d;
Date dd;
try {dd = TrieDateField.parseDate(name); d = dd.getTime();} catch (ParseException e) {continue;}
if (fromconstraint != null && dd.before(fromconstraint)) continue;
if (toconstraint != null && dd.after(toconstraint)) break;
if (dx > 0) {
while (d - dx > dayms) {
dx += dayms;
String sn = TrieDateField.formatExternal(new Date(dx)).substring(0, 10);
prop.put("nav-dates_element_" + i + "_on", 0);
prop.put(fileType, "nav-dates_element_" + i + "_name", sn);
prop.put("nav-dates_element_" + i + "_count", 0);
prop.put("nav-dates_element_" + i + "_nl", 1);
i++;
}
}
dx = d;
if (theSearch.query.modifier.on == null || !theSearch.query.modifier.on.contains(shortname) ) {
pos++;
prop.put("nav-dates_element_" + i + "_on", 1);
} else {
neg++;
prop.put("nav-dates_element_" + i + "_on", 0);
}
prop.put(fileType, "nav-dates_element_" + i + "_name", shortname);
prop.put("nav-dates_element_" + i + "_count", count);
prop.put("nav-dates_element_" + i + "_nl", 1);
i++;
} }
prop.put("nav-dates_element", i);
i--;
prop.put("nav-dates_element_" + i + "_nl", 0);
if (pos == 1 && neg == 0) prop.put("nav-dates", 0); // this navigation is not useful
} }
// filetype navigators // filetype navigators
@ -406,12 +432,10 @@ public class yacysearchtrailer {
int i = 0, pos = 0, neg = 0; int i = 0, pos = 0, neg = 0;
String nav; String nav;
boolean visible = false; boolean visible = false;
while (i < 10 && navigatorIterator.hasNext()) { while (i < QueryParams.FACETS_STANDARD_MAXCOUNT && navigatorIterator.hasNext()) {
name = navigatorIterator.next().trim(); name = navigatorIterator.next().trim();
count = theSearch.filetypeNavigator.get(name); count = theSearch.filetypeNavigator.get(name);
if (count == 0) { if (count == 0) break;
break;
}
visible = visible || Classification.isMediaExtension(name) || "pdf,doc,docx".indexOf(name) >= 0; visible = visible || Classification.isMediaExtension(name) || "pdf,doc,docx".indexOf(name) >= 0;
nav = "filetype%3A" + name; nav = "filetype%3A" + name;
if (theSearch.query.modifier.filetype == null || !theSearch.query.modifier.filetype.contains(name) ) { if (theSearch.query.modifier.filetype == null || !theSearch.query.modifier.filetype.contains(name) ) {
@ -433,10 +457,7 @@ public class yacysearchtrailer {
prop.put("nav-filetypes_element", i); prop.put("nav-filetypes_element", i);
i--; i--;
prop.put("nav-filetypes_element_" + i + "_nl", 0); prop.put("nav-filetypes_element_" + i + "_nl", 0);
if (pos == 1 && neg == 0) if (pos == 1 && neg == 0) prop.put("nav-filetypes", 0); // this navigation is not useful
{
prop.put("nav-filetypes", 0); // this navigation is not useful
}
} }
// vocabulary navigators // vocabulary navigators
@ -455,9 +476,7 @@ public class yacysearchtrailer {
while (i < 20 && navigatorIterator.hasNext()) { while (i < 20 && navigatorIterator.hasNext()) {
name = navigatorIterator.next(); name = navigatorIterator.next();
count = ve.getValue().get(name); count = ve.getValue().get(name);
if (count == 0) { if (count == 0) break;
break;
}
nav = "%2Fvocabulary%2F" + navname + "%2F" + MultiProtocolURL.escape(Tagging.encodePrintname(name)).toString(); nav = "%2Fvocabulary%2F" + navname + "%2F" + MultiProtocolURL.escape(Tagging.encodePrintname(name)).toString();
if (!theSearch.query.modifier.toString().contains("/vocabulary/" + navname + "/" + name.replace(' ', '_'))) { if (!theSearch.query.modifier.toString().contains("/vocabulary/" + navname + "/" + name.replace(' ', '_'))) {
prop.put("nav-vocabulary_" + navvoccount + "_element_" + i + "_on", 1); prop.put("nav-vocabulary_" + navvoccount + "_element_" + i + "_on", 1);
@ -511,10 +530,6 @@ public class yacysearchtrailer {
return prop; return prop;
} }
private static final boolean on(final int pos, final int neg, final int maxlimit) {
return neg > 0 || (pos > 1 && pos <= maxlimit);
}
} }
//http://localhost:8090/yacysearch.html?query=java+&amp;maximumRecords=10&amp;resource=local&amp;verify=cacheonly&amp;nav=hosts,authors,namespace,topics,filetype,protocol&amp;urlmaskfilter=ftp://.*&amp;prefermaskfilter=&amp;constraint=&amp;contentdom=text&amp;former=java+%2Fftp&amp;startRecord=0 //http://localhost:8090/yacysearch.html?query=java+&amp;maximumRecords=10&amp;resource=local&amp;verify=cacheonly&amp;nav=hosts,authors,namespace,topics,filetype,protocol&amp;urlmaskfilter=ftp://.*&amp;prefermaskfilter=&amp;constraint=&amp;contentdom=text&amp;former=java+%2Fftp&amp;startRecord=0
//http://localhost:8090/yacysearch.html?query=java+&amp;maximumRecords=10&amp;resource=local&amp;verify=cacheonly&amp;nav=hosts,authors,namespace,topics,filetype,protocol&amp;urlmaskfilter=.*&amp;prefermaskfilter=&amp;constraint=&amp;contentdom=text&amp;former=java+%2Fvocabulary%2FGewerke%2FTore&amp;startRecord=0 //http://localhost:8090/yacysearch.html?query=java+&amp;maximumRecords=10&amp;resource=local&amp;verify=cacheonly&amp;nav=hosts,authors,namespace,topics,filetype,protocol&amp;urlmaskfilter=.*&amp;prefermaskfilter=&amp;constraint=&amp;contentdom=text&amp;former=java+%2Fvocabulary%2FGewerke%2FTore&amp;startRecord=0
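
Note on the date navigator above: the loop pads the histogram with zero-count entries for every day that lies between two dates returned by the navigator, so the chart has one bar per day. A minimal sketch of that padding step follows. It assumes calendar arithmetic via java.time instead of the millisecond offsets used in the commit, and the class and method names (DateHistogramSketch, fillGaps) are hypothetical, not part of YaCy.

```java
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper, not YaCy code: pads a per-day document count with zeros.
final class DateHistogramSketch {

    /**
     * Returns one entry per calendar day between the first and the last key
     * of counts (both inclusive); days without documents get a count of 0.
     */
    static Map<LocalDate, Integer> fillGaps(SortedMap<LocalDate, Integer> counts) {
        Map<LocalDate, Integer> filled = new LinkedHashMap<>();
        if (counts.isEmpty()) return filled;
        LocalDate last = counts.lastKey();
        for (LocalDate day = counts.firstKey(); !day.isAfter(last); day = day.plusDays(1)) {
            filled.put(day, counts.getOrDefault(day, 0));
        }
        return filled;
    }

    public static void main(String[] args) {
        SortedMap<LocalDate, Integer> counts = new TreeMap<>();
        counts.put(LocalDate.of(2015, 3, 1), 4);
        counts.put(LocalDate.of(2015, 3, 4), 2);
        // prints one line per day, including the two padded days in between
        fillGaps(counts).forEach((day, n) -> System.out.println(day + " " + n));
    }
}
```

The sketch leaves out the from/to window and the FACETS_DATE_MAXCOUNT cap that the commit applies while iterating.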

@ -1,6 +1,5 @@
"navigation": [#(nav-filetypes)#:: "navigation": {#(nav-dates)#::
{ "dates": {
"facetname": "filetypes",
"displayname": "Filetype", "displayname": "Filetype",
"type": "String", "type": "String",
"min": "0", "min": "0",
@ -9,11 +8,22 @@
"elements": [ "elements": [
#{element}# #{element}#
{"name": "#[name]#", "count": "#[count]#", "modifier": "#[modifier]#", "url": "#[url]#"}#(nl)#::,#(/nl)# {"name": "#[name]#", "count": "#[count]#", "modifier": "#[modifier]#", "url": "#[url]#"}#(nl)#::,#(/nl)#
#{/element}#
]
},#(/nav-dates)##(nav-filetypes)#::
"filetypes": {
"displayname": "Filetype",
"type": "String",
"min": "0",
"max": "0",
"mean": "0",
"elements": [
#{element}#
{"name": "#[name]#", "count": "#[count]#"}#(nl)#::,#(/nl)#
#{/element}# #{/element}#
] ]
},#(/nav-filetypes)##(nav-protocols)#:: },#(/nav-filetypes)##(nav-protocols)#::
{ "protocols": {
"facetname": "protocols",
"displayname": "Protocol", "displayname": "Protocol",
"type": "String", "type": "String",
"min": "0", "min": "0",
@ -25,8 +35,7 @@
#{/element}# #{/element}#
] ]
},#(/nav-protocols)##(nav-domains)#:: },#(/nav-protocols)##(nav-domains)#::
{ "domains": {
"facetname": "domains",
"displayname": "Domains", "displayname": "Domains",
"type": "String", "type": "String",
"min": "0", "min": "0",
@ -38,8 +47,7 @@
#{/element}# #{/element}#
] ]
},#(/nav-domains)##(nav-namespace)#:: },#(/nav-domains)##(nav-namespace)#::
{ "namespace": {
"facetname": "namespace",
"displayname": "Name Space", "displayname": "Name Space",
"type": "String", "type": "String",
"min": "0", "min": "0",
@ -51,8 +59,7 @@
#{/element}# #{/element}#
] ]
},#(/nav-namespace)##(nav-authors)#:: },#(/nav-namespace)##(nav-authors)#::
{ "authors": {
"facetname": "authors",
"displayname": "Authors", "displayname": "Authors",
"type": "String", "type": "String",
"min": "0", "min": "0",
@ -64,8 +71,7 @@
#{/element}# #{/element}#
] ]
},#(/nav-authors)##(nav-collections)#:: },#(/nav-authors)##(nav-collections)#::
{ "collections": {
"facetname": "collections",
"displayname": "Collections", "displayname": "Collections",
"type": "String", "type": "String",
"min": "0", "min": "0",
@ -77,8 +83,7 @@
#{/element}# #{/element}#
] ]
},#(/nav-collections)##{nav-vocabulary}# },#(/nav-collections)##{nav-vocabulary}#
{ "#[navname]#": {
"facetname": "#[navname]#",
"displayname": "#[navname]#", "displayname": "#[navname]#",
"type": "String", "type": "String",
"min": "0", "min": "0",
@ -90,8 +95,7 @@
#{/element}# #{/element}#
] ]
},#{/nav-vocabulary}##(nav-topics)#{}:: },#{/nav-vocabulary}##(nav-topics)#{}::
{ "topics": {
"facetname": "topics",
"displayname": "Topics", "displayname": "Topics",
"type": "String", "type": "String",
"min": "0", "min": "0",
@ -103,5 +107,5 @@
#{/element}# #{/element}#
] ]
}#(/nav-topics)# }#(/nav-topics)#
], },
"totalResults": "#[num-results_totalcount]#" "totalResults": "#[num-results_totalcount]#"

@ -23,9 +23,7 @@ import java.io.IOException;
import java.net.MalformedURLException; import java.net.MalformedURLException;
import java.util.ArrayList; import java.util.ArrayList;
import java.util.Arrays; import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List; import java.util.List;
import net.yacy.cora.document.encoding.ASCII;
import net.yacy.cora.document.encoding.UTF8; import net.yacy.cora.document.encoding.UTF8;
import net.yacy.cora.document.feed.RSSFeed; import net.yacy.cora.document.feed.RSSFeed;
import net.yacy.cora.document.feed.RSSMessage; import net.yacy.cora.document.feed.RSSMessage;
@ -41,7 +39,6 @@ import net.yacy.document.TextParser;
import net.yacy.kelondro.data.meta.URIMetadataNode; import net.yacy.kelondro.data.meta.URIMetadataNode;
import net.yacy.search.query.QueryParams; import net.yacy.search.query.QueryParams;
import net.yacy.search.schema.CollectionSchema; import net.yacy.search.schema.CollectionSchema;
import org.apache.http.entity.mime.content.ContentBody;
/** /**
* Handling of queries to remote OpenSearch systems. Iterates to a list of * Handling of queries to remote OpenSearch systems. Iterates to a list of

@ -111,6 +111,11 @@ public class SchemaConfiguration extends Configuration implements Serializable {
if ((isEmpty() || contains(key)) && (!this.lazy || (value != null && value.getTime() > 0))) key.add(doc, value); if ((isEmpty() || contains(key)) && (!this.lazy || (value != null && value.getTime() > 0))) key.add(doc, value);
} }
public void add(final SolrInputDocument doc, final SchemaDeclaration key, final Date[] value) {
assert key.isMultiValued() : "key = " + key.getSolrFieldName();
if ((isEmpty() || contains(key)) && (!this.lazy || (value != null && value.length > 0))) key.add(doc, value);
}
public void add(final SolrInputDocument doc, final SchemaDeclaration key, final String[] value) { public void add(final SolrInputDocument doc, final SchemaDeclaration key, final String[] value) {
assert key.isMultiValued() : "key = " + key.getSolrFieldName(); assert key.isMultiValued() : "key = " + key.getSolrFieldName();
if ((isEmpty() || contains(key)) && (!this.lazy || (value != null && value.length > 0))) key.add(doc, value); if ((isEmpty() || contains(key)) && (!this.lazy || (value != null && value.length > 0))) key.add(doc, value);

@ -59,6 +59,8 @@ public interface SchemaDeclaration {
public void add(final SolrInputDocument doc, final long value); public void add(final SolrInputDocument doc, final long value);
public void add(final SolrInputDocument doc, final Date[] value);
public void add(final SolrInputDocument doc, final String[] value); public void add(final SolrInputDocument doc, final String[] value);
public void add(final SolrInputDocument doc, final Integer[] value); public void add(final SolrInputDocument doc, final Integer[] value);

@ -48,6 +48,7 @@ import org.apache.solr.schema.FieldType;
import org.apache.solr.schema.IndexSchema; import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.SchemaField; import org.apache.solr.schema.SchemaField;
import org.apache.solr.schema.TextField; import org.apache.solr.schema.TextField;
import org.apache.solr.schema.TrieDateField;
import org.apache.solr.search.DocIterator; import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList; import org.apache.solr.search.DocList;
import org.apache.solr.search.SolrIndexSearcher; import org.apache.solr.search.SolrIndexSearcher;
@ -257,6 +258,7 @@ public class EnhancedXMLResponseWriter implements QueryResponseWriter {
writer.write(lb); writer.write(lb);
} }
@SuppressWarnings({ "static-access", "deprecation" })
private static void writeField(final Writer writer, final String typeName, final String name, final String value) throws IOException { private static void writeField(final Writer writer, final String typeName, final String name, final String value) throws IOException {
if (typeName.equals(SolrType.text_general.printName()) || if (typeName.equals(SolrType.text_general.printName()) ||
typeName.equals(SolrType.string.printName()) || typeName.equals(SolrType.string.printName()) ||
@ -269,14 +271,15 @@ public class EnhancedXMLResponseWriter implements QueryResponseWriter {
} else if (typeName.equals(SolrType.num_long.printName())) { } else if (typeName.equals(SolrType.num_long.printName())) {
writeTag(writer, "long", name, value, true); writeTag(writer, "long", name, value, true);
} else if (typeName.equals(SolrType.date.printName())) { } else if (typeName.equals(SolrType.date.printName())) {
writeTag(writer, "date", name, org.apache.solr.schema.TrieDateField.formatExternal(new Date(Long.parseLong(value))), true); writeTag(writer, "date", name, TrieDateField.formatExternal(new Date(Long.parseLong(value))), true);
} else if (typeName.equals(SolrType.num_float.printName())) { } else if (typeName.equals(SolrType.num_float.printName())) {
writeTag(writer, "float", name, value, true); writeTag(writer, "float", name, value, true);
} else if (typeName.equals(SolrType.num_double.printName())) { } else if (typeName.equals(SolrType.num_double.printName())) {
writeTag(writer, "double", name, value, true); writeTag(writer, "double", name, value, true);
} }
} }
@SuppressWarnings({ "static-access", "deprecation" })
private static void writeField(final Writer writer, final String name, final Object value) throws IOException { private static void writeField(final Writer writer, final String name, final Object value) throws IOException {
if (value instanceof String) { if (value instanceof String) {
writeTag(writer, "str", name, (String) value, true); writeTag(writer, "str", name, (String) value, true);
@ -287,7 +290,7 @@ public class EnhancedXMLResponseWriter implements QueryResponseWriter {
} else if (value instanceof Long) { } else if (value instanceof Long) {
writeTag(writer, "long", name, ((Long) value).toString(), true); writeTag(writer, "long", name, ((Long) value).toString(), true);
} else if (value instanceof Date) { } else if (value instanceof Date) {
writeTag(writer, "date", name, org.apache.solr.schema.TrieDateField.formatExternal((Date) value), true); writeTag(writer, "date", name, TrieDateField.formatExternal((Date) value), true);
} else if (value instanceof Float) { } else if (value instanceof Float) {
writeTag(writer, "float", name, ((Float) value).toString(), true); writeTag(writer, "float", name, ((Float) value).toString(), true);
} else if (value instanceof Double) { } else if (value instanceof Double) {

@ -45,6 +45,7 @@ import org.apache.solr.schema.FieldType;
import org.apache.solr.schema.IndexSchema; import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.SchemaField; import org.apache.solr.schema.SchemaField;
import org.apache.solr.schema.TextField; import org.apache.solr.schema.TextField;
import org.apache.solr.schema.TrieDateField;
import org.apache.solr.search.DocIterator; import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList; import org.apache.solr.search.DocList;
import org.apache.solr.search.SolrIndexSearcher; import org.apache.solr.search.SolrIndexSearcher;
@ -217,12 +218,13 @@ public class HTMLResponseWriter implements QueryResponseWriter {
return kv; return kv;
} }
@SuppressWarnings({ "static-access", "deprecation" })
private static String field2string(final FieldType type, final String value) { private static String field2string(final FieldType type, final String value) {
String typeName = type.getTypeName(); String typeName = type.getTypeName();
if (typeName.equals(SolrType.bool.printName())) { if (typeName.equals(SolrType.bool.printName())) {
return "F".equals(value) ? "false" : "true"; return "F".equals(value) ? "false" : "true";
} else if (typeName.equals(SolrType.date.printName())) { } else if (typeName.equals(SolrType.date.printName())) {
return org.apache.solr.schema.TrieDateField.formatExternal(new Date(Long.parseLong(value))); return TrieDateField.formatExternal(new Date(Long.parseLong(value)));
} }
return value; return value;
} }

@ -26,6 +26,7 @@ package net.yacy.cora.sorting;
import java.util.Comparator; import java.util.Comparator;
import java.util.Iterator; import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map; import java.util.Map;
import java.util.Random; import java.util.Random;
import java.util.SortedMap; import java.util.SortedMap;
@ -40,8 +41,12 @@ public final class ClusteredScoreMap<E> extends AbstractScoreMap<E> implements R
private long gcount; private long gcount;
private int encnt; private int encnt;
public ClusteredScoreMap() { /**
this.map = new TreeMap<E, Long>(); * create a sorted map where there is a choice between a hash map or a tree map for the key store
* @param sortedKeys if true, a tree map is used for key storage; in this case the iterator() returns a sorted list of keys; if sortedKeys is false, a linked hash map is used, which preserves the original key insertion order
*/
public ClusteredScoreMap(boolean sortedKeys) {
this.map = sortedKeys ? new TreeMap<E, Long>() : new LinkedHashMap<E, Long>();
this.pam = new TreeMap<Long, E>(); this.pam = new TreeMap<Long, E>();
this.gcount = 0; this.gcount = 0;
this.encnt = 0; this.encnt = 0;
@ -345,7 +350,7 @@ public final class ClusteredScoreMap<E> extends AbstractScoreMap<E> implements R
public static void main(final String[] args) { public static void main(final String[] args) {
System.out.println("Test for Score: start"); System.out.println("Test for Score: start");
final ClusteredScoreMap<String> s = new ClusteredScoreMap<String>(); final ClusteredScoreMap<String> s = new ClusteredScoreMap<String>(false);
long c = 0; long c = 0;
// create cluster // create cluster
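
Note on the new sortedKeys flag: the only thing it changes is the key iteration order of the backing map. The sketch below shows that difference with plain JDK maps. KeyOrderSketch and keyOrder are hypothetical names chosen for illustration, and the sketch does not reproduce the score handling of ClusteredScoreMap.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not YaCy code: same choice of backing map as in the new constructor.
final class KeyOrderSketch {

    /** Counts the given keys and returns them in the iteration order of the chosen map. */
    static List<String> keyOrder(boolean sortedKeys, String... keys) {
        Map<String, Long> map = sortedKeys ? new TreeMap<String, Long>() : new LinkedHashMap<String, Long>();
        for (String key : keys) {
            map.merge(key, 1L, Long::sum); // increment the count for this key
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        String[] dates = {"2015-03-04", "2015-03-01", "2015-03-04"};
        System.out.println(keyOrder(true, dates));  // natural (here: chronological) key order
        System.out.println(keyOrder(false, dates)); // order of first appearance
    }
}
```

A date navigator needs the sorted variant, because ISO-formatted date strings sort chronologically; the other call sites in this commit pass false, or true for ResultURLs and the Seed test.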

@ -25,6 +25,7 @@ import java.io.IOException;
import java.lang.reflect.Array; import java.lang.reflect.Array;
import java.net.MalformedURLException; import java.net.MalformedURLException;
import java.util.ArrayList; import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator; import java.util.Iterator;
import java.util.List; import java.util.List;
import java.util.Map; import java.util.Map;
@ -347,8 +348,13 @@ public class HostQueue implements Balancer {
@Override @Override
public boolean has(final byte[] urlhashb) { public boolean has(final byte[] urlhashb) {
for (Index depthStack: this.depthStacks.values()) { for (int retry = 0; retry < 3; retry++) {
if (depthStack.has(urlhashb)) return true; try {
for (Index depthStack: this.depthStacks.values()) {
if (depthStack.has(urlhashb)) return true;
}
return false;
} catch (ConcurrentModificationException e) {}
} }
return false; return false;
} }
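
Note on the change to has(): the lookup is now retried up to three times when the iteration over depthStacks is interrupted by a ConcurrentModificationException. The same pattern can be sketched as a small generic helper. RetrySketch and retryOnCme are hypothetical names, and the helper assumes the check has no side effects, so repeating it is safe.

```java
import java.util.ConcurrentModificationException;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not YaCy code: bounded retry around a read-only check.
final class RetrySketch {

    /**
     * Runs check up to attempts times and returns its first result.
     * If every attempt fails with a ConcurrentModificationException, fallback is returned.
     */
    static boolean retryOnCme(BooleanSupplier check, int attempts, boolean fallback) {
        for (int retry = 0; retry < attempts; retry++) {
            try {
                return check.getAsBoolean();
            } catch (ConcurrentModificationException e) {
                // the underlying collection changed during iteration; try again
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        boolean found = retryOnCme(() -> {
            if (calls[0]++ < 2) throw new ConcurrentModificationException();
            return true;
        }, 3, false);
        System.out.println(found + " after " + calls[0] + " attempts");
    }
}
```

A bounded retry only makes the failure less likely. Whether a concurrent map or explicit locking would suit HostQueue better is not something this diff settles.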

@ -75,7 +75,7 @@ public final class ResultURLs {
static { static {
for (final EventOrigin origin: EventOrigin.values()) { for (final EventOrigin origin: EventOrigin.values()) {
resultStacks.put(origin, new LinkedHashMap<String, InitExecEntry>()); resultStacks.put(origin, new LinkedHashMap<String, InitExecEntry>());
resultDomains.put(origin, new ClusteredScoreMap<String>()); resultDomains.put(origin, new ClusteredScoreMap<String>(true));
} }
} }

@ -1,7 +1,6 @@
package net.yacy.data; package net.yacy.data;
import java.io.IOException; import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection; import java.util.Collection;
import java.util.Collections; import java.util.Collections;
import java.util.Comparator; import java.util.Comparator;

@ -361,11 +361,11 @@ public final class Condenser {
//Collection<Tagging> vocabularies = LibraryProvider.autotagging.getVocabularies(); //Collection<Tagging> vocabularies = LibraryProvider.autotagging.getVocabularies();
//assert vocabularyNames.size() == vocabularies.size(); //assert vocabularyNames.size() == vocabularies.size();
Map<String, String> vocMap = scraper == null ? null : scraper.removeVocMap(root); Map<String, String> vocMap = scraper == null ? null : scraper.removeVocMap(root);
if (vocMap != null) { if (vocMap != null && vocMap.size() > 0) {
for (Map.Entry<String, String> entry: vocMap.entrySet()) { for (Map.Entry<String, String> entry: vocMap.entrySet()) {
String navigatorName = entry.getKey(); String navigatorName = entry.getKey();
String term = entry.getValue(); String term = entry.getValue();
vocabularyNames.remove(navigatorName); vocabularyNames.remove(navigatorName); // prevent this vocabulary from being used again for auto-annotation
Tagging vocabulary = LibraryProvider.autotagging.getVocabulary(navigatorName); Tagging vocabulary = LibraryProvider.autotagging.getVocabulary(navigatorName);
if (vocabulary != null) { if (vocabulary != null) {
// extend the vocabulary // extend the vocabulary

@ -500,7 +500,7 @@ public class DateDetection {
public static Date parseLine(String text) { public static Date parseLine(String text) {
Date d = null; Date d = null;
try {d = CONFORM.parse(text);} catch (ParseException e) {} try {d = CONFORM.parse(text);} catch (ParseException e) {}
if (d == null) try {d = GenericFormatter.FORMAT_SHORT_DAY.parse(text);} catch (ParseException e) {} //if (d == null) try {d = GenericFormatter.FORMAT_SHORT_DAY.parse(text);} catch (ParseException e) {} // did not work well and fired for wrong formats; do not use
if (d == null) try {d = GenericFormatter.FORMAT_RFC1123_SHORT.parse(text);} catch (ParseException e) {} if (d == null) try {d = GenericFormatter.FORMAT_RFC1123_SHORT.parse(text);} catch (ParseException e) {}
if (d == null) try {d = GenericFormatter.FORMAT_ANSIC.parse(text);} catch (ParseException e) {} if (d == null) try {d = GenericFormatter.FORMAT_ANSIC.parse(text);} catch (ParseException e) {}

@ -235,9 +235,9 @@ public class ContentScraper extends AbstractScraper implements Scraper {
this.titles = new LinkedHashSet<String>(); this.titles = new LinkedHashSet<String>();
this.headlines = (List<String>[]) Array.newInstance(ArrayList.class, 6); this.headlines = (List<String>[]) Array.newInstance(ArrayList.class, 6);
for (int i = 0; i < this.headlines.length; i++) this.headlines[i] = new ArrayList<String>(); for (int i = 0; i < this.headlines.length; i++) this.headlines[i] = new ArrayList<String>();
this.bold = new ClusteredScoreMap<String>(); this.bold = new ClusteredScoreMap<String>(false);
this.italic = new ClusteredScoreMap<String>(); this.italic = new ClusteredScoreMap<String>(false);
this.underline = new ClusteredScoreMap<String>(); this.underline = new ClusteredScoreMap<String>(false);
this.li = new ArrayList<String>(); this.li = new ArrayList<String>();
this.content = new CharBuffer(MAX_DOCSIZE, 1024); this.content = new CharBuffer(MAX_DOCSIZE, 1024);
this.htmlFilterEventListeners = new EventListenerList(); this.htmlFilterEventListeners = new EventListenerList();

@ -154,7 +154,7 @@ public class Evaluation {
* @return a list of subject names that match with the element * @return a list of subject names that match with the element
*/ */
public ClusteredScoreMap<String> match(final Element element, final CharSequence content) { public ClusteredScoreMap<String> match(final Element element, final CharSequence content) {
final ClusteredScoreMap<String> subjects = new ClusteredScoreMap<String>(); final ClusteredScoreMap<String> subjects = new ClusteredScoreMap<String>(false);
final List<Attribute> patterns = this.elementMatcher.get(element); final List<Attribute> patterns = this.elementMatcher.get(element);
if (patterns == null) return subjects; if (patterns == null) return subjects;
for (final Attribute attribute: patterns) { for (final Attribute attribute: patterns) {
@ -224,7 +224,7 @@ public class Evaluation {
newScores = pattern.match(element, content); newScores = pattern.match(element, content);
oldScores = getScores(pattern.getName()); oldScores = getScores(pattern.getName());
if (oldScores == null) { if (oldScores == null) {
oldScores = new ClusteredScoreMap<String>(); oldScores = new ClusteredScoreMap<String>(false);
this.modelMap.put(pattern.getName(), oldScores); this.modelMap.put(pattern.getName(), oldScores);
} }
oldScores.inc(newScores); oldScores.inc(newScores);

@ -1344,7 +1344,7 @@ public class Seed implements Cloneable, Comparable<Seed>, Comparator<Seed>
} }
public static void main(final String[] args) { public static void main(final String[] args) {
final ScoreMap<Integer> s = new ClusteredScoreMap<Integer>(); final ScoreMap<Integer> s = new ClusteredScoreMap<Integer>(true);
for ( int i = 0; i < 10000; i++ ) { for ( int i = 0; i < 10000; i++ ) {
final byte[] b = randomHash(); final byte[] b = randomHash();
s.inc(0xff & Base64Order.enhancedCoder.decodeByte(b[0])); s.inc(0xff & Base64Order.enhancedCoder.decodeByte(b[0]));

@ -2753,7 +2753,8 @@ public final class Switchboard extends serverSwitch {
new Condenser( new Condenser(
in.documents[i], in.queueEntry.profile().scraper(), in.queueEntry.profile().indexText(), in.documents[i], in.queueEntry.profile().scraper(), in.queueEntry.profile().indexText(),
in.queueEntry.profile().indexMedia(), in.queueEntry.profile().indexMedia(),
LibraryProvider.dymLib, true, this.index.fulltext().getDefaultConfiguration().contains(CollectionSchema.dates_in_content_sxt)); LibraryProvider.dymLib, true,
this.index.fulltext().getDefaultConfiguration().contains(CollectionSchema.dates_in_content_dts));
// update image result list statistics // update image result list statistics
// it's good to do this concurrently here, because it needs a DNS lookup // it's good to do this concurrently here, because it needs a DNS lookup
@ -3190,8 +3191,8 @@ public final class Switchboard extends serverSwitch {
throw new Parser.Failure("indexing is denied", url); throw new Parser.Failure("indexing is denied", url);
} }
final Condenser condenser = new Condenser( final Condenser condenser = new Condenser(
document, null, true, true, LibraryProvider.dymLib, true, document, null, true, true, LibraryProvider.dymLib, true,
Switchboard.this.index.fulltext().getDefaultConfiguration().contains(CollectionSchema.dates_in_content_sxt)); Switchboard.this.index.fulltext().getDefaultConfiguration().contains(CollectionSchema.dates_in_content_dts));
ResultImages.registerImages(url, document, true); ResultImages.registerImages(url, document, true);
Switchboard.this.webStructure.generateCitationReference(url, document); Switchboard.this.webStructure.generateCitationReference(url, document);
storeDocumentIndex( storeDocumentIndex(

@ -26,6 +26,7 @@ import java.util.Date;
import org.apache.solr.common.params.CommonParams; import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.MultiMapSolrParams; import org.apache.solr.common.params.MultiMapSolrParams;
import org.apache.solr.schema.TrieDateField;
import net.yacy.cora.document.id.DigestURL; import net.yacy.cora.document.id.DigestURL;
import net.yacy.cora.util.CommonPattern; import net.yacy.cora.util.CommonPattern;
@ -39,7 +40,7 @@ import net.yacy.server.serverObjects;
public class QueryModifier { public class QueryModifier {
private final StringBuilder modifier; private final StringBuilder modifier;
public String sitehost, sitehash, filetype, protocol, language, author, collection, on; public String sitehost, sitehash, filetype, protocol, language, author, collection, on, from, to;
public QueryModifier() { public QueryModifier() {
this.sitehash = null; this.sitehash = null;
@ -50,6 +51,8 @@ public class QueryModifier {
this.author = null; this.author = null;
this.collection = null; this.collection = null;
this.on = null; this.on = null;
this.from = null;
this.to = null;
this.modifier = new StringBuilder(20); this.modifier = new StringBuilder(20);
} }
@ -90,7 +93,7 @@ public class QueryModifier {
// parse site // parse site
final int sp = querystring.indexOf("site:", 0); final int sp = querystring.indexOf("site:", 0);
if ( sp >= 0 ) { if (sp >= 0) {
int ftb = querystring.indexOf(' ', sp); int ftb = querystring.indexOf(' ', sp);
if ( ftb == -1 ) { if ( ftb == -1 ) {
ftb = querystring.length(); ftb = querystring.length();
@ -114,13 +117,12 @@ public class QueryModifier {
// parse author // parse author
final int authori = querystring.indexOf("author:", 0); final int authori = querystring.indexOf("author:", 0);
if ( authori >= 0 ) { if (authori >= 0) {
// check if the author was given in parentheses or without // check if the author was given in parentheses or without
final boolean quotes = (querystring.charAt(authori + 7) == '('); final boolean quotes = (querystring.charAt(authori + 7) == '(');
if ( quotes ) { if ( quotes ) {
int ftb = querystring.indexOf(')', authori + 8); int ftb = querystring.indexOf(')', authori + 8);
if (ftb == -1) ftb = querystring.length() + 1; this.author = querystring.substring(authori + 8, ftb == -1 ? querystring.length() : ftb);
this.author = querystring.substring(authori + 8, ftb);
querystring = querystring.replace("author:(" + this.author + ")", ""); querystring = querystring.replace("author:(" + this.author + ")", "");
add("author:(" + author + ")"); add("author:(" + author + ")");
} else { } else {
@ -129,34 +131,46 @@ public class QueryModifier {
ftb = querystring.length(); ftb = querystring.length();
} }
this.author = querystring.substring(authori + 7, ftb); this.author = querystring.substring(authori + 7, ftb);
querystring = querystring.replace("author:" + this.author, ""); querystring = querystring.replace("author:" + this.author, "").replace("  ", " ").trim();
add("author:" + author); add("author:" + author);
} }
} }
// parse collection // parse collection
final int collectioni = querystring.indexOf("collection:", 0); final int collectioni = querystring.indexOf("collection:", 0);
if ( collectioni >= 0 ) { if (collectioni >= 0) {
int ftb = querystring.indexOf(' ', collectioni); int ftb = querystring.indexOf(' ', collectioni);
if ( ftb == -1 ) { this.collection = querystring.substring(collectioni + 11, ftb == -1 ? querystring.length() : ftb);
ftb = querystring.length(); querystring = querystring.replace("collection:" + this.collection, "").replace("  ", " ").trim();
}
this.collection = querystring.substring(collectioni + 11, ftb);
querystring = querystring.replace("collection:" + this.collection, "");
add("collection:" + this.collection); add("collection:" + this.collection);
} }
// parse on-date // parse on-date
final int oni = querystring.indexOf("on:", 0); final int oni = querystring.indexOf("on:", 0);
if ( oni >= 0 ) { if (oni >= 0) {
int ftb = querystring.indexOf(' ', oni); int ftb = querystring.indexOf(' ', oni);
if ( ftb == -1 ) { this.on = querystring.substring(oni + 3, ftb == -1 ? querystring.length() : ftb);
ftb = querystring.length(); querystring = querystring.replace("on:" + this.on, "").replace("  ", " ").trim();
}
this.on = querystring.substring(oni + 3, ftb);
querystring = querystring.replace("on:" + this.on, "");
add("on:" + this.on); add("on:" + this.on);
} }
// parse from-date
final int fromi = querystring.indexOf("from:", 0);
if (fromi >= 0) {
int ftb = querystring.indexOf(' ', fromi);
this.from = querystring.substring(fromi + 5, ftb == -1 ? querystring.length() : ftb);
querystring = querystring.replace("from:" + this.from, "").replace("  ", " ").trim();
add("from:" + this.from);
}
// parse to-date
final int toi = querystring.indexOf("to:", 0);
if (toi >= 0) {
int ftb = querystring.indexOf(' ', toi);
this.to = querystring.substring(toi + 3, ftb == -1 ? querystring.length() : ftb);
querystring = querystring.replace("to:" + this.to, "").replace("  ", " ").trim();
add("to:" + this.to);
}
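
Note on the on:/from:/to: parsing above: each modifier is located with indexOf, so a query term that merely ends in the same letters (for example "photo:album", which contains "to:") would also be picked up. One possible alternative is to match the modifier only at the start of the query or after whitespace. The sketch below is an assumption about how that could look and is not what the commit does; DateModifierSketch, parse and the regular expression are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical alternative, not YaCy code: token-based extraction of date modifiers.
final class DateModifierSketch {

    // modifier name at the start of the query or after whitespace; the value runs to the next whitespace
    private static final Pattern MODIFIER = Pattern.compile("(?:^|\\s)(from|to|on):(\\S+)");

    /** Returns {from, to, on, remainingQuery}; the first three entries are null when absent. */
    static String[] parse(String query) {
        String from = null, to = null, on = null;
        Matcher m = MODIFIER.matcher(query);
        StringBuffer rest = new StringBuffer();
        while (m.find()) {
            switch (m.group(1)) {
                case "from": from = m.group(2); break;
                case "to":   to = m.group(2);   break;
                default:     on = m.group(2);   break;
            }
            m.appendReplacement(rest, " "); // drop the modifier from the query text
        }
        m.appendTail(rest);
        String remaining = rest.toString().replaceAll("\\s+", " ").trim();
        return new String[] { from, to, on, remaining };
    }

    public static void main(String[] args) {
        String[] parsed = parse("yacy from:2015-03-01 to:2015-03-31");
        System.out.println("from=" + parsed[0] + " to=" + parsed[1] + " rest=" + parsed[3]);
    }
}
```

The sketch only extracts the raw strings. Turning them into dates and into a Solr filter query is left to code such as DateDetection.parseLine and the parse...Expression methods in this class.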
// parse language // parse language
final int langi = querystring.indexOf("/language/"); final int langi = querystring.indexOf("/language/");
@ -255,8 +269,22 @@ public class QueryModifier {
fq.append(" AND ").append(QueryModifier.parseCollectionExpression(this.collection)); fq.append(" AND ").append(QueryModifier.parseCollectionExpression(this.collection));
} }
if (this.on != null && this.on.length() > 0 && fq.indexOf(CollectionSchema.dates_in_content_sxt.getSolrFieldName()) < 0) { if (fq.indexOf(CollectionSchema.dates_in_content_dts.getSolrFieldName()) < 0) {
fq.append(" AND ").append(QueryModifier.parseOnExpression(this.on)); if (this.on != null && this.on.length() > 0) {
fq.append(" AND ").append(QueryModifier.parseOnExpression(this.on));
}
if (this.from != null && this.from.length() > 0 && (this.to == null || this.to.equals("*"))) {
fq.append(" AND ").append(QueryModifier.parseFromToExpression(this.from, null));
}
if ((this.from == null || this.from.equals("*")) && this.to != null && this.to.length() > 0) {
fq.append(" AND ").append(QueryModifier.parseFromToExpression(null, this.to));
}
if (this.from != null && this.from.length() > 0 && this.to != null && this.to.length() > 0) {
fq.append(" AND ").append(QueryModifier.parseFromToExpression(this.from, this.to));
}
} }
if (this.protocol != null && this.protocol.length() > 0 && fq.indexOf(CollectionSchema.url_protocol_s.getSolrFieldName()) < 0) { if (this.protocol != null && this.protocol.length() > 0 && fq.indexOf(CollectionSchema.url_protocol_s.getSolrFieldName()) < 0) {
@ -317,13 +345,29 @@ public class QueryModifier {
} }
public static String parseOnExpression(String onDescription) { public static String parseOnExpression(String onDescription) {
assert onDescription != null;
Date onDate = DateDetection.parseLine(onDescription); Date onDate = DateDetection.parseLine(onDescription);
StringBuilder filterQuery = new StringBuilder(20); StringBuilder filterQuery = new StringBuilder(20);
if (onDate != null) { if (onDate != null) {
filterQuery.append(CollectionSchema.dates_in_content_sxt.getSolrFieldName()).append(":\"").append(org.apache.solr.schema.TrieDateField.formatExternal(onDate)).append('\"'); @SuppressWarnings({ "deprecation", "static-access" })
String dstr = TrieDateField.formatExternal(onDate);
filterQuery.append(CollectionSchema.dates_in_content_dts.getSolrFieldName()).append(":[").append(dstr).append(" TO ").append(dstr).append(']');
}
return filterQuery.toString();
}
public static String parseFromToExpression(String from, String to) {
Date fromDate = from == null || from.equals("*") ? null : DateDetection.parseLine(from);
Date toDate = to == null || to.equals("*") ? null : DateDetection.parseLine(to);
StringBuilder filterQuery = new StringBuilder(20);
if (fromDate != null || toDate != null) { // one bound may be open; the missing side becomes "*" below
@SuppressWarnings({ "deprecation", "static-access" })
String dstrFrom = fromDate == null ? "*" : TrieDateField.formatExternal(fromDate);
@SuppressWarnings({ "deprecation", "static-access" })
String dstrTo = toDate == null ? "*" : TrieDateField.formatExternal(toDate);
filterQuery.append(CollectionSchema.dates_in_content_dts.getSolrFieldName()).append(":[").append(dstrFrom).append(" TO ").append(dstrTo).append(']');
} }
return filterQuery.toString(); return filterQuery.toString();
} }
} }
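Note on the from:/to: filter: the range expression built above can be sketched outside of YaCy as follows. This is only a minimal illustration, assuming that a missing bound should be written as `*` so that one-sided intervals still produce a filter. The class and method names are hypothetical, and `java.time.Instant` stands in for the `DateDetection` and `TrieDateField` helpers used in the diff.

```java
import java.util.Date;

// Hypothetical helper, not part of YaCy: builds a Solr range filter on the
// dates_in_content_dts field for a closed or one-sided date interval.
final class DateRangeFilterSketch {

    // Field name as used in the diff.
    static final String FIELD = "dates_in_content_dts";

    // A null bound is rendered as "*", i.e. the interval is open on that side.
    static String solrDate(Date d) {
        return d == null ? "*" : d.toInstant().toString();
    }

    // Returns an empty string when neither bound is given, so that callers can
    // skip appending " AND " in that case.
    static String rangeFilter(Date from, Date to) {
        if (from == null && to == null) return "";
        return FIELD + ":[" + solrDate(from) + " TO " + solrDate(to) + "]";
    }

    public static void main(String[] args) {
        Date dayZero = new Date(0L);
        Date dayOne = new Date(24L * 60L * 60L * 1000L);
        System.out.println(rangeFilter(dayZero, dayOne)); // closed interval
        System.out.println(rangeFilter(dayZero, null));   // only from:
        System.out.println(rangeFilter(null, dayOne));    // only to:
    }
}
```

A caller would append the result to the filter query only when it is non-empty; otherwise the query would end in a dangling " AND ".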

@ -27,6 +27,7 @@
package net.yacy.search.query; package net.yacy.search.query;
import java.util.Collection; import java.util.Collection;
import java.util.Date;
import java.util.HashMap; import java.util.HashMap;
import java.util.Iterator; import java.util.Iterator;
import java.util.LinkedHashSet; import java.util.LinkedHashSet;
@ -70,9 +71,13 @@ import org.apache.solr.client.solrj.SolrQuery.SortClause;
import org.apache.solr.common.params.CommonParams; import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.DisMaxParams; import org.apache.solr.common.params.DisMaxParams;
import org.apache.solr.common.params.FacetParams; import org.apache.solr.common.params.FacetParams;
import org.apache.solr.schema.TrieDateField;
public final class QueryParams { public final class QueryParams {
public static int FACETS_STANDARD_MAXCOUNT = 30;
public static int FACETS_DATE_MAXCOUNT = 730;
public enum Searchdom { public enum Searchdom {
LOCAL, CLUSTER, GLOBAL; LOCAL, CLUSTER, GLOBAL;
@ -92,13 +97,13 @@ public final class QueryParams {
defaultfacetfields.put("hosts", CollectionSchema.host_s); defaultfacetfields.put("hosts", CollectionSchema.host_s);
defaultfacetfields.put("protocol", CollectionSchema.url_protocol_s); defaultfacetfields.put("protocol", CollectionSchema.url_protocol_s);
defaultfacetfields.put("filetype", CollectionSchema.url_file_ext_s); defaultfacetfields.put("filetype", CollectionSchema.url_file_ext_s);
defaultfacetfields.put("date", CollectionSchema.dates_in_content_dts);
defaultfacetfields.put("authors", CollectionSchema.author_sxt); defaultfacetfields.put("authors", CollectionSchema.author_sxt);
defaultfacetfields.put("collections", CollectionSchema.collection_sxt); defaultfacetfields.put("collections", CollectionSchema.collection_sxt);
defaultfacetfields.put("language", CollectionSchema.language_s); defaultfacetfields.put("language", CollectionSchema.language_s);
//missing: namespace //missing: namespace
} }
private static final int defaultmaxfacets = 30;
public static final Bitfield empty_constraint = new Bitfield(4, "AAAAAA"); public static final Bitfield empty_constraint = new Bitfield(4, "AAAAAA");
public static final Pattern catchall_pattern = Pattern.compile(".*"); public static final Pattern catchall_pattern = Pattern.compile(".*");
private static final Pattern matchnothing_pattern = Pattern.compile(""); private static final Pattern matchnothing_pattern = Pattern.compile("");
@ -137,7 +142,6 @@ public final class QueryParams {
protected boolean filterfailurls, filterscannerfail; protected boolean filterfailurls, filterscannerfail;
protected double lat, lon, radius; protected double lat, lon, radius;
public LinkedHashSet<String> facetfields; public LinkedHashSet<String> facetfields;
public int maxfacets;
private SolrQuery cachedQuery; private SolrQuery cachedQuery;
private CollectionConfiguration solrSchema; private CollectionConfiguration solrSchema;
@ -252,7 +256,6 @@ public final class QueryParams {
this.facetfields.add(CollectionSchema.VOCABULARY_PREFIX + v.getName() + CollectionSchema.VOCABULARY_TERMS_SUFFIX); this.facetfields.add(CollectionSchema.VOCABULARY_PREFIX + v.getName() + CollectionSchema.VOCABULARY_TERMS_SUFFIX);
} }
} }
this.maxfacets = defaultmaxfacets;
this.cachedQuery = null; this.cachedQuery = null;
} }
@ -443,10 +446,24 @@ public final class QueryParams {
if (getFacets && this.facetfields.size() > 0) { if (getFacets && this.facetfields.size() > 0) {
params.setFacet(true); params.setFacet(true);
params.setFacetMinCount(1); params.setFacetMinCount(1);
params.setFacetLimit(this.maxfacets); params.setFacetLimit(FACETS_STANDARD_MAXCOUNT);
params.setFacetSort(FacetParams.FACET_SORT_COUNT); params.setFacetSort(FacetParams.FACET_SORT_COUNT);
params.setParam(FacetParams.FACET_METHOD, FacetParams.FACET_METHOD_fcs); params.setParam(FacetParams.FACET_METHOD, FacetParams.FACET_METHOD_fcs);
for (String field: this.facetfields) params.addFacetField("{!ex=" + field + "}" + field); for (String field: this.facetfields) params.addFacetField(field); // params.addFacetField("{!ex=" + field + "}" + field);
if (this.facetfields.contains(CollectionSchema.dates_in_content_dts.name())) {
params.setParam("facet.range", CollectionSchema.dates_in_content_dts.name());
@SuppressWarnings({ "static-access", "deprecation" })
String start = TrieDateField.formatExternal(new Date(System.currentTimeMillis() - 1000L * 60L * 60L * 24L * 3));
@SuppressWarnings({ "static-access", "deprecation" })
String end = TrieDateField.formatExternal(new Date(System.currentTimeMillis() + 1000L * 60L * 60L * 24L * 3));
params.setParam("f." + CollectionSchema.dates_in_content_dts.getSolrFieldName() + ".facet.range.start", start);
params.setParam("f." + CollectionSchema.dates_in_content_dts.getSolrFieldName() + ".facet.range.end", end);
params.setParam("f." + CollectionSchema.dates_in_content_dts.getSolrFieldName() + ".facet.range.gap", "+1DAY");
params.setParam("f." + CollectionSchema.dates_in_content_dts.getSolrFieldName() + ".facet.sort", "index");
params.setParam("f." + CollectionSchema.dates_in_content_dts.getSolrFieldName() + ".facet.limit", Integer.toString(FACETS_DATE_MAXCOUNT)); // the year constraint should cause that limitation already
}
//for (String k: params.getParameterNames()) {ArrayList<String> al = new ArrayList<>(); for (String s: params.getParams(k)) al.add(s); System.out.println("Parameter: " + k + "=" + al.toString());}
//http://localhost:8090/solr/collection1/select?q=*:*&rows=0&facet=true&facet.field=dates_in_content_dts&f.dates_in_content_dts.facet.limit=730&f.dates_in_content_dts.facet.sort=index
} else { } else {
params.setFacet(false); params.setFacet(false);
} }
@ -454,6 +471,8 @@ public final class QueryParams {
return params; return params;
} }
long year = 1000L * 60L * 60L * 24L * 365L;
private String getFacets() { private String getFacets() {
// add site facets // add site facets
@ -500,8 +519,22 @@ public final class QueryParams {
fq.append(" AND ").append(QueryModifier.parseCollectionExpression(this.modifier.collection)); fq.append(" AND ").append(QueryModifier.parseCollectionExpression(this.modifier.collection));
} }
if (this.modifier.on != null && this.modifier.on.length() > 0 && this.solrSchema.contains(CollectionSchema.dates_in_content_sxt)) { if (this.solrSchema.contains(CollectionSchema.dates_in_content_dts)) {
fq.append(" AND ").append(QueryModifier.parseOnExpression(this.modifier.on)); if (this.modifier.on != null && this.modifier.on.length() > 0) {
fq.append(" AND ").append(QueryModifier.parseOnExpression(this.modifier.on));
}
if (this.modifier.from != null && this.modifier.from.length() > 0 && (this.modifier.to == null || this.modifier.to.equals("*"))) {
fq.append(" AND ").append(QueryModifier.parseFromToExpression(this.modifier.from, null));
}
if ((this.modifier.from == null || this.modifier.from.equals("*")) && this.modifier.to != null && this.modifier.to.length() > 0) {
fq.append(" AND ").append(QueryModifier.parseFromToExpression(null, this.modifier.to));
}
if (this.modifier.from != null && this.modifier.from.length() > 0 && this.modifier.to != null && this.modifier.to.length() > 0) {
fq.append(" AND ").append(QueryModifier.parseFromToExpression(this.modifier.from, this.modifier.to));
}
} }
if (this.modifier.protocol != null) { if (this.modifier.protocol != null) {
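Note on the date facet: the per-field range facet parameters set in this hunk can be collected in a plain map to see what is sent to Solr. The sketch below is an illustration only and does not use SolrJ. The class name, method name and the `nowMillis`, `daysEachSide` and `maxBuckets` parameters are assumptions made for this example, while the parameter keys and the `+1DAY` gap mirror the ones in the diff.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of YaCy: assembles the Solr request parameters
// for a day-by-day range facet on a date field.
final class DateFacetParamsSketch {

    static final long DAY_MILLIS = 24L * 60L * 60L * 1000L;

    static Map<String, String> dateFacetParams(String field, long nowMillis, int daysEachSide, int maxBuckets) {
        String prefix = "f." + field + ".facet.";
        Map<String, String> params = new LinkedHashMap<>();
        params.put("facet", "true");
        params.put("facet.range", field);
        // window around "now"; the diff uses three days on each side
        params.put(prefix + "range.start", Instant.ofEpochMilli(nowMillis - daysEachSide * DAY_MILLIS).toString());
        params.put(prefix + "range.end", Instant.ofEpochMilli(nowMillis + daysEachSide * DAY_MILLIS).toString());
        params.put(prefix + "range.gap", "+1DAY"); // one bucket per day
        params.put(prefix + "sort", "index");      // chronological instead of by count
        params.put(prefix + "limit", Integer.toString(maxBuckets));
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params = dateFacetParams("dates_in_content_dts", System.currentTimeMillis(), 3, 730);
        for (Map.Entry<String, String> e : params.entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

Passing the current time in as an argument keeps the window reproducible, which makes the generated parameters easy to check in isolation.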

@ -58,6 +58,7 @@ import net.yacy.cora.lod.vocabulary.Tagging;
import net.yacy.cora.order.Base64Order; import net.yacy.cora.order.Base64Order;
import net.yacy.cora.protocol.Domains; import net.yacy.cora.protocol.Domains;
import net.yacy.cora.protocol.Scanner; import net.yacy.cora.protocol.Scanner;
import net.yacy.cora.sorting.ClusteredScoreMap;
import net.yacy.cora.sorting.ConcurrentScoreMap; import net.yacy.cora.sorting.ConcurrentScoreMap;
import net.yacy.cora.sorting.ReversibleScoreMap; import net.yacy.cora.sorting.ReversibleScoreMap;
import net.yacy.cora.sorting.ScoreMap; import net.yacy.cora.sorting.ScoreMap;
@ -148,6 +149,7 @@ public final class SearchEvent {
public final ScoreMap<String> namespaceNavigator; // a counter for name spaces public final ScoreMap<String> namespaceNavigator; // a counter for name spaces
public final ScoreMap<String> protocolNavigator; // a counter for protocol types public final ScoreMap<String> protocolNavigator; // a counter for protocol types
public final ScoreMap<String> filetypeNavigator; // a counter for file types public final ScoreMap<String> filetypeNavigator; // a counter for file types
public final ScoreMap<String> dateNavigator; // a counter for dates found in the document content
public final ScoreMap<String> languageNavigator; // a counter for appearance of languages public final ScoreMap<String> languageNavigator; // a counter for appearance of languages
public final Map<String, ScoreMap<String>> vocabularyNavigator; // counters for Vocabularies; key is metatag.getVocabularyName() public final Map<String, ScoreMap<String>> vocabularyNavigator; // counters for Vocabularies; key is metatag.getVocabularyName()
private final int topicNavigatorCount; // if 0 no topicNavigator, holds expected number of terms for the topicNavigator private final int topicNavigatorCount; // if 0 no topicNavigator, holds expected number of terms for the topicNavigator
@ -243,6 +245,7 @@ public final class SearchEvent {
this.hostNavigator = navcfg.contains("hosts") ? new ConcurrentScoreMap<String>() : null; this.hostNavigator = navcfg.contains("hosts") ? new ConcurrentScoreMap<String>() : null;
this.protocolNavigator = navcfg.contains("protocol") ? new ConcurrentScoreMap<String>() : null; this.protocolNavigator = navcfg.contains("protocol") ? new ConcurrentScoreMap<String>() : null;
this.filetypeNavigator = navcfg.contains("filetype") ? new ConcurrentScoreMap<String>() : null; this.filetypeNavigator = navcfg.contains("filetype") ? new ConcurrentScoreMap<String>() : null;
this.dateNavigator = navcfg.contains("date") ? new ClusteredScoreMap<String>(true) : null;
this.topicNavigatorCount = navcfg.contains("topics") ? MAX_TOPWORDS : 0; this.topicNavigatorCount = navcfg.contains("topics") ? MAX_TOPWORDS : 0;
this.languageNavigator = navcfg.contains("language") ? new ConcurrentScoreMap<String>() : null; this.languageNavigator = navcfg.contains("language") ? new ConcurrentScoreMap<String>() : null;
this.vocabularyNavigator = new TreeMap<String, ScoreMap<String>>(); this.vocabularyNavigator = new TreeMap<String, ScoreMap<String>>();
@ -836,6 +839,11 @@ public final class SearchEvent {
} }
} }
if (this.dateNavigator != null) {
fcts = facets.get(CollectionSchema.dates_in_content_dts.getSolrFieldName());
if (fcts != null) this.dateNavigator.inc(fcts);
}
if (this.languageNavigator != null) { if (this.languageNavigator != null) {
fcts = facets.get(CollectionSchema.language_s.getSolrFieldName()); fcts = facets.get(CollectionSchema.language_s.getSolrFieldName());
if (fcts != null) { if (fcts != null) {

@ -42,7 +42,6 @@ import java.util.List;
import java.util.Map; import java.util.Map;
import java.util.Set; import java.util.Set;
import java.util.TreeMap; import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.BlockingQueue; import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap; import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger; import java.util.concurrent.atomic.AtomicInteger;
@ -496,30 +495,13 @@ public class CollectionConfiguration extends SchemaConfiguration implements Seri
if (firstSeen > 0 && firstSeen < lastModified.getTime()) lastModified = new Date(firstSeen); // patch the date if we have seen the document earlier if (firstSeen > 0 && firstSeen < lastModified.getTime()) lastModified = new Date(firstSeen); // patch the date if we have seen the document earlier
add(doc, CollectionSchema.last_modified, lastModified); add(doc, CollectionSchema.last_modified, lastModified);
} }
if (allAttr || if (allAttr || contains(CollectionSchema.dates_in_content_dts) || contains(CollectionSchema.dates_in_content_count_i)) {
contains(CollectionSchema.dates_in_content_sxt) || contains(CollectionSchema.dates_in_content_count_i) ||
contains(CollectionSchema.date_in_content_min_dt) || contains(CollectionSchema.date_in_content_max_dt)) {
LinkedHashSet<Date> dates_in_content = condenser.dates_in_content; LinkedHashSet<Date> dates_in_content = condenser.dates_in_content;
if (allAttr || contains(CollectionSchema.dates_in_content_count_i)) { if (allAttr || contains(CollectionSchema.dates_in_content_count_i)) {
add(doc, CollectionSchema.dates_in_content_count_i, dates_in_content.size()); add(doc, CollectionSchema.dates_in_content_count_i, dates_in_content.size());
} }
if (dates_in_content.size() > 0) { if (dates_in_content.size() > 0 && (allAttr || contains(CollectionSchema.dates_in_content_dts))) {
if (allAttr || contains(CollectionSchema.dates_in_content_sxt)) { add(doc, CollectionSchema.dates_in_content_dts, dates_in_content.toArray(new Date[dates_in_content.size()]));
String[] dates = new String[dates_in_content.size()];
int i = 0; for (Date d: dates_in_content) dates[i++] = org.apache.solr.schema.TrieDateField.formatExternal(d);
add(doc, CollectionSchema.dates_in_content_sxt, dates);
}
// order the dates to get the oldest and youngest
TreeSet<Date> ordered_dates = new TreeSet<>();
ordered_dates.addAll(dates_in_content);
if (allAttr || contains(CollectionSchema.date_in_content_min_dt)) {
Date date_in_content_min_dt = ordered_dates.iterator().next();
add(doc, CollectionSchema.date_in_content_min_dt, date_in_content_min_dt);
}
if (allAttr || contains(CollectionSchema.date_in_content_max_dt)) {
Date date_in_content_max_dt = ordered_dates.descendingIterator().next();
add(doc, CollectionSchema.date_in_content_max_dt, date_in_content_max_dt);
}
} }
} }
if (allAttr || contains(CollectionSchema.keywords)) { if (allAttr || contains(CollectionSchema.keywords)) {
@ -1085,7 +1067,7 @@ public class CollectionConfiguration extends SchemaConfiguration implements Seri
collection1hosts = hostfacet.get(CollectionSchema.host_s.getSolrFieldName()); collection1hosts = hostfacet.get(CollectionSchema.host_s.getSolrFieldName());
} catch (final IOException e2) { } catch (final IOException e2) {
ConcurrentLog.logException(e2); ConcurrentLog.logException(e2);
collection1hosts = new ClusteredScoreMap<String>(); collection1hosts = new ClusteredScoreMap<String>(true);
} }
postprocessingActivity = "create ranking map"; postprocessingActivity = "create ranking map";
@ -1173,7 +1155,7 @@ public class CollectionConfiguration extends SchemaConfiguration implements Seri
if (collection1hosts.size() != countcheck) ConcurrentLog.warn("CollectionConfiguration", "ambiguous host count: expected=" + collection1hosts.size() + ", counted=" + countcheck); if (collection1hosts.size() != countcheck) ConcurrentLog.warn("CollectionConfiguration", "ambiguous host count: expected=" + collection1hosts.size() + ", counted=" + countcheck);
} catch (final IOException e2) { } catch (final IOException e2) {
ConcurrentLog.logException(e2); ConcurrentLog.logException(e2);
collection1hosts = new ClusteredScoreMap<String>(); collection1hosts = new ClusteredScoreMap<String>(true);
} }
// process all documents at the webgraph for the outgoing links of this document // process all documents at the webgraph for the outgoing links of this document
@ -1192,7 +1174,7 @@ public class CollectionConfiguration extends SchemaConfiguration implements Seri
webgraphhosts = hostfacet.get(WebgraphSchema.source_host_s.getSolrFieldName()); webgraphhosts = hostfacet.get(WebgraphSchema.source_host_s.getSolrFieldName());
} catch (final IOException e2) { } catch (final IOException e2) {
ConcurrentLog.logException(e2); ConcurrentLog.logException(e2);
webgraphhosts = new ClusteredScoreMap<String>(); webgraphhosts = new ClusteredScoreMap<String>(true);
} }
try { try {
final long start = System.currentTimeMillis(); final long start = System.currentTimeMillis();

@ -35,10 +35,8 @@ public enum CollectionSchema implements SchemaDeclaration {
sku(SolrType.string, true, true, false, true, true, "url of document"), // a 'sku' is a stock-keeping unit, a unique identifier and a default field in unmodified solr. sku(SolrType.string, true, true, false, true, true, "url of document"), // a 'sku' is a stock-keeping unit, a unique identifier and a default field in unmodified solr.
//sku(SolrType.text_en_splitting_tight, true, true, false, true, true, "url of document"), // a 'sku' is a stock-keeping unit, a unique identifier and a default field in unmodified solr. //sku(SolrType.text_en_splitting_tight, true, true, false, true, true, "url of document"), // a 'sku' is a stock-keeping unit, a unique identifier and a default field in unmodified solr.
last_modified(SolrType.date, true, true, false, false, false, "last-modified from http header"), last_modified(SolrType.date, true, true, false, false, false, "last-modified from http header"),
dates_in_content_sxt(SolrType.string, true, true, true, false, true, "if date expressions can be found in the content, these dates are listed here in order of the appearances"), dates_in_content_dts(SolrType.date, true, true, true, false, true, "if date expressions can be found in the content, these dates are listed here as date objects in order of the appearances"),
dates_in_content_count_i(SolrType.num_integer, true, true, false, false, false, "the number of entries in dates_in_content_sxt"), dates_in_content_count_i(SolrType.num_integer, true, true, false, false, false, "the number of entries in dates_in_content_sxt"),
date_in_content_min_dt(SolrType.date, true, true, false, false, false, "if dates_in_content_sxt is filled, this contains the oldest date from the list of available dates"),
date_in_content_max_dt(SolrType.date, true, true, false, false, false, "if dates_in_content_sxt is filled, this contains the youngest date from the list of available dates, that may also be possibly in the future"),
content_type(SolrType.string, true, true, true, false, false, "mime-type of document"), content_type(SolrType.string, true, true, true, false, false, "mime-type of document"),
http_unique_b(SolrType.bool, true, true, false, false, false, "unique-field which is true when an url appears the first time. If the same url which was http then appears as https (or vice versa) then the field is false"), http_unique_b(SolrType.bool, true, true, false, false, false, "unique-field which is true when an url appears the first time. If the same url which was http then appears as https (or vice versa) then the field is false"),
www_unique_b(SolrType.bool, true, true, false, false, false, "unique-field which is true when an url appears the first time. If the same url within the subdomain www then appears without that subdomain (or vice versa) then the field is false"), www_unique_b(SolrType.bool, true, true, false, false, false, "unique-field which is true when an url appears the first time. If the same url within the subdomain www then appears without that subdomain (or vice versa) then the field is false"),
@ -362,6 +360,12 @@ public enum CollectionSchema implements SchemaDeclaration {
doc.setField(this.getSolrFieldName(), value); doc.setField(this.getSolrFieldName(), value);
} }
@Override
public final void add(final SolrInputDocument doc, final Date[] value) {
assert this.isMultiValued();
doc.setField(this.getSolrFieldName(), value);
}
@Override @Override
public final void add(final SolrInputDocument doc, final String[] value) { public final void add(final SolrInputDocument doc, final String[] value) {
assert this.isMultiValued(); assert this.isMultiValued();

@ -222,6 +222,12 @@ public enum WebgraphSchema implements SchemaDeclaration {
doc.setField(this.getSolrFieldName(), value); doc.setField(this.getSolrFieldName(), value);
} }
@Override
public final void add(final SolrInputDocument doc, final Date[] value) {
assert this.isMultiValued();
doc.setField(this.getSolrFieldName(), value);
}
@Override @Override
public final void add(final SolrInputDocument doc, final String[] value) { public final void add(final SolrInputDocument doc, final String[] value) {
assert this.isMultiValued(); assert this.isMultiValued();
