added a new way of content browsing in search results:

- date navigation

The date is taken from the CONTENT of the documents / web pages, NOT
from a date submitted as metadata (e.g. in the HTTP header or the HTML
head). This makes it possible to search for documents in the future,
e.g. when documents contain descriptions of future events.
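
The idea of taking dates from the content can be sketched as follows.
This is only an illustration, assuming dates are written as YYYY-MM-DD;
the function name datesInContent is hypothetical, and YaCy's own date
detection is not shown here and may recognize other formats.

```javascript
// Hypothetical sketch (not YaCy's date detection): collect all valid
// YYYY-MM-DD dates mentioned in a text, de-duplicated and sorted.
function datesInContent(text) {
  const found = new Set();
  const re = /\b(\d{4})-(\d{2})-(\d{2})\b/g;
  let m;
  while ((m = re.exec(text)) !== null) {
    const y = Number(m[1]);
    const mo = Number(m[2]);
    const d = Number(m[3]);
    const t = new Date(Date.UTC(y, mo - 1, d));
    // Reject impossible dates such as 2014-02-31, which Date would roll over.
    if (t.getUTCFullYear() === y && t.getUTCMonth() === mo - 1 && t.getUTCDate() === d) {
      found.add(m[0]);
    }
  }
  // ISO dates sort chronologically when sorted as plain strings.
  return [...found].sort();
}

console.log(datesInContent("Released 2013-12-24; the meetup takes place on 2014-03-01."));
```

Because the dates come from the text itself, a page announcing an
upcoming event yields a date that lies after the crawl time.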

The date is written to an index field which is now enabled by default.
All documents are scanned for contained date mentions.
To visualize the dates for a specific search result, a histogram
showing the number of documents for each day is displayed. The
morris.js library is used to render these histograms. Morris.js also
requires raphael.js, which is now integrated in YaCy as well.
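
The data preparation for such a histogram can be sketched like this. It
is a minimal sketch, assuming the facet values arrive as a flat array of
alternating values and counts (as in the dates_in_content_dts handling
in the diff); the helper name facetPairsToRows is hypothetical and not
part of YaCy or morris.js.

```javascript
// Hypothetical helper: convert a flat facet array
// [value1, count1, value2, count2, ...] into rows of the form {x, y}.
function facetPairsToRows(flat) {
  const rows = [];
  for (let i = 0; i + 1 < flat.length; i += 2) {
    const day = String(flat[i]).substring(0, 10); // keep only YYYY-MM-DD
    const count = Number(flat[i + 1]);
    // Skip malformed entries and days without documents.
    if (day.length === 10 && Number.isFinite(count) && count > 0) {
      rows.push({ x: day, y: count });
    }
  }
  return rows;
}

const sample = ["2014-01-24T00:00:00Z", 3, "2014-01-25T00:00:00Z", "5", "2014-01-26T00:00:00Z", 0];
console.log(facetPairsToRows(sample));
```

The resulting rows match the shape the diff passes to Morris.Bar as
data, with xkey 'x' and ykeys ['y'].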

The histogram is now also displayed in the index browser by default.

To select a specific range from a search result, the following modifiers
have been introduced:
from:<date>
to:<date>
These modifiers can be used separately (i.e. only 'from' or only 'to')
to describe an open-ended interval, or combined to describe a closed
interval. Both dates are inclusive. To select a single date only, use
the 'on:' modifier.
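
The interval semantics can be sketched as follows. This is an
illustration only, assuming dates in the form YYYY-MM-DD and treating
on:<date> (the single-day modifier this commit also uses) as a range of
one day; the function names dateRangeFromQuery and inRange are
hypothetical, and the formats YaCy actually accepts may differ.

```javascript
// Hypothetical sketch: derive an inclusive day range from a query string.
// A missing bound is represented as null (open-ended on that side).
function dateRangeFromQuery(query) {
  const pick = (name) => {
    const m = new RegExp("(?:^|\\s)" + name + ":(\\d{4}-\\d{2}-\\d{2})").exec(query);
    return m ? m[1] : null;
  };
  const on = pick("on");
  if (on !== null) return { from: on, to: on };
  return { from: pick("from"), to: pick("to") };
}

// Both bounds are inclusive; ISO dates compare correctly as strings.
function inRange(day, range) {
  if (range.from !== null && day < range.from) return false;
  if (range.to !== null && day > range.to) return false;
  return true;
}

const range = dateRangeFromQuery("yacy from:2014-01-01 to:2014-01-31");
console.log(range, inRange("2014-01-31", range), inRange("2014-02-01", range));
```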

The histogram shows blue and green bars; the green bars denote weekend
days (Saturday and Sunday).
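
The color choice can be sketched as a small function. The two color
values are the ones used in the diff; the function name barColorForDay
is hypothetical, and computing the weekday in UTC is a choice made for
this sketch so that the result does not depend on the local time zone.

```javascript
// Hypothetical sketch: pick the bar color for a day given as YYYY-MM-DD.
function barColorForDay(isoDay) {
  const [y, m, d] = isoDay.split("-").map(Number);
  // Build the date in UTC and read the weekday in UTC as well.
  const weekday = new Date(Date.UTC(y, m - 1, d)).getUTCDay(); // 0 = Sunday, 6 = Saturday
  return weekday === 0 || weekday === 6 ? "#4aaf46" : "#3574c0";
}

console.log(barColorForDay("2014-01-25"), barColorForDay("2014-01-27"));
```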

Clicking on bars in the histogram has the following effect:
1st click: add a from:<date> modifier for the date of the bar
2nd click: add a to:<date> modifier for the date of the bar
3rd click: remove the from and to modifiers and set an on:<date> for the bar
When the on:<date> modifier is used, the histogram shows an unlimited
time period. This makes it possible to click again (4th click) which is
then interpreted as a 1st click again (sets a from modifier).
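
The click cycle can be sketched as a pure function on the query string.
This is a simplified illustration, not the handler from the diff: the
function name nextQueryAfterClick is hypothetical, and modifiers are
matched as whole whitespace-separated tokens, which is an assumption of
this sketch.

```javascript
// Hypothetical sketch: compute the next query after a click on a bar.
function nextQueryAfterClick(query, day) {
  const pattern = (name) => new RegExp("(?:^|\\s)" + name + ":\\S+", "g");
  const has = (q, name) => pattern(name).test(q);
  const strip = (q, name) => q.replace(pattern(name), "").trim();

  // A previous on: selection is dropped first, so the cycle can restart.
  let q = strip(query, "on");
  if (!has(q, "from")) return (q + " from:" + day).trim(); // 1st click
  if (!has(q, "to")) return (q + " to:" + day).trim();     // 2nd click
  q = strip(strip(q, "from"), "to");                        // 3rd click
  return (q + " on:" + day).trim();
}

let q = "yacy";
for (const day of ["2014-01-10", "2014-01-20", "2014-01-15", "2014-01-12"]) {
  q = nextQueryAfterClick(q, day);
  console.log(q);
}
```

Keeping the transition free of DOM access makes each of the four steps
easy to check in isolation before wiring it to the search form.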

The display feature is NOT switched on by default; to switch it on use
the /ConfigSearchPage_p.html servlet.
Michael Peter Christen 10 years ago
parent c3aadcf899
commit 535f1ebe3b

@ -18,17 +18,11 @@ sku
## last-modified from http header, date (mandatory field)
last_modified
## if date expressions can be found in the content, these dates are listed here in order of the appearances
#dates_in_content_sxt
## if date expressions can be found in the content, these dates are listed here as date objects in order of the appearances
dates_in_content_dts
## the number of entries in dates_in_content_sxt
#dates_in_content_count_i
## if dates_in_content_sxt is filled, this contains the oldest date from the list of available dates
#date_in_content_min_dt
## if dates_in_content_sxt is filled, this contains the youngest date from the list of available dates, that may also be possibly in the future
#date_in_content_max_dt
dates_in_content_count_i
## mime-type of document, string (mandatory field)
content_type

@ -837,7 +837,7 @@ search.result.show.vocabulary.omit =
# can be temporarily different if the search string is given with different navigation values
# assigning no value(s) means that no navigation is shown
search.navigation=location,hosts,authors,namespace,topics,filetype,protocol,language
#search.navigation=location,hosts,authors,namespace,topics,filetype,protocol,language,collections
#search.navigation=location,hosts,authors,namespace,topics,filetype,protocol,language,collections,date
# search result verification and snippet fetch caching rules
# each search result can be verified by loading the link from the web

@ -175,8 +175,45 @@
</fieldset>
</div>
<!-- the search result -->
<div style="float: left;">
<input type="checkbox" name="search.navigation.date" value="true" #(search.navigation.date)#::checked="checked" #(/search.navigation.date)# /> Date Navigation
<link rel="stylesheet" href="/env/morris.css">
<script src="/js/raphael-min.js"></script>
<script src="/js/morris.js"></script>
<div id="graph" style="height:200px"></div>
<script>
var solr= $.getJSON("http://localhost:8090/solr/collection1/select?q=*:*&defType=edismax&start=0&rows=0&wt=json&facet=true&facet.field=dates_in_content_dts&facet.sort=index", function(data) {
dates_in_content_dts = data.facet_counts.facet_fields.dates_in_content_dts;
var parsed = [];
for (var i = 0; i < dates_in_content_dts.length; i = i + 2) {
var date = dates_in_content_dts[i];
var count = dates_in_content_dts[i + 1];
if (date && count) {parsed[parsed.length] = {x: date,y: count};};
};
if (parsed.length > 0) Morris.Bar({
element: 'graph',
data: parsed,
xkey: 'x',
ykeys: ['y'],
labels: ['number of documents about this date'],
yLabelFormat: function (y) { return y.toString() + ' docs'; },
barColors: function (row, series, type) {
var d = new Date(row.label);
if (d.getDay() === 6) return '#4aaf46'; //saturday
if (d.getDay() === 0) return '#4aaf46'; //sunday
return '#3574c0';
},
hideHover: 'false'
}).on('click', function(i, row) {
console.log(i, row);
});
});
</script>
<fieldset>
<div class="searchresults">
<h4 class="linktitle">

@ -88,6 +88,7 @@ public class ConfigSearchPage_p {
if (post.getBoolean("search.navigation.collections")) nav += "collections,";
if (post.getBoolean("search.navigation.namespace")) nav += "namespace,";
if (post.getBoolean("search.navigation.topics")) nav += "topics,";
if (post.getBoolean("search.navigation.date")) nav += "date,";
if (nav.endsWith(",")) nav = nav.substring(0, nav.length() - 1);
sb.setConfig("search.navigation", nav);
}
@ -166,6 +167,7 @@ public class ConfigSearchPage_p {
prop.put("search.navigation.collections", sb.getConfig("search.navigation", "").indexOf("collections",0) >= 0 ? 1 : 0);
prop.put("search.navigation.namespace", sb.getConfig("search.navigation", "").indexOf("namespace",0) >= 0 ? 1 : 0);
prop.put("search.navigation.topics", sb.getConfig("search.navigation", "").indexOf("topics",0) >= 0 ? 1 : 0);
prop.put("search.navigation.date", sb.getConfig("search.navigation", "").indexOf("date",0) >= 0 ? 1 : 0);
prop.put("about.headline", sb.getConfig("about.headline", "About"));
prop.put("about.body", sb.getConfig("about.body", ""));

@ -32,8 +32,6 @@ import net.yacy.cora.lod.vocabulary.Tagging;
import net.yacy.cora.protocol.ClientIdentification;
import net.yacy.cora.protocol.RequestHeader;
import net.yacy.cora.util.Html2Image;
import net.yacy.cora.util.JSONException;
import net.yacy.cora.util.JSONObject;
import net.yacy.crawler.data.CrawlProfile;
import net.yacy.document.LibraryProvider;
import net.yacy.search.Switchboard;

@ -113,6 +113,39 @@ function updatepage(str) {
<div class="error" style="float:left;">&nbsp;&nbsp;&nbsp;Load Errors</div>
</div>
</fieldset>
<link rel="stylesheet" href="/env/morris.css">
<script src="/js/raphael-min.js"></script>
<script src="/js/morris.js"></script>
<div id="graph" style="height:200px"></div>
<script>
var solr= $.getJSON("http://localhost:8090/solr/collection1/select?q=*:*&defType=edismax&start=0&rows=0&wt=json&facet=true&facet.field=dates_in_content_dts&facet.sort=index", function(data) {
dates_in_content_dts = data.facet_counts.facet_fields.dates_in_content_dts;
var parsed = [];
for (var i = 0; i < dates_in_content_dts.length; i = i + 2) {
var date = dates_in_content_dts[i];
var count = dates_in_content_dts[i + 1];
if (date && count) {parsed[parsed.length] = {x: date,y: count};};
};
if (parsed.length > 0) Morris.Bar({
element: 'graph',
data: parsed,
xkey: 'x',
ykeys: ['y'],
labels: ['number of documents about this date'],
yLabelFormat: function (y) { return y.toString() + ' docs'; },
barColors: function (row, series, type) {
var d = new Date(row.label);
if (d.getDay() === 6) return '#4aaf46'; //saturday
if (d.getDay() === 0) return '#4aaf46'; //sunday
return '#3574c0';
},
hideHover: 'false'
}).on('click', function(i, row) {
console.log(i, row);
});
});
</script>
#(/hosts)#
#(hostanalysis)#::

@ -204,16 +204,16 @@ public class HostBrowser {
// collect hosts from index
ReversibleScoreMap<String> hostscore = fulltext.getDefaultConnector().getFacets(AbstractSolrConnector.CATCHALL_QUERY, maxcount, CollectionSchema.host_s.getSolrFieldName()).get(CollectionSchema.host_s.getSolrFieldName());
if (hostscore == null) hostscore = new ClusteredScoreMap<String>();
if (hostscore == null) hostscore = new ClusteredScoreMap<String>(true);
// collect hosts from crawler
final Map<String, Integer[]> crawler = (authorized) ? sb.crawlQueues.noticeURL.getDomainStackHosts(StackType.LOCAL, sb.robots) : new HashMap<String, Integer[]>();
// collect the errorurls
Map<String, ReversibleScoreMap<String>> exclfacets = authorized ? fulltext.getDefaultConnector().getFacets(CollectionSchema.failtype_s.getSolrFieldName() + ":" + FailType.excl.name(), maxcount, CollectionSchema.host_s.getSolrFieldName()) : null;
ReversibleScoreMap<String> exclscore = exclfacets == null ? new ClusteredScoreMap<String>() : exclfacets.get(CollectionSchema.host_s.getSolrFieldName());
ReversibleScoreMap<String> exclscore = exclfacets == null ? new ClusteredScoreMap<String>(true) : exclfacets.get(CollectionSchema.host_s.getSolrFieldName());
Map<String, ReversibleScoreMap<String>> failfacets = authorized ? fulltext.getDefaultConnector().getFacets(CollectionSchema.failtype_s.getSolrFieldName() + ":" + FailType.fail.name(), maxcount, CollectionSchema.host_s.getSolrFieldName()) : null;
ReversibleScoreMap<String> failscore = failfacets == null ? new ClusteredScoreMap<String>() : failfacets.get(CollectionSchema.host_s.getSolrFieldName());
ReversibleScoreMap<String> failscore = failfacets == null ? new ClusteredScoreMap<String>(true) : failfacets.get(CollectionSchema.host_s.getSolrFieldName());
int c = 0;
Iterator<String> i = hostscore.keys(false);

@ -158,7 +158,7 @@ public class WebStructurePicture_p {
final double radius = 1.0 / (1 << nextlayer);
final WebStructureGraph.StructureEntry sr = structure.outgoingReferences(pivotnode.getKey());
final Map<String, Integer> next = (sr == null) ? new HashMap<String, Integer>() : sr.references;
ClusteredScoreMap<String> next0 = new ClusteredScoreMap<String>();
ClusteredScoreMap<String> next0 = new ClusteredScoreMap<String>(false);
for (Map.Entry<String, Integer> entry: next.entrySet()) next0.set(entry.getKey(), entry.getValue());
// first set points to next hosts
final List<Map.Entry<String, String>> targets = new ArrayList<Map.Entry<String, String>>();

@ -0,0 +1,2 @@
.morris-hover{position:absolute;z-index:1000}.morris-hover.morris-default-style{border-radius:10px;padding:6px;color:#666;background:rgba(255,255,255,0.8);border:solid 2px rgba(230,230,230,0.8);font-family:sans-serif;font-size:12px;text-align:center}.morris-hover.morris-default-style .morris-hover-row-label{font-weight:bold;margin:0.25em 0}
.morris-hover.morris-default-style .morris-hover-point{white-space:nowrap;margin:0.1em 0}

@ -58,7 +58,7 @@
<div class="input-group">
<input name="query" id="search" type="text" size="40" maxlength="80" value="#[former]#" #(focus)#::autofocus="autofocus"#(/focus)# onFocus="this.select()" class="form-control searchinput typeahead" />
<div class="input-group-btn">
<button type="submit" id="Enter" class="btn btn-primary">Search</button>
<button id="Enter" name="Enter" class="btn btn-primary" type="submit">Search</button>
</div>
</div>
<input type="hidden" name="verify" value="#[search.verify]#" />

@ -28,7 +28,6 @@
// javac -classpath .:../classes index.java
// if the shell's current path is HTROOT
import java.io.IOException;
import net.yacy.cora.document.analysis.Classification;
import net.yacy.cora.document.analysis.Classification.ContentDomain;
import net.yacy.cora.protocol.RequestHeader;


@ -92,9 +92,9 @@ Use the RSS search result format to add static searches to your RSS reader, if y
<form class="search small" name="searchform" action="" method="get" accept-charset="UTF-8" style="position:fixed;top:8px;z-index:1052;max-width:500px;">
<div class="input-group">
<input type="text" class="form-control searchinput typeahead" size="40" maxlength="200" placeholder="#[promoteSearchPageGreeting]#" name="query" value="#[former]#" #(focus)#::autofocus="autofocus"#(/focus)# onFocus="this.select()" id="search" onclick="document.getElementById('Enter').innerHTML = 'search'"/>
<input name="query" id="search" type="text" class="form-control searchinput typeahead" size="40" maxlength="200" placeholder="#[promoteSearchPageGreeting]#" value="#[former]#" #(focus)#::autofocus="autofocus"#(/focus)# onFocus="this.select()" onclick="document.getElementById('Enter').innerHTML = 'search'"/>
<div class="input-group-btn">
<button id="Enter" class="btn btn-default" type="submit">search</button>
<button id="Enter" name="Enter" class="btn btn-default" type="submit">search</button>
</div>
</div>
<input type="hidden" name="contentdom" id="contentdom" value="#[contentdom]#" />
@ -175,6 +175,9 @@ document.getElementById("Enter").innerHTML = "search again";
</div>
#(/geoinfo)#
<!-- show date histogram if date navigation is active -->
<div id="datehistogram"></div>
<!-- linklist begin -->
#(resultTable)#::<table width="100%"><tr class="TableHeader"><td width="30%">Media</td><td width="70%">URL</td></tr>#(/resultTable)#
#{results}#

@ -91,6 +91,58 @@ show search results for "#[query]#" on map</a>
</ul>
#(/cat-location)#
#(nav-dates)#::
<link rel="stylesheet" href="/env/morris.css">
<script src="/js/raphael-min.js"></script>
<script src="/js/morris.js"></script>
<script>
document.getElementById("datehistogram").style = "height:200px";
dates_in_content_dts = [#{element}#"#[name]#","#[count]#"#(nl)#::,#(/nl)##{/element}#];
var parsed = [];
for (var i = 0; i < dates_in_content_dts.length; i = i + 2) {
var date = dates_in_content_dts[i];
var count = dates_in_content_dts[i + 1];
if (date && count) {parsed[parsed.length] = {x: date,y: count};};
};
if (parsed.length > 0) Morris.Bar({
element: 'datehistogram',
data: parsed,
xkey: 'x',
ykeys: ['y'],
labels: ['number of documents about this date'],
yLabelFormat: function (y) { return y.toString() + ' docs'; },
barColors: function (row, series, type) {
var d = new Date(row.label);
if (d.getDay() === 6) return '#4aaf46'; //saturday
if (d.getDay() === 0) return '#4aaf46'; //sunday
return '#3574c0';
},
hideHover: 'false'
}).on('click', function(i, row) {
var query = document.getElementsByClassName('searchinput')[0].getAttribute("value");
var onp = -1, fromp = -1, top = -1;
if ((onp = query.indexOf("on:")) >= 0) {
query = query.substring(0, onp - 1);
}
if ((fromp = query.indexOf("from:")) < 0) {
query = query + " from:" + row.x;
document.getElementsByClassName('searchinput')[0].value = query;
document.getElementById('Enter').click();
} else if ((top = query.indexOf("to:")) < 0) {
query = query + " to:" + row.x;
document.getElementsByClassName('searchinput')[0].value = query;
document.getElementById('Enter').click();
} else {
query = query.substring(0, fromp) + " on:" + row.x;
document.getElementsByClassName('searchinput')[0].value = query;
document.getElementById('Enter').click();
}
var date = row.x;
console.log(i, row, query);
});
</script>
#(/nav-dates)#
#(nav-domains)#::
<ul class="nav nav-sidebar menugroup">
<li><h3>Provider</h3></li>

@ -1,13 +1,8 @@
// yacysearchitem.java
// (C) 2007 by Michael Peter Christen; mc@yacy.net, Frankfurt a. M., Germany
// first published 28.08.2007 on http://yacy.net
//
// This is a part of YaCy, a peer-to-peer based web search engine
//
// $LastChangedDate$
// $LastChangedRevision$
// $LastChangedBy$
//
// LICENSE
//
// This program is free software; you can redistribute it and/or modify
@ -24,17 +19,22 @@
// along with this program; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
import java.text.ParseException;
import java.util.AbstractMap;
import java.util.Date;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Map;
import org.apache.solr.schema.TrieDateField;
import net.yacy.cora.document.analysis.Classification;
import net.yacy.cora.document.analysis.Classification.ContentDomain;
import net.yacy.cora.document.id.MultiProtocolURL;
import net.yacy.cora.lod.vocabulary.Tagging;
import net.yacy.cora.protocol.RequestHeader;
import net.yacy.cora.sorting.ScoreMap;
import net.yacy.document.DateDetection;
import net.yacy.document.LibraryProvider;
import net.yacy.kelondro.util.ISO639;
import net.yacy.peers.graphics.ProfilingGraph;
@ -55,6 +55,7 @@ public class yacysearchtrailer {
private static final int TOPWORDS_MINSIZE = 8;
private static final int TOPWORDS_MAXSIZE = 22;
@SuppressWarnings({ "deprecation", "static-access" })
public static serverObjects respond(final RequestHeader header, final serverObjects post, final serverSwitch env) {
final serverObjects prop = new serverObjects();
final Switchboard sb = (Switchboard) env;
@ -105,12 +106,10 @@ public class yacysearchtrailer {
navigatorIterator = theSearch.namespaceNavigator.keys(false);
int i = 0, pos = 0, neg = 0;
String nav;
while (i < 10 && navigatorIterator.hasNext()) {
while (i < QueryParams.FACETS_STANDARD_MAXCOUNT && navigatorIterator.hasNext()) {
name = navigatorIterator.next();
count = theSearch.namespaceNavigator.get(name);
if (count == 0) {
break;
}
if (count == 0) break;
nav = "inurl%3A" + name;
if (!theSearch.query.modifier.toString().contains("inurl:"+name)) {
pos++;
@ -131,10 +130,7 @@ public class yacysearchtrailer {
prop.put("nav-namespace_element", i);
i--;
prop.put("nav-namespace_element_" + i + "_nl", 0);
if (pos == 1 && neg == 0)
{
prop.put("nav-namespace", 0); // this navigation is not useful
}
if (pos == 1 && neg == 0) prop.put("nav-namespace", 0); // this navigation is not useful
}
// host navigators
@ -146,12 +142,10 @@ public class yacysearchtrailer {
navigatorIterator = hostNavigator.keys(false);
int i = 0, pos = 0, neg = 0;
String nav;
while (i < 10 && navigatorIterator.hasNext()) {
while (i < QueryParams.FACETS_STANDARD_MAXCOUNT && navigatorIterator.hasNext()) {
name = navigatorIterator.next();
count = hostNavigator.get(name);
if (count == 0) {
break;
}
if (count == 0) break;
nav = "site%3A" + name;
if (theSearch.query.modifier.sitehost == null || !theSearch.query.modifier.sitehost.contains(name)) {
pos++;
@ -172,10 +166,7 @@ public class yacysearchtrailer {
prop.put("nav-domains_element", i);
i--;
prop.put("nav-domains_element_" + i + "_nl", 0);
if (pos == 1 && neg == 0)
{
prop.put("nav-domains", 0); // this navigation is not useful
}
if (pos == 1 && neg == 0) prop.put("nav-domains", 0); // this navigation is not useful
}
// language navigators
@ -187,12 +178,10 @@ public class yacysearchtrailer {
navigatorIterator = languageNavigator.keys(false);
int i = 0, pos = 0, neg = 0;
String nav;
while (i < 10 && navigatorIterator.hasNext()) {
while (i < QueryParams.FACETS_STANDARD_MAXCOUNT && navigatorIterator.hasNext()) {
name = navigatorIterator.next();
count = languageNavigator.get(name);
if (count == 0) {
break;
}
if (count == 0) break;
nav = "%2Flanguage%2F" + name;
if (theSearch.query.modifier.language == null || !theSearch.query.modifier.language.contains(name)) {
pos++;
@ -214,10 +203,7 @@ public class yacysearchtrailer {
prop.put("nav-languages_element", i);
i--;
prop.put("nav-languages_element_" + i + "_nl", 0);
if (pos == 1 && neg == 0)
{
prop.put("nav-languages", 0); // this navigation is not useful
}
if (pos == 1 && neg == 0) prop.put("nav-languages", 0); // this navigation is not useful
}
// author navigators
@ -228,12 +214,10 @@ public class yacysearchtrailer {
navigatorIterator = theSearch.authorNavigator.keys(false);
int i = 0, pos = 0, neg = 0;
String nav;
while (i < 10 && navigatorIterator.hasNext()) {
while (i < QueryParams.FACETS_STANDARD_MAXCOUNT && navigatorIterator.hasNext()) {
name = navigatorIterator.next().trim();
count = theSearch.authorNavigator.get(name);
if (count == 0) {
break;
}
if (count == 0) break;
nav = (name.indexOf(' ', 0) < 0) ? "author%3A" + name : "author%3A%28" + name.replace(" ", "+") + "%29";
if (theSearch.query.modifier.author == null || !theSearch.query.modifier.author.contains(name)) {
pos++;
@ -254,8 +238,7 @@ public class yacysearchtrailer {
prop.put("nav-authors_element", i);
i--;
prop.put("nav-authors_element_" + i + "_nl", 0);
if (pos == 1 && neg == 0)
{
if (pos == 1 && neg == 0) {
prop.put("nav-authors", 0); // this navigation is not useful
}
}
@ -268,12 +251,10 @@ public class yacysearchtrailer {
navigatorIterator = theSearch.collectionNavigator.keys(false);
int i = 0, pos = 0, neg = 0;
String nav;
while (i < 10 && navigatorIterator.hasNext()) {
while (i < QueryParams.FACETS_STANDARD_MAXCOUNT && navigatorIterator.hasNext()) {
name = navigatorIterator.next().trim();
count = theSearch.collectionNavigator.get(name);
if (count == 0) {
break;
}
if (count == 0) break;
nav = (name.indexOf(' ', 0) < 0) ? "collection%3A" + name : "collection%3A%28" + name.replace(" ", "+") + "%29";
if (theSearch.query.modifier.collection == null || !theSearch.query.modifier.collection.contains(name)) {
pos++;
@ -294,10 +275,7 @@ public class yacysearchtrailer {
prop.put("nav-collections_element", i);
i--;
prop.put("nav-collections_element_" + i + "_nl", 0);
if (pos == 1 && neg == 0)
{
prop.put("nav-collections", 0); // this navigation is not useful
}
if (pos == 1 && neg == 0) prop.put("nav-collections", 0); // this navigation is not useful
}
// topics navigator
@ -360,12 +338,10 @@ public class yacysearchtrailer {
if (oldProtocolModifier != null && oldProtocolModifier.length() > 0) {theSearch.query.modifier.remove("/" + oldProtocolModifier); theSearch.query.modifier.remove(oldProtocolModifier);}
theSearch.query.modifier.protocol = "";
theSearch.query.getQueryGoal().query_original = oldQuery.replaceAll(" /https", "").replaceAll(" /http", "").replaceAll(" /ftp", "").replaceAll(" /smb", "").replaceAll(" /file", "");
while (i < 10 && navigatorIterator.hasNext()) {
while (i < QueryParams.FACETS_STANDARD_MAXCOUNT && navigatorIterator.hasNext()) {
name = navigatorIterator.next().trim();
count = theSearch.protocolNavigator.get(name);
if (count == 0) {
break;
}
if (count == 0) break;
visible = visible || "ftp,smb".indexOf(name) >= 0;
nav = "%2F" + name;
if (oldProtocolModifier == null || !oldProtocolModifier.equals(name)) {
@ -391,10 +367,60 @@ public class yacysearchtrailer {
prop.put("nav-protocols_element", i);
i--;
prop.put("nav-protocols_element_" + i + "_nl", 0);
if (pos == 1 && neg == 0)
{
prop.put("nav-protocols", 0); // this navigation is not useful
if (pos == 1 && neg == 0) prop.put("nav-protocols", 0); // this navigation is not useful
}
// date navigators
if (theSearch.dateNavigator == null || theSearch.dateNavigator.isEmpty()) {
prop.put("nav-dates", 0);
} else {
prop.put("nav-dates", 1);
navigatorIterator = theSearch.dateNavigator.iterator(); // this iterator is different as it iterates by the key order (which is a date order)
int i = 0, pos = 0, neg = 0;
long dx = -1;
long dayms = 1000L * 60L * 60L * 24L;
Date fromconstraint = theSearch.getQuery().modifier.from == null ? null : DateDetection.parseLine(theSearch.getQuery().modifier.from);
if (fromconstraint == null) fromconstraint = new Date(System.currentTimeMillis() - 365 * dayms);
Date toconstraint = theSearch.getQuery().modifier.to == null ? null : DateDetection.parseLine(theSearch.getQuery().modifier.to);
if (toconstraint == null) toconstraint = new Date(System.currentTimeMillis() + 365 * dayms);
while (i < QueryParams.FACETS_DATE_MAXCOUNT && navigatorIterator.hasNext()) {
name = navigatorIterator.next().trim();
if (name.length() < 10) continue;
count = theSearch.dateNavigator.get(name);
String shortname = name.substring(0, 10);
long d;
Date dd;
try {dd = TrieDateField.parseDate(name); d = dd.getTime();} catch (ParseException e) {continue;}
if (fromconstraint != null && dd.before(fromconstraint)) continue;
if (toconstraint != null && dd.after(toconstraint)) break;
if (dx > 0) {
while (d - dx > dayms) {
dx += dayms;
String sn = TrieDateField.formatExternal(new Date(dx)).substring(0, 10);
prop.put("nav-dates_element_" + i + "_on", 0);
prop.put(fileType, "nav-dates_element_" + i + "_name", sn);
prop.put("nav-dates_element_" + i + "_count", 0);
prop.put("nav-dates_element_" + i + "_nl", 1);
i++;
}
}
dx = d;
if (theSearch.query.modifier.on == null || !theSearch.query.modifier.on.contains(shortname) ) {
pos++;
prop.put("nav-dates_element_" + i + "_on", 1);
} else {
neg++;
prop.put("nav-dates_element_" + i + "_on", 0);
}
prop.put(fileType, "nav-dates_element_" + i + "_name", shortname);
prop.put("nav-dates_element_" + i + "_count", count);
prop.put("nav-dates_element_" + i + "_nl", 1);
i++;
}
prop.put("nav-dates_element", i);
i--;
prop.put("nav-dates_element_" + i + "_nl", 0);
if (pos == 1 && neg == 0) prop.put("nav-dates", 0); // this navigation is not useful
}
// filetype navigators
@ -406,12 +432,10 @@ public class yacysearchtrailer {
int i = 0, pos = 0, neg = 0;
String nav;
boolean visible = false;
while (i < 10 && navigatorIterator.hasNext()) {
while (i < QueryParams.FACETS_STANDARD_MAXCOUNT && navigatorIterator.hasNext()) {
name = navigatorIterator.next().trim();
count = theSearch.filetypeNavigator.get(name);
if (count == 0) {
break;
}
if (count == 0) break;
visible = visible || Classification.isMediaExtension(name) || "pdf,doc,docx".indexOf(name) >= 0;
nav = "filetype%3A" + name;
if (theSearch.query.modifier.filetype == null || !theSearch.query.modifier.filetype.contains(name) ) {
@ -433,10 +457,7 @@ public class yacysearchtrailer {
prop.put("nav-filetypes_element", i);
i--;
prop.put("nav-filetypes_element_" + i + "_nl", 0);
if (pos == 1 && neg == 0)
{
prop.put("nav-filetypes", 0); // this navigation is not useful
}
if (pos == 1 && neg == 0) prop.put("nav-filetypes", 0); // this navigation is not useful
}
// vocabulary navigators
@ -455,9 +476,7 @@ public class yacysearchtrailer {
while (i < 20 && navigatorIterator.hasNext()) {
name = navigatorIterator.next();
count = ve.getValue().get(name);
if (count == 0) {
break;
}
if (count == 0) break;
nav = "%2Fvocabulary%2F" + navname + "%2F" + MultiProtocolURL.escape(Tagging.encodePrintname(name)).toString();
if (!theSearch.query.modifier.toString().contains("/vocabulary/" + navname + "/" + name.replace(' ', '_'))) {
prop.put("nav-vocabulary_" + navvoccount + "_element_" + i + "_on", 1);
@ -511,10 +530,6 @@ public class yacysearchtrailer {
return prop;
}
private static final boolean on(final int pos, final int neg, final int maxlimit) {
return neg > 0 || (pos > 1 && pos <= maxlimit);
}
}
//http://localhost:8090/yacysearch.html?query=java+&amp;maximumRecords=10&amp;resource=local&amp;verify=cacheonly&amp;nav=hosts,authors,namespace,topics,filetype,protocol&amp;urlmaskfilter=ftp://.*&amp;prefermaskfilter=&amp;constraint=&amp;contentdom=text&amp;former=java+%2Fftp&amp;startRecord=0
//http://localhost:8090/yacysearch.html?query=java+&amp;maximumRecords=10&amp;resource=local&amp;verify=cacheonly&amp;nav=hosts,authors,namespace,topics,filetype,protocol&amp;urlmaskfilter=.*&amp;prefermaskfilter=&amp;constraint=&amp;contentdom=text&amp;former=java+%2Fvocabulary%2FGewerke%2FTore&amp;startRecord=0

@ -1,6 +1,5 @@
"navigation": [#(nav-filetypes)#::
{
"facetname": "filetypes",
"navigation": {#(nav-dates)#::
"dates": {
"displayname": "Filetype",
"type": "String",
"min": "0",
@ -9,11 +8,22 @@
"elements": [
#{element}#
{"name": "#[name]#", "count": "#[count]#", "modifier": "#[modifier]#", "url": "#[url]#"}#(nl)#::,#(/nl)#
#{/element}#
]
},#(/nav-dates)##(nav-filetypes)#::
"filetypes": {
"displayname": "Filetype",
"type": "String",
"min": "0",
"max": "0",
"mean": "0",
"elements": [
#{element}#
{"name": "#[name]#", "count": "#[count]#"}#(nl)#::,#(/nl)#
#{/element}#
]
},#(/nav-filetypes)##(nav-protocols)#::
{
"facetname": "protocols",
"protocols": {
"displayname": "Protocol",
"type": "String",
"min": "0",
@ -25,8 +35,7 @@
#{/element}#
]
},#(/nav-protocols)##(nav-domains)#::
{
"facetname": "domains",
"domains": {
"displayname": "Domains",
"type": "String",
"min": "0",
@ -38,8 +47,7 @@
#{/element}#
]
},#(/nav-domains)##(nav-namespace)#::
{
"facetname": "namespace",
"namespace": {
"displayname": "Name Space",
"type": "String",
"min": "0",
@ -51,8 +59,7 @@
#{/element}#
]
},#(/nav-namespace)##(nav-authors)#::
{
"facetname": "authors",
"authors": {
"displayname": "Authors",
"type": "String",
"min": "0",
@ -64,8 +71,7 @@
#{/element}#
]
},#(/nav-authors)##(nav-collections)#::
{
"facetname": "collections",
"collections": {
"displayname": "Collections",
"type": "String",
"min": "0",
@ -77,8 +83,7 @@
#{/element}#
]
},#(/nav-collections)##{nav-vocabulary}#
{
"facetname": "#[navname]#",
"#[navname]#": {
"displayname": "#[navname]#",
"type": "String",
"min": "0",
@ -90,8 +95,7 @@
#{/element}#
]
},#{/nav-vocabulary}##(nav-topics)#{}::
{
"facetname": "topics",
"topics": {
"displayname": "Topics",
"type": "String",
"min": "0",
@ -103,5 +107,5 @@
#{/element}#
]
}#(/nav-topics)#
],
},
"totalResults": "#[num-results_totalcount]#"

@ -23,9 +23,7 @@ import java.io.IOException;
import java.net.MalformedURLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import net.yacy.cora.document.encoding.ASCII;
import net.yacy.cora.document.encoding.UTF8;
import net.yacy.cora.document.feed.RSSFeed;
import net.yacy.cora.document.feed.RSSMessage;
@ -41,7 +39,6 @@ import net.yacy.document.TextParser;
import net.yacy.kelondro.data.meta.URIMetadataNode;
import net.yacy.search.query.QueryParams;
import net.yacy.search.schema.CollectionSchema;
import org.apache.http.entity.mime.content.ContentBody;
/**
* Handling of queries to remote OpenSearch systems. Iterates to a list of

@ -111,6 +111,11 @@ public class SchemaConfiguration extends Configuration implements Serializable {
if ((isEmpty() || contains(key)) && (!this.lazy || (value != null && value.getTime() > 0))) key.add(doc, value);
}
public void add(final SolrInputDocument doc, final SchemaDeclaration key, final Date[] value) {
assert key.isMultiValued() : "key = " + key.getSolrFieldName();
if ((isEmpty() || contains(key)) && (!this.lazy || (value != null && value.length > 0))) key.add(doc, value);
}
public void add(final SolrInputDocument doc, final SchemaDeclaration key, final String[] value) {
assert key.isMultiValued() : "key = " + key.getSolrFieldName();
if ((isEmpty() || contains(key)) && (!this.lazy || (value != null && value.length > 0))) key.add(doc, value);

@ -59,6 +59,8 @@ public interface SchemaDeclaration {
public void add(final SolrInputDocument doc, final long value);
public void add(final SolrInputDocument doc, final Date[] value);
public void add(final SolrInputDocument doc, final String[] value);
public void add(final SolrInputDocument doc, final Integer[] value);

@ -48,6 +48,7 @@ import org.apache.solr.schema.FieldType;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.schema.TextField;
import org.apache.solr.schema.TrieDateField;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.SolrIndexSearcher;
@ -257,6 +258,7 @@ public class EnhancedXMLResponseWriter implements QueryResponseWriter {
writer.write(lb);
}
@SuppressWarnings({ "static-access", "deprecation" })
private static void writeField(final Writer writer, final String typeName, final String name, final String value) throws IOException {
if (typeName.equals(SolrType.text_general.printName()) ||
typeName.equals(SolrType.string.printName()) ||
@ -269,7 +271,7 @@ public class EnhancedXMLResponseWriter implements QueryResponseWriter {
} else if (typeName.equals(SolrType.num_long.printName())) {
writeTag(writer, "long", name, value, true);
} else if (typeName.equals(SolrType.date.printName())) {
writeTag(writer, "date", name, org.apache.solr.schema.TrieDateField.formatExternal(new Date(Long.parseLong(value))), true);
writeTag(writer, "date", name, TrieDateField.formatExternal(new Date(Long.parseLong(value))), true);
} else if (typeName.equals(SolrType.num_float.printName())) {
writeTag(writer, "float", name, value, true);
} else if (typeName.equals(SolrType.num_double.printName())) {
@ -277,6 +279,7 @@ public class EnhancedXMLResponseWriter implements QueryResponseWriter {
}
}
@SuppressWarnings({ "static-access", "deprecation" })
private static void writeField(final Writer writer, final String name, final Object value) throws IOException {
if (value instanceof String) {
writeTag(writer, "str", name, (String) value, true);
@ -287,7 +290,7 @@ public class EnhancedXMLResponseWriter implements QueryResponseWriter {
} else if (value instanceof Long) {
writeTag(writer, "long", name, ((Long) value).toString(), true);
} else if (value instanceof Date) {
writeTag(writer, "date", name, org.apache.solr.schema.TrieDateField.formatExternal((Date) value), true);
writeTag(writer, "date", name, TrieDateField.formatExternal((Date) value), true);
} else if (value instanceof Float) {
writeTag(writer, "float", name, ((Float) value).toString(), true);
} else if (value instanceof Double) {

@ -45,6 +45,7 @@ import org.apache.solr.schema.FieldType;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.schema.TextField;
import org.apache.solr.schema.TrieDateField;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.SolrIndexSearcher;
@ -217,12 +218,13 @@ public class HTMLResponseWriter implements QueryResponseWriter {
return kv;
}
@SuppressWarnings({ "static-access", "deprecation" })
private static String field2string(final FieldType type, final String value) {
String typeName = type.getTypeName();
if (typeName.equals(SolrType.bool.printName())) {
return "F".equals(value) ? "false" : "true";
} else if (typeName.equals(SolrType.date.printName())) {
return org.apache.solr.schema.TrieDateField.formatExternal(new Date(Long.parseLong(value)));
return TrieDateField.formatExternal(new Date(Long.parseLong(value)));
}
return value;
}

@ -26,6 +26,7 @@ package net.yacy.cora.sorting;
import java.util.Comparator;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;
import java.util.SortedMap;
@ -40,8 +41,12 @@ public final class ClusteredScoreMap<E> extends AbstractScoreMap<E> implements R
private long gcount;
private int encnt;
public ClusteredScoreMap() {
this.map = new TreeMap<E, Long>();
/**
* Creates a score map whose key store is either a tree map or a linked hash map.
* @param sortedKeys if true, a tree map is used for key storage and iterator() returns the keys in sorted order; if false, a linked hash map is used, which preserves the order in which the keys first appeared
*/
public ClusteredScoreMap(boolean sortedKeys) {
this.map = sortedKeys ? new TreeMap<E, Long>() : new LinkedHashMap<E, Long>();
this.pam = new TreeMap<Long, E>();
this.gcount = 0;
this.encnt = 0;
@ -345,7 +350,7 @@ public final class ClusteredScoreMap<E> extends AbstractScoreMap<E> implements R
public static void main(final String[] args) {
System.out.println("Test for Score: start");
final ClusteredScoreMap<String> s = new ClusteredScoreMap<String>();
final ClusteredScoreMap<String> s = new ClusteredScoreMap<String>(false);
long c = 0;
// create cluster
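The effect of the new sortedKeys flag can be illustrated with a minimal sketch. It is not YaCy code: the class KeyOrderSketch and its methods are hypothetical and only mirror the TreeMap/LinkedHashMap choice made in the constructor above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; not part of YaCy.
final class KeyOrderSketch {

    // Same choice as in the constructor: TreeMap for sorted keys,
    // LinkedHashMap to keep the order in which keys first appeared.
    static Map<String, Long> newKeyStore(final boolean sortedKeys) {
        return sortedKeys ? new TreeMap<String, Long>() : new LinkedHashMap<String, Long>();
    }

    // Counts every key once per occurrence and returns the keys in iteration order.
    static List<String> keysAfterCounting(final boolean sortedKeys, final String... keys) {
        final Map<String, Long> store = newKeyStore(sortedKeys);
        for (final String key : keys) store.merge(key, 1L, Long::sum);
        return new ArrayList<>(store.keySet());
    }

    public static void main(final String[] args) {
        System.out.println(keysAfterCounting(true, "2015-03-02", "2015-01-15", "2015-03-02"));
        System.out.println(keysAfterCounting(false, "2015-03-02", "2015-01-15", "2015-03-02"));
    }
}
```

A date histogram presumably wants the sorted variant, since ISO-style day strings sort chronologically; callers that care about first-appearance order can pass false.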

@ -25,6 +25,7 @@ import java.io.IOException;
import java.lang.reflect.Array;
import java.net.MalformedURLException;
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
@ -347,10 +348,15 @@ public class HostQueue implements Balancer {
@Override
public boolean has(final byte[] urlhashb) {
for (int retry = 0; retry < 3; retry++) {
try {
for (Index depthStack: this.depthStacks.values()) {
if (depthStack.has(urlhashb)) return true;
}
return false;
} catch (ConcurrentModificationException e) {} // depthStacks changed during iteration; retry
}
return false;
}
@Override
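The retry loop added to HostQueue.has() follows a general pattern: iterate, and if the collection is modified concurrently, start over a bounded number of times. A minimal sketch of that pattern, assuming a hypothetical helper class RetryOnCmeSketch that is not part of YaCy:

```java
import java.util.Collection;
import java.util.ConcurrentModificationException;
import java.util.function.Predicate;

// Hypothetical helper; illustrates the bounded-retry pattern only.
final class RetryOnCmeSketch {

    // Returns true if any item matches. If the collection changes while it is
    // being iterated, the scan is restarted, at most maxRetries times in total.
    static <T> boolean anyMatch(final Collection<T> items, final Predicate<T> test, final int maxRetries) {
        for (int retry = 0; retry < maxRetries; retry++) {
            try {
                for (final T item : items) {
                    if (test.test(item)) return true;
                }
                return false; // full scan completed without a match
            } catch (final ConcurrentModificationException e) {
                // the collection changed during iteration; try again
            }
        }
        return false; // gave up after maxRetries interrupted scans
    }
}
```

Giving up with false after the last retry mirrors the behaviour of the diff above; whether that is the right fallback depends on how callers treat a false negative.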

@ -75,7 +75,7 @@ public final class ResultURLs {
static {
for (final EventOrigin origin: EventOrigin.values()) {
resultStacks.put(origin, new LinkedHashMap<String, InitExecEntry>());
resultDomains.put(origin, new ClusteredScoreMap<String>());
resultDomains.put(origin, new ClusteredScoreMap<String>(true));
}
}

@ -1,7 +1,6 @@
package net.yacy.data;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;

@ -361,11 +361,11 @@ public final class Condenser {
//Collection<Tagging> vocabularies = LibraryProvider.autotagging.getVocabularies();
//assert vocabularyNames.size() == vocabularies.size();
Map<String, String> vocMap = scraper == null ? null : scraper.removeVocMap(root);
if (vocMap != null) {
if (vocMap != null && vocMap.size() > 0) {
for (Map.Entry<String, String> entry: vocMap.entrySet()) {
String navigatorName = entry.getKey();
String term = entry.getValue();
vocabularyNames.remove(navigatorName);
vocabularyNames.remove(navigatorName); // prevent this vocabulary from being used again for auto-annotation
Tagging vocabulary = LibraryProvider.autotagging.getVocabulary(navigatorName);
if (vocabulary != null) {
// extend the vocabulary

@ -500,7 +500,7 @@ public class DateDetection {
public static Date parseLine(String text) {
Date d = null;
try {d = CONFORM.parse(text);} catch (ParseException e) {}
if (d == null) try {d = GenericFormatter.FORMAT_SHORT_DAY.parse(text);} catch (ParseException e) {}
//if (d == null) try {d = GenericFormatter.FORMAT_SHORT_DAY.parse(text);} catch (ParseException e) {} // did not work well and fired for wrong formats; do not use
if (d == null) try {d = GenericFormatter.FORMAT_RFC1123_SHORT.parse(text);} catch (ParseException e) {}
if (d == null) try {d = GenericFormatter.FORMAT_ANSIC.parse(text);} catch (ParseException e) {}

@ -235,9 +235,9 @@ public class ContentScraper extends AbstractScraper implements Scraper {
this.titles = new LinkedHashSet<String>();
this.headlines = (List<String>[]) Array.newInstance(ArrayList.class, 6);
for (int i = 0; i < this.headlines.length; i++) this.headlines[i] = new ArrayList<String>();
this.bold = new ClusteredScoreMap<String>();
this.italic = new ClusteredScoreMap<String>();
this.underline = new ClusteredScoreMap<String>();
this.bold = new ClusteredScoreMap<String>(false);
this.italic = new ClusteredScoreMap<String>(false);
this.underline = new ClusteredScoreMap<String>(false);
this.li = new ArrayList<String>();
this.content = new CharBuffer(MAX_DOCSIZE, 1024);
this.htmlFilterEventListeners = new EventListenerList();

@ -154,7 +154,7 @@ public class Evaluation {
* @return a list of subject names that match with the element
*/
public ClusteredScoreMap<String> match(final Element element, final CharSequence content) {
final ClusteredScoreMap<String> subjects = new ClusteredScoreMap<String>();
final ClusteredScoreMap<String> subjects = new ClusteredScoreMap<String>(false);
final List<Attribute> patterns = this.elementMatcher.get(element);
if (patterns == null) return subjects;
for (final Attribute attribute: patterns) {
@ -224,7 +224,7 @@ public class Evaluation {
newScores = pattern.match(element, content);
oldScores = getScores(pattern.getName());
if (oldScores == null) {
oldScores = new ClusteredScoreMap<String>();
oldScores = new ClusteredScoreMap<String>(false);
this.modelMap.put(pattern.getName(), oldScores);
}
oldScores.inc(newScores);

@ -1344,7 +1344,7 @@ public class Seed implements Cloneable, Comparable<Seed>, Comparator<Seed>
}
public static void main(final String[] args) {
final ScoreMap<Integer> s = new ClusteredScoreMap<Integer>();
final ScoreMap<Integer> s = new ClusteredScoreMap<Integer>(true);
for ( int i = 0; i < 10000; i++ ) {
final byte[] b = randomHash();
s.inc(0xff & Base64Order.enhancedCoder.decodeByte(b[0]));

@ -2753,7 +2753,8 @@ public final class Switchboard extends serverSwitch {
new Condenser(
in.documents[i], in.queueEntry.profile().scraper(), in.queueEntry.profile().indexText(),
in.queueEntry.profile().indexMedia(),
LibraryProvider.dymLib, true, this.index.fulltext().getDefaultConfiguration().contains(CollectionSchema.dates_in_content_sxt));
LibraryProvider.dymLib, true,
this.index.fulltext().getDefaultConfiguration().contains(CollectionSchema.dates_in_content_dts));
// update image result list statistics
// it's good to do this concurrently here, because it needs a DNS lookup
@ -3191,7 +3192,7 @@ public final class Switchboard extends serverSwitch {
}
final Condenser condenser = new Condenser(
document, null, true, true, LibraryProvider.dymLib, true,
Switchboard.this.index.fulltext().getDefaultConfiguration().contains(CollectionSchema.dates_in_content_sxt));
Switchboard.this.index.fulltext().getDefaultConfiguration().contains(CollectionSchema.dates_in_content_dts));
ResultImages.registerImages(url, document, true);
Switchboard.this.webStructure.generateCitationReference(url, document);
storeDocumentIndex(

@ -26,6 +26,7 @@ import java.util.Date;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.MultiMapSolrParams;
import org.apache.solr.schema.TrieDateField;
import net.yacy.cora.document.id.DigestURL;
import net.yacy.cora.util.CommonPattern;
@ -39,7 +40,7 @@ import net.yacy.server.serverObjects;
public class QueryModifier {
private final StringBuilder modifier;
public String sitehost, sitehash, filetype, protocol, language, author, collection, on;
public String sitehost, sitehash, filetype, protocol, language, author, collection, on, from, to;
public QueryModifier() {
this.sitehash = null;
@ -50,6 +51,8 @@ public class QueryModifier {
this.author = null;
this.collection = null;
this.on = null;
this.from = null;
this.to = null;
this.modifier = new StringBuilder(20);
}
@ -90,7 +93,7 @@ public class QueryModifier {
// parse site
final int sp = querystring.indexOf("site:", 0);
if ( sp >= 0 ) {
if (sp >= 0) {
int ftb = querystring.indexOf(' ', sp);
if ( ftb == -1 ) {
ftb = querystring.length();
@ -114,13 +117,12 @@ public class QueryModifier {
// parse author
final int authori = querystring.indexOf("author:", 0);
if ( authori >= 0 ) {
if (authori >= 0) {
// check whether the author was given in parentheses or without
final boolean quotes = (querystring.charAt(authori + 7) == '(');
if ( quotes ) {
int ftb = querystring.indexOf(')', authori + 8);
if (ftb == -1) ftb = querystring.length() + 1;
this.author = querystring.substring(authori + 8, ftb);
this.author = querystring.substring(authori + 8, ftb == -1 ? querystring.length() : ftb);
querystring = querystring.replace("author:(" + this.author + ")", "");
add("author:(" + author + ")");
} else {
@ -129,35 +131,47 @@ public class QueryModifier {
ftb = querystring.length();
}
this.author = querystring.substring(authori + 7, ftb);
querystring = querystring.replace("author:" + this.author, "");
querystring = querystring.replace("author:" + this.author, "").replace("  ", " ").trim();
add("author:" + author);
}
}
// parse collection
final int collectioni = querystring.indexOf("collection:", 0);
if ( collectioni >= 0 ) {
if (collectioni >= 0) {
int ftb = querystring.indexOf(' ', collectioni);
if ( ftb == -1 ) {
ftb = querystring.length();
}
this.collection = querystring.substring(collectioni + 11, ftb);
querystring = querystring.replace("collection:" + this.collection, "");
this.collection = querystring.substring(collectioni + 11, ftb == -1 ? querystring.length() : ftb);
querystring = querystring.replace("collection:" + this.collection, "").replace("  ", " ").trim();
add("collection:" + this.collection);
}
// parse on-date
final int oni = querystring.indexOf("on:", 0);
if ( oni >= 0 ) {
if (oni >= 0) {
int ftb = querystring.indexOf(' ', oni);
if ( ftb == -1 ) {
ftb = querystring.length();
}
this.on = querystring.substring(oni + 3, ftb);
querystring = querystring.replace("on:" + this.on, "");
this.on = querystring.substring(oni + 3, ftb == -1 ? querystring.length() : ftb);
querystring = querystring.replace("on:" + this.on, "").replace("  ", " ").trim();
add("on:" + this.on);
}
// parse from-date
final int fromi = querystring.indexOf("from:", 0);
if (fromi >= 0) {
int ftb = querystring.indexOf(' ', fromi);
this.from = querystring.substring(fromi + 5, ftb == -1 ? querystring.length() : ftb);
querystring = querystring.replace("from:" + this.from, "").replace("  ", " ").trim();
add("from:" + this.from);
}
// parse to-date
final int toi = querystring.indexOf("to:", 0);
if (toi >= 0) {
int ftb = querystring.indexOf(' ', toi);
this.to = querystring.substring(toi + 3, ftb == -1 ? querystring.length() : ftb);
querystring = querystring.replace("to:" + this.to, "").replace("  ", " ").trim();
add("to:" + this.to);
}
// parse language
final int langi = querystring.indexOf("/language/");
if (langi >= 0) {
@ -255,10 +269,24 @@ public class QueryModifier {
fq.append(" AND ").append(QueryModifier.parseCollectionExpression(this.collection));
}
if (this.on != null && this.on.length() > 0 && fq.indexOf(CollectionSchema.dates_in_content_sxt.getSolrFieldName()) < 0) {
if (fq.indexOf(CollectionSchema.dates_in_content_dts.getSolrFieldName()) < 0) {
if (this.on != null && this.on.length() > 0) {
fq.append(" AND ").append(QueryModifier.parseOnExpression(this.on));
}
if (this.from != null && this.from.length() > 0 && (this.to == null || this.to.equals("*"))) {
fq.append(" AND ").append(QueryModifier.parseFromToExpression(this.from, null));
}
if ((this.from == null || this.from.equals("*")) && this.to != null && this.to.length() > 0) {
fq.append(" AND ").append(QueryModifier.parseFromToExpression(null, this.to));
}
if (this.from != null && this.from.length() > 0 && this.to != null && this.to.length() > 0) {
fq.append(" AND ").append(QueryModifier.parseFromToExpression(this.from, this.to));
}
}
if (this.protocol != null && this.protocol.length() > 0 && fq.indexOf(CollectionSchema.url_protocol_s.getSolrFieldName()) < 0) {
fq.append(" AND ").append(CollectionSchema.url_protocol_s.getSolrFieldName()).append(":\"").append(this.protocol).append('\"');
}
@ -317,13 +345,29 @@ public class QueryModifier {
}
public static String parseOnExpression(String onDescription) {
assert onDescription != null;
Date onDate = DateDetection.parseLine(onDescription);
StringBuilder filterQuery = new StringBuilder(20);
if (onDate != null) {
filterQuery.append(CollectionSchema.dates_in_content_sxt.getSolrFieldName()).append(":\"").append(org.apache.solr.schema.TrieDateField.formatExternal(onDate)).append('\"');
@SuppressWarnings({ "deprecation", "static-access" })
String dstr = TrieDateField.formatExternal(onDate);
filterQuery.append(CollectionSchema.dates_in_content_dts.getSolrFieldName()).append(":[").append(dstr).append(" TO ").append(dstr).append(']');
}
return filterQuery.toString();
}
public static String parseFromToExpression(String from, String to) {
Date fromDate = from == null || from.equals("*") ? null : DateDetection.parseLine(from);
Date toDate = to == null || to.equals("*") ? null : DateDetection.parseLine(to);
StringBuilder filterQuery = new StringBuilder(20);
if (fromDate != null || toDate != null) { // an open interval has only one bound; the missing one becomes "*"
@SuppressWarnings({ "deprecation", "static-access" })
String dstrFrom = fromDate == null ? "*" : TrieDateField.formatExternal(fromDate);
@SuppressWarnings({ "deprecation", "static-access" })
String dstrTo = toDate == null ? "*" : TrieDateField.formatExternal(toDate);
filterQuery.append(CollectionSchema.dates_in_content_dts.getSolrFieldName()).append(":[").append(dstrFrom).append(" TO ").append(dstrTo).append(']');
}
return filterQuery.toString();
}
}
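How a from:/to: pair could be turned into an inclusive Solr range filter can be sketched as follows. This is an illustration under assumptions, not the YaCy implementation: the class DateRangeFilterSketch and its methods are hypothetical, dates are assumed to arrive as yyyy-MM-dd, and java.time replaces DateDetection.parseLine and TrieDateField.formatExternal.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical sketch; class and method names are not part of YaCy.
final class DateRangeFilterSketch {

    // Extracts the token after a modifier such as "from:" up to the next space,
    // or returns null if the modifier does not occur in the query.
    static String modifierValue(final String query, final String modifier) {
        final int start = query.indexOf(modifier);
        if (start < 0) return null;
        final int end = query.indexOf(' ', start);
        return query.substring(start + modifier.length(), end == -1 ? query.length() : end);
    }

    // Formats an ISO day (yyyy-MM-dd) as a date literal at midnight UTC;
    // null, "*" and unparseable input become the open bound "*" (an assumption of this sketch).
    static String bound(final String day) {
        if (day == null || day.equals("*")) return "*";
        try {
            return LocalDate.parse(day) + "T00:00:00Z";
        } catch (final DateTimeParseException e) {
            return "*";
        }
    }

    // Builds an inclusive range filter; returns an empty string when both bounds are open.
    static String rangeFilter(final String field, final String from, final String to) {
        final String lower = bound(from);
        final String upper = bound(to);
        if (lower.equals("*") && upper.equals("*")) return "";
        return field + ":[" + lower + " TO " + upper + "]";
    }

    public static void main(final String[] args) {
        final String query = "yacy from:2015-01-01 to:2015-01-31";
        System.out.println(rangeFilter("dates_in_content_dts",
                modifierValue(query, "from:"), modifierValue(query, "to:")));
    }
}
```

Callers should skip appending " AND " when the returned filter is empty, otherwise the filter query ends with a dangling operator.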

@ -27,6 +27,7 @@
package net.yacy.search.query;
import java.util.Collection;
import java.util.Date;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashSet;
@ -70,9 +71,13 @@ import org.apache.solr.client.solrj.SolrQuery.SortClause;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.DisMaxParams;
import org.apache.solr.common.params.FacetParams;
import org.apache.solr.schema.TrieDateField;
public final class QueryParams {
public static int FACETS_STANDARD_MAXCOUNT = 30;
public static int FACETS_DATE_MAXCOUNT = 730;
public enum Searchdom {
LOCAL, CLUSTER, GLOBAL;
@ -92,13 +97,13 @@ public final class QueryParams {
defaultfacetfields.put("hosts", CollectionSchema.host_s);
defaultfacetfields.put("protocol", CollectionSchema.url_protocol_s);
defaultfacetfields.put("filetype", CollectionSchema.url_file_ext_s);
defaultfacetfields.put("date", CollectionSchema.dates_in_content_dts);
defaultfacetfields.put("authors", CollectionSchema.author_sxt);
defaultfacetfields.put("collections", CollectionSchema.collection_sxt);
defaultfacetfields.put("language", CollectionSchema.language_s);
//missing: namespace
}
private static final int defaultmaxfacets = 30;
public static final Bitfield empty_constraint = new Bitfield(4, "AAAAAA");
public static final Pattern catchall_pattern = Pattern.compile(".*");
private static final Pattern matchnothing_pattern = Pattern.compile("");
@ -137,7 +142,6 @@ public final class QueryParams {
protected boolean filterfailurls, filterscannerfail;
protected double lat, lon, radius;
public LinkedHashSet<String> facetfields;
public int maxfacets;
private SolrQuery cachedQuery;
private CollectionConfiguration solrSchema;
@ -252,7 +256,6 @@ public final class QueryParams {
this.facetfields.add(CollectionSchema.VOCABULARY_PREFIX + v.getName() + CollectionSchema.VOCABULARY_TERMS_SUFFIX);
}
}
this.maxfacets = defaultmaxfacets;
this.cachedQuery = null;
}
@ -443,10 +446,24 @@ public final class QueryParams {
if (getFacets && this.facetfields.size() > 0) {
params.setFacet(true);
params.setFacetMinCount(1);
params.setFacetLimit(this.maxfacets);
params.setFacetLimit(FACETS_STANDARD_MAXCOUNT);
params.setFacetSort(FacetParams.FACET_SORT_COUNT);
params.setParam(FacetParams.FACET_METHOD, FacetParams.FACET_METHOD_fcs);
for (String field: this.facetfields) params.addFacetField("{!ex=" + field + "}" + field);
for (String field: this.facetfields) params.addFacetField(field); // params.addFacetField("{!ex=" + field + "}" + field);
if (this.facetfields.contains(CollectionSchema.dates_in_content_dts.name())) {
params.setParam("facet.range", CollectionSchema.dates_in_content_dts.name());
@SuppressWarnings({ "static-access", "deprecation" })
String start = TrieDateField.formatExternal(new Date(System.currentTimeMillis() - 1000L * 60L * 60L * 24L * 3));
@SuppressWarnings({ "static-access", "deprecation" })
String end = TrieDateField.formatExternal(new Date(System.currentTimeMillis() + 1000L * 60L * 60L * 24L * 3));
params.setParam("f." + CollectionSchema.dates_in_content_dts.getSolrFieldName() + ".facet.range.start", start);
params.setParam("f." + CollectionSchema.dates_in_content_dts.getSolrFieldName() + ".facet.range.end", end);
params.setParam("f." + CollectionSchema.dates_in_content_dts.getSolrFieldName() + ".facet.range.gap", "+1DAY");
params.setParam("f." + CollectionSchema.dates_in_content_dts.getSolrFieldName() + ".facet.sort", "index");
params.setParam("f." + CollectionSchema.dates_in_content_dts.getSolrFieldName() + ".facet.limit", Integer.toString(FACETS_DATE_MAXCOUNT)); // the year constraint should cause that limitation already
}
//for (String k: params.getParameterNames()) {ArrayList<String> al = new ArrayList<>(); for (String s: params.getParams(k)) al.add(s); System.out.println("Parameter: " + k + "=" + al.toString());}
//http://localhost:8090/solr/collection1/select?q=*:*&rows=0&facet=true&facet.field=dates_in_content_dts&f.dates_in_content_dts.facet.limit=730&f.dates_in_content_dts.facet.sort=index
} else {
params.setFacet(false);
}
@ -454,6 +471,8 @@ public final class QueryParams {
return params;
}
long year = 1000L * 60L * 60L * 24L * 365L;
private String getFacets() {
// add site facets
@ -500,10 +519,24 @@ public final class QueryParams {
fq.append(" AND ").append(QueryModifier.parseCollectionExpression(this.modifier.collection));
}
if (this.modifier.on != null && this.modifier.on.length() > 0 && this.solrSchema.contains(CollectionSchema.dates_in_content_sxt)) {
if (this.solrSchema.contains(CollectionSchema.dates_in_content_dts)) {
if (this.modifier.on != null && this.modifier.on.length() > 0) {
fq.append(" AND ").append(QueryModifier.parseOnExpression(this.modifier.on));
}
if (this.modifier.from != null && this.modifier.from.length() > 0 && (this.modifier.to == null || this.modifier.to.equals("*"))) {
fq.append(" AND ").append(QueryModifier.parseFromToExpression(this.modifier.from, null));
}
if ((this.modifier.from == null || this.modifier.from.equals("*")) && this.modifier.to != null && this.modifier.to.length() > 0) {
fq.append(" AND ").append(QueryModifier.parseFromToExpression(null, this.modifier.to));
}
if (this.modifier.from != null && this.modifier.from.length() > 0 && this.modifier.to != null && this.modifier.to.length() > 0) {
fq.append(" AND ").append(QueryModifier.parseFromToExpression(this.modifier.from, this.modifier.to));
}
}
if (this.modifier.protocol != null) {
fq.append(" AND {!tag=").append(CollectionSchema.url_protocol_s.getSolrFieldName()).append("}").append(CollectionSchema.url_protocol_s.getSolrFieldName()).append(':').append(this.modifier.protocol);
}
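The per-field range-facet parameters set in the QueryParams hunk above can be summarised in a small sketch. It only assembles the parameter names and values into a map; the class DateFacetParamsSketch and its method are hypothetical, and Instant.toString() stands in for TrieDateField.formatExternal, so the exact date format may differ from what YaCy sends.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not part of YaCy.
final class DateFacetParamsSketch {

    // Builds the Solr range-facet parameters for one date field, covering
    // daysBack days before and daysAhead days after the given instant, one bucket per day.
    static Map<String, String> rangeFacetParams(final String field, final Instant now,
                                                final int daysBack, final int daysAhead) {
        final Map<String, String> params = new LinkedHashMap<>();
        final String prefix = "f." + field + ".facet.range.";
        params.put("facet", "true");
        params.put("facet.range", field);
        params.put(prefix + "start", now.minus(daysBack, ChronoUnit.DAYS).toString());
        params.put(prefix + "end", now.plus(daysAhead, ChronoUnit.DAYS).toString());
        params.put(prefix + "gap", "+1DAY");
        return params;
    }

    public static void main(final String[] args) {
        rangeFacetParams("dates_in_content_dts", Instant.now(), 3, 3)
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Passing the reference instant as a parameter instead of calling System.currentTimeMillis() inside keeps the window reproducible in tests.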

@ -58,6 +58,7 @@ import net.yacy.cora.lod.vocabulary.Tagging;
import net.yacy.cora.order.Base64Order;
import net.yacy.cora.protocol.Domains;
import net.yacy.cora.protocol.Scanner;
import net.yacy.cora.sorting.ClusteredScoreMap;
import net.yacy.cora.sorting.ConcurrentScoreMap;
import net.yacy.cora.sorting.ReversibleScoreMap;
import net.yacy.cora.sorting.ScoreMap;
@ -148,6 +149,7 @@ public final class SearchEvent {
public final ScoreMap<String> namespaceNavigator; // a counter for name spaces
public final ScoreMap<String> protocolNavigator; // a counter for protocol types
public final ScoreMap<String> filetypeNavigator; // a counter for file types
public final ScoreMap<String> dateNavigator; // a counter for dates found in document content
public final ScoreMap<String> languageNavigator; // a counter for appearance of languages
public final Map<String, ScoreMap<String>> vocabularyNavigator; // counters for Vocabularies; key is metatag.getVocabularyName()
private final int topicNavigatorCount; // if 0 no topicNavigator, holds expected number of terms for the topicNavigator
@ -243,6 +245,7 @@ public final class SearchEvent {
this.hostNavigator = navcfg.contains("hosts") ? new ConcurrentScoreMap<String>() : null;
this.protocolNavigator = navcfg.contains("protocol") ? new ConcurrentScoreMap<String>() : null;
this.filetypeNavigator = navcfg.contains("filetype") ? new ConcurrentScoreMap<String>() : null;
this.dateNavigator = navcfg.contains("date") ? new ClusteredScoreMap<String>(true) : null;
this.topicNavigatorCount = navcfg.contains("topics") ? MAX_TOPWORDS : 0;
this.languageNavigator = navcfg.contains("language") ? new ConcurrentScoreMap<String>() : null;
this.vocabularyNavigator = new TreeMap<String, ScoreMap<String>>();
@ -836,6 +839,11 @@ public final class SearchEvent {
}
}
if (this.dateNavigator != null) {
fcts = facets.get(CollectionSchema.dates_in_content_dts.getSolrFieldName());
if (fcts != null) this.dateNavigator.inc(fcts);
}
if (this.languageNavigator != null) {
fcts = facets.get(CollectionSchema.language_s.getSolrFieldName());
if (fcts != null) {

@ -42,7 +42,6 @@ import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
@ -496,30 +495,13 @@ public class CollectionConfiguration extends SchemaConfiguration implements Seri
if (firstSeen > 0 && firstSeen < lastModified.getTime()) lastModified = new Date(firstSeen); // patch the date if we have seen the document earlier
add(doc, CollectionSchema.last_modified, lastModified);
}
if (allAttr ||
contains(CollectionSchema.dates_in_content_sxt) || contains(CollectionSchema.dates_in_content_count_i) ||
contains(CollectionSchema.date_in_content_min_dt) || contains(CollectionSchema.date_in_content_max_dt)) {
if (allAttr || contains(CollectionSchema.dates_in_content_dts) || contains(CollectionSchema.dates_in_content_count_i)) {
LinkedHashSet<Date> dates_in_content = condenser.dates_in_content;
if (allAttr || contains(CollectionSchema.dates_in_content_count_i)) {
add(doc, CollectionSchema.dates_in_content_count_i, dates_in_content.size());
}
if (dates_in_content.size() > 0) {
if (allAttr || contains(CollectionSchema.dates_in_content_sxt)) {
String[] dates = new String[dates_in_content.size()];
int i = 0; for (Date d: dates_in_content) dates[i++] = org.apache.solr.schema.TrieDateField.formatExternal(d);
add(doc, CollectionSchema.dates_in_content_sxt, dates);
}
// order the dates to get the oldest and youngest
TreeSet<Date> ordered_dates = new TreeSet<>();
ordered_dates.addAll(dates_in_content);
if (allAttr || contains(CollectionSchema.date_in_content_min_dt)) {
Date date_in_content_min_dt = ordered_dates.iterator().next();
add(doc, CollectionSchema.date_in_content_min_dt, date_in_content_min_dt);
}
if (allAttr || contains(CollectionSchema.date_in_content_max_dt)) {
Date date_in_content_max_dt = ordered_dates.descendingIterator().next();
add(doc, CollectionSchema.date_in_content_max_dt, date_in_content_max_dt);
}
if (dates_in_content.size() > 0 && (allAttr || contains(CollectionSchema.dates_in_content_dts))) {
add(doc, CollectionSchema.dates_in_content_dts, dates_in_content.toArray(new Date[dates_in_content.size()]));
}
}
if (allAttr || contains(CollectionSchema.keywords)) {
@ -1085,7 +1067,7 @@ public class CollectionConfiguration extends SchemaConfiguration implements Seri
collection1hosts = hostfacet.get(CollectionSchema.host_s.getSolrFieldName());
} catch (final IOException e2) {
ConcurrentLog.logException(e2);
collection1hosts = new ClusteredScoreMap<String>();
collection1hosts = new ClusteredScoreMap<String>(true);
}
postprocessingActivity = "create ranking map";
@ -1173,7 +1155,7 @@ public class CollectionConfiguration extends SchemaConfiguration implements Seri
if (collection1hosts.size() != countcheck) ConcurrentLog.warn("CollectionConfiguration", "ambiguous host count: expected=" + collection1hosts.size() + ", counted=" + countcheck);
} catch (final IOException e2) {
ConcurrentLog.logException(e2);
collection1hosts = new ClusteredScoreMap<String>();
collection1hosts = new ClusteredScoreMap<String>(true);
}
// process all documents at the webgraph for the outgoing links of this document
@ -1192,7 +1174,7 @@ public class CollectionConfiguration extends SchemaConfiguration implements Seri
webgraphhosts = hostfacet.get(WebgraphSchema.source_host_s.getSolrFieldName());
} catch (final IOException e2) {
ConcurrentLog.logException(e2);
webgraphhosts = new ClusteredScoreMap<String>();
webgraphhosts = new ClusteredScoreMap<String>(true);
}
try {
final long start = System.currentTimeMillis();

@ -35,10 +35,8 @@ public enum CollectionSchema implements SchemaDeclaration {
sku(SolrType.string, true, true, false, true, true, "url of document"), // a 'sku' is a stock-keeping unit, a unique identifier and a default field in unmodified solr.
//sku(SolrType.text_en_splitting_tight, true, true, false, true, true, "url of document"), // a 'sku' is a stock-keeping unit, a unique identifier and a default field in unmodified solr.
last_modified(SolrType.date, true, true, false, false, false, "last-modified from http header"),
dates_in_content_sxt(SolrType.string, true, true, true, false, true, "if date expressions can be found in the content, these dates are listed here in order of the appearances"),
dates_in_content_dts(SolrType.date, true, true, true, false, true, "if date expressions can be found in the content, these dates are listed here as date objects in order of the appearances"),
dates_in_content_count_i(SolrType.num_integer, true, true, false, false, false, "the number of entries in dates_in_content_dts"),
date_in_content_min_dt(SolrType.date, true, true, false, false, false, "if dates_in_content_sxt is filled, this contains the oldest date from the list of available dates"),
date_in_content_max_dt(SolrType.date, true, true, false, false, false, "if dates_in_content_sxt is filled, this contains the youngest date from the list of available dates, that may also be possibly in the future"),
content_type(SolrType.string, true, true, true, false, false, "mime-type of document"),
http_unique_b(SolrType.bool, true, true, false, false, false, "unique-field which is true when an url appears the first time. If the same url which was http then appears as https (or vice versa) then the field is false"),
www_unique_b(SolrType.bool, true, true, false, false, false, "unique-field which is true when an url appears the first time. If the same url within the subdomain www then appears without that subdomain (or vice versa) then the field is false"),
@ -362,6 +360,12 @@ public enum CollectionSchema implements SchemaDeclaration {
doc.setField(this.getSolrFieldName(), value);
}
@Override
public final void add(final SolrInputDocument doc, final Date[] value) {
assert this.isMultiValued();
doc.setField(this.getSolrFieldName(), value);
}
@Override
public final void add(final SolrInputDocument doc, final String[] value) {
assert this.isMultiValued();

@ -222,6 +222,12 @@ public enum WebgraphSchema implements SchemaDeclaration {
doc.setField(this.getSolrFieldName(), value);
}
@Override
public final void add(final SolrInputDocument doc, final Date[] value) {
assert this.isMultiValued();
doc.setField(this.getSolrFieldName(), value);
}
@Override
public final void add(final SolrInputDocument doc, final String[] value) {
assert this.isMultiValued();
