reger
cb95b7339a
include html5 <time> tag in content scraper,
add "datetime" attribute of <time> tag to the scraper's startdate list.
The datetime value is parsed as an ISO 8601 (XML) date; HTML5 also allows
partial dates and durations, which are not handled by this change.
8 years ago
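The parsing rule described in this commit can be illustrated with a minimal sketch. This is not the scraper's actual code: the class `TimeTagDate` and the method `parseDatetimeAttr` are hypothetical names, and the use of `java.time` is an assumption. It accepts a full ISO 8601 date or date-time and deliberately rejects partial dates and durations.

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical helper, not part of the YaCy code base.
class TimeTagDate {

    /**
     * Extracts a calendar date from the value of a <time datetime="..."> attribute.
     * Only full ISO 8601 dates ("2016-05-01") and date-times with an offset
     * ("2016-05-01T10:00:00Z") are accepted; partial dates ("2016-05") and
     * durations ("PT4H") yield an empty result.
     */
    public static Optional<LocalDate> parseDatetimeAttr(String value) {
        if (value == null) {
            return Optional.empty();
        }
        String v = value.trim();
        try {
            // Full date-time with zone offset or "Z"
            return Optional.of(OffsetDateTime.parse(v).toLocalDate());
        } catch (DateTimeParseException e) {
            // fall through and try a plain date
        }
        try {
            // Plain calendar date
            return Optional.of(LocalDate.parse(v));
        } catch (DateTimeParseException e) {
            // Partial date, duration, or garbage: not handled
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseDatetimeAttr("2016-05-01T10:00:00Z"));
        System.out.println(parseDatetimeAttr("PT4H"));
    }
}
```

Returning an empty `Optional` for unsupported values lets a caller skip the tag without adding a bogus entry to its date list.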
luccioman
7717a3d43d
Fixed license headers on files created to improve favicon management.
9 years ago
luccioman
6e1959f469
Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
Conflicts:
htroot/yacysearchitem.java
source/net/yacy/cora/federate/solr/responsewriter/YJsonResponseWriter.java
source/net/yacy/search/schema/CollectionConfiguration.java
source/net/yacy/server/serverObjects.java
9 years ago
reger
b752bcfecb
adjust date in text detection to ignore some program version strings
like "3.1.2.0102" see http://mantis.tokeek.de/view.php?id=650
+ expand test case
9 years ago
reger
b017e97421
optimize condenser language detection a little.
langdetect probabilities take letter case into account, add words from
description and anchors etc. as is.
+ add it to javadoc
9 years ago
reger
ae3717d087
adjust Tokenizer sentence count to ignore repeated punctuation (like !!!!)
+ remove unused sentenceword map (we use only the count)
+ update test case for sentence count
9 years ago
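The idea of ignoring repeated punctuation when counting sentences can be sketched as follows. This is an illustration only, not the Tokenizer's implementation: the class `SentenceCounter`, the method `countSentences`, and the choice of `.`, `!` and `?` as sentence terminators are all assumptions.

```java
// Hypothetical illustration, not the YaCy Tokenizer.
class SentenceCounter {

    /**
     * Counts sentence ends in the text. A run of consecutive terminators
     * such as "!!!!" or "?!" is counted once.
     */
    public static int countSentences(String text) {
        int count = 0;
        boolean previousWasTerminator = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Assumed terminator set
            boolean isTerminator = c == '.' || c == '!' || c == '?';
            if (isTerminator && !previousWasTerminator) {
                count++; // first terminator of a run ends one sentence
            }
            previousWasTerminator = isTerminator;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSentences("Wow!!!! Really?"));
    }
}
```

Text without any terminator yields a count of 0 in this sketch; whether a trailing unterminated fragment should count as a sentence is a separate design choice.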
reger
1a79c64495
generalize DateDetection with holiday date rules readily available in icu
to make sure current dates are recognized (was fixed to 2014 - 2016)
+ adjust holiday date parser from pattern.match to pattern.find to deal with leading and trailing text
+ moved relative date recognition (morgen, tomorrow) to parseline (used by query parser only), as it did not work and was problematic for indexing
+ add test case for parseline (used by query parser)
9 years ago
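The switch from match to find noted in this commit relies on standard `java.util.regex` behavior: `Matcher.matches()` succeeds only if the whole input matches the pattern, while `Matcher.find()` succeeds if any part of it does. A minimal sketch, with a made-up pattern and hypothetical names (`MatchVsFind`, `HOLIDAY`) that are not taken from DateDetection:

```java
import java.util.regex.Pattern;

// Hypothetical illustration of matches() versus find().
class MatchVsFind {

    // Assumed example pattern, not the holiday rules used by YaCy
    static final Pattern HOLIDAY =
            Pattern.compile("christmas|new year", Pattern.CASE_INSENSITIVE);

    /** True only if the entire line is a holiday name. */
    public static boolean wholeLine(String line) {
        return HOLIDAY.matcher(line).matches();
    }

    /** True if a holiday name occurs anywhere in the line. */
    public static boolean anywhere(String line) {
        return HOLIDAY.matcher(line).find();
    }

    public static void main(String[] args) {
        String line = "We are closed on Christmas day";
        System.out.println(wholeLine(line));
        System.out.println(anywhere(line));
    }
}
```

With leading and trailing text around the holiday name, only the `find()` variant reports a hit.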
reger
272cdd496a
reactivate sentence counter in WordTokenizer for phrasepos ranking,
by counting punctuation (delivered as 1-char word) again.
9 years ago
reger
e310ec5f70
fix posInText ranking calculation to score 0 on no position info
+ fix Word posInText calc in Tokenizer to start with 1
+ test case
9 years ago
reger
ebde21079a
refactor xlsParser to include Excel file attributes (like author) in parser result doc.
Similar to ppt and doc parser, completing a TODO in xlsParser.
9 years ago
luc
3cc5619d93
Improved HTML icons indexing and rendering in search results.
See http://mantis.tokeek.de/view.php?id=629
9 years ago
reger
84c970eaec
move test classes to test/java (subdirectory as in Maven standard subdir layout)
because ViewImage*Test.java breaks test run
9 years ago