luccioman
f1f4459f88
Added some unit tests for Blacklist.isListed()
8 years ago
reger
e68b00678e
prevent negative score on URIMetadataNode - in the special case where no
solr score is supplied.
+ assert before use & test case
8 years ago
reger
b752bcfecb
adjust date in text detection to ignore some program version strings
like "3.1.2.0102" see http://mantis.tokeek.de/view.php?id=650
+ expand test case
8 years ago
reger
b017e97421
optimize condenser language detection a little.
langdetect probabilities take letter case into account, add words from
description and anchors etc. as is.
+ add it to javadoc
8 years ago
reger
ae3717d087
adjust Tokenizer sentence count to ignore repeated punctuation (like !!!! )
+ remove unused sentenceword map (we use only the count)
+ upd test case for sentence count
8 years ago
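A minimal sketch of the idea behind the sentence-count change above, not the actual Tokenizer code: a run of terminators such as "!!!!" is counted as a single sentence end. The class and method names are hypothetical, and the terminator set (. ! ?) is an assumption.

```java
// Hypothetical illustration only; YaCy's Tokenizer works on a word stream, not on raw chars.
final class SentenceCountSketch {

    /** Counts sentence ends, treating consecutive terminators as one. */
    static int countSentences(String text) {
        int count = 0;
        boolean previousWasTerminator = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean isTerminator = c == '.' || c == '!' || c == '?';
            // only the first terminator of a run closes a sentence
            if (isTerminator && !previousWasTerminator) {
                count++;
            }
            previousWasTerminator = isTerminator;
        }
        return count;
    }
}
```

With this sketch, "Hello!!!! World." yields two sentences instead of five.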
reger
474f0476c6
adjust Tokenizer sentence count on trailing text after last recognized sentence
+ upd test case for rwi multi-word-query (leaving results known to fail untested)
8 years ago
reger
1a79c64495
generalize DateDetection with holiday date rules readily available in icu
to make sure current dates are recognized (was fixed to 2014 - 2016)
+ adjust holiday date parser from pattern.match to pattern.find to deal with leading and trailing text
+ moved relative date recognition (morgen, tomorrow) to parseline (used by query parser only), as not working and problematic for indexing
+ add test case for parseline (used by query parser)
8 years ago
reger
32a2e3a22a
have RSSFeed.getChannel return empty message on missing channel element,
a) required b) prevent NPE in rss servlets
+ add test
8 years ago
luccioman
4585a60d7e
Made use of the constant corresponding to the hard-coded value.
8 years ago
luccioman
1bb0b135ac
Avoid duplication of various MS Windows file URL flavors
Fix for mantis 692 (http://mantis.tokeek.de/view.php?id=692 )
8 years ago
reger
6f8c3ccea4
improve url hash computation for file path with mixed java & windows
file.separator to compute equal hashes (by normalizing path for computation)
+ expand test case to check mixed java / windows file url notation
like e.g. file:///c:/test/file.html vs. file:///c:\test/file.html
- relates partially to http://mantis.tokeek.de/view.php?id=692
8 years ago
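A minimal sketch of the normalization idea described above, assuming that replacing backslashes with forward slashes before hashing is enough for illustration. The class and method names are hypothetical, and the real url hash computation in YaCy is different.

```java
// Hypothetical illustration only; not the DigestURL / MultiProtocolURL implementation.
final class FileUrlNormalizeSketch {

    /** Returns the url with Windows separators replaced by forward slashes. */
    static String normalizeSeparators(String fileUrl) {
        return fileUrl.replace('\\', '/');
    }

    /** Two notations are treated as the same file if their normalized forms are equal. */
    static boolean sameFile(String a, String b) {
        return normalizeSeparators(a).equals(normalizeSeparators(b));
    }
}
```

Any hash computed on the normalized form is then identical for file:///c:/test/file.html and file:///c:\test/file.html.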
reger
330768c8a2
fix for solr write.lock after mode change http://mantis.tokeek.de/view.php?id=686
The embedded core holds a lock on the index and must be closed. An earlier commit
comment states that the core should be closed with the solr instance instead of on close
of the connector.
Adjusted the InstanceMirror.close() to take care of closing the embedded
instance to release the lock.
In 2 routines of fulltext this was already explicitly implemented (disconnectLocalSolr).
Now this disconnect is part of the InstanceMirror.close().
8 years ago
reger
11786457b7
add test case for EmbeddedSolrConnector close()
for issue http://mantis.tokeek.de/view.php?id=686
(without solving the issue here)
8 years ago
reger
585d2a6441
test case for NewsPool to check the id modifier (for unique id)
and observe the distribution order .. hands on.
+ add test/DATA to gitignore
9 years ago
reger
ff6589fc0f
test case: simulating multi word query for local rwi index
Purpose of the test case is to be able to (controlled) analyse the rwi ranking for
multi word searches (with focus on posintext and word-distance ranking)
9 years ago
reger
7f63fc50f3
prepare an IndexSegment test case for RWI index testing
+ prevent NPE in Segment.clear() on missing embedded solr instance.
9 years ago
reger
272cdd496a
reactivate sentence counter in WordTokenizer for phrasepos ranking,
by counting punctuation (delivered as 1 char word) again.
9 years ago
Michael Peter Christen
5e165a8150
removed unused imports
9 years ago
reger
e310ec5f70
fix posInText ranking calculation to score 0 on no position info
+ fix Word posInText calc in Tokenizer to start with 1
+ test case
9 years ago
reger
39dd244693
fix ConcurrentScoreMap.set() calculation of totalCount()
+ test case
9 years ago
reger
ebde21079a
refactor xlsParser to include Excel file attribute (like author) in parser result doc.
Similar to ppt and doc parser, completing a TODO in xlsParser.
9 years ago
reger
5e335b32da
fix Blacklist.contains() matching path pattern to string
similar to 5e9e871192
+ add proof testcase
9 years ago
reger
f89d4eb51d
fix MultiProtocolURL init (assign of host) for urls with '/' in query part
+ add to test case
9 years ago
reger
87fcfc6d78
Adjusted hash computation and toNormalform for file:// protocol to deliver
the same hash for the same file on a Windows filesystem path with forward slash and backslash in the path.
Background see http://mantis.tokeek.de/view.php?id=671
+Test case
9 years ago
reger
7b226afc33
fix HostQueueTest - changed open parameter
9 years ago
luccioman
893a40995a
Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
9 years ago
reger
fcc29c36f0
test case for HostBalancer issue in intranet mode
with file:// protocol, 2 hostqueues accessing same cache file concurrently
http://mantis.tokeek.de/view.php?id=668
Reason seems to be diff. hosthash key of hostqueues on reopen.
Internal queue key and external representation (directoryname currently hostname.port) must be adjusted to fix it (not done yet).
9 years ago
luccioman
6e96c7341a
Merge remote-tracking branch 'origin/master'
Conflicts:
htroot/Load_MediawikiWiki.java
htroot/Load_PHPBB3.java
htroot/ViewImage.java
9 years ago
reger
a476d06aec
wiki header code test string add "closing" tag
9 years ago
reger
d4da4805a8
internal wiki code, require header line to start with markup
(to allow something like "one=two" as text)
+ incl. test case
9 years ago
reger
223071337b
Translator to respect word boundaries when identifying the text portion to
be translated, to avoid translation of partial terms/words
(e.g. key="TEST" sourcetext="this is a myTESTcase for it").
Add check of word boundary before and after sourcetext (incl. taking care
of the current practice for keys to be delimited by > <)
+ add test case
9 years ago
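A minimal sketch of a word-boundary check like the one described above, assuming a regex lookaround approach. This is not the Translator implementation; the class name, method name and the choice of letters/digits as "word" characters are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; YaCy's Translator may detect boundaries differently.
final class WordBoundaryTranslateSketch {

    /** Replaces key with translation only where key is not embedded in a longer word. */
    static String translate(String source, String key, String translation) {
        // no letter or digit directly before or after the key; '>' and '<' count as boundaries
        Pattern p = Pattern.compile(
                "(?<![\\p{L}\\p{N}])" + Pattern.quote(key) + "(?![\\p{L}\\p{N}])");
        return p.matcher(source).replaceAll(Matcher.quoteReplacement(translation));
    }
}
```

In this sketch "this is a myTESTcase for it" stays untouched for key "TEST", while ">TEST<" is translated.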
reger
a6ba1faa80
introduce a translation edit servlet Translator_p.html for YaCy's UI text translation
This is the 1st rudimentary approach to support the translation utilities.
It currently allows editing untranslated text and saving it in a local translation file
in the DATA/LOCALE directory.
+ refactor Translator (less static's) to leverage on class overrides and support garbage collection for this 1 time routine
+ adjust TranslatorXliff to check for local translations in DATA/LOCALE,
this includes storing manually downloaded translation files in DATA as well
(to keep default untouched)
+ on 1st call of Translator_p a master translation file is generated, checking
the supported languages for missing translation text (later this masterfile is planned to be part of the distribution, to harmonize translation key text between the languages)
Outlook: the local modifications (possibly as translation fragments instead of complete file) to be shared with maintainer using xliff features.
9 years ago
reger
b74cddc49c
upd to Jetty v9.2.16.v20160414
- exclude unused mime4j
- remove unused yacy-cora build
9 years ago
reger
24b0fa2a38
extend snapshot Html2Image.pdf2image to use PDFBox image export capability
if no external tool installed (and for Win)
Resulting jpg are not always perfect (if graphic included) but imho sufficient.
9 years ago
reger
902e79e261
Introduce a TranslatorXliff which can read/write xliff from/to the internal translation map.
This eases the initiatives suggested in http://mantis.tokeek.de/view.php?id=649
It also allows, longer term, storing translation maps for the htroot files
in standardized/reusable xliff format ( http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html ).
+ added test case creating and comparing xliff file with internal custom prop file.
(currently the introduced class is not used in core code)
9 years ago
reger
ec24a0c85a
add test case for optimized toTokens()
9 years ago
luc
26f1ead57c
Created ViewFavicon class specialized in favicon viewing.
Main image processing is now in ImageViewer, used by both ViewImage and
ViewFavicon.
Fixed URIMetadataNode.getFavicon to use non-standard icons with no size
as fallback.
9 years ago
luc
07222b3e1a
Added favicon url transmission in RWI chunks.
9 years ago
luc
53781299d8
Extracted intranet and filetype related rules from getFaviconURL func
9 years ago
luc
3cc5619d93
Improved HTML icons indexing and rendering in search results.
See http://mantis.tokeek.de/view.php?id=629
9 years ago
luc
ef83e34b8a
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
84c970eaec
move test classes to test/java (subdirectory as in Maven standard subdir layout)
because ViewImage*Test.java breaks test run
9 years ago
luc
cfdbc2b487
Improved URLLicence reliability for use by concurrent non-authorized
users.
Removed URLLicence generation when unnecessary (authorized users)
9 years ago
luc
571bc55937
Refactoring : use StandardCharsets constants instead of hard-coded
charset names.
9 years ago
reger
1af0e9ef74
remove workaround for Solr bug regarding multivalued date fields
fixed in 5.4.0
http://issues.apache.org/jira/browse/SOLR-8050
9 years ago
reger
4d2b934487
prevent mailto links getting into parser result document's in/outbound link collection
by checking mailto scheme early.
- fix upper case mailto protocol assignment
- add test case for getProtocol
9 years ago
reger
288acceac3
fix test htmlParserTest, charset parameter
+ upd maven templating-plugin version
9 years ago
luc
f01d49c37a
Process large or local file images dealing directly with content
InputStream.
9 years ago
luc
0de6988604
Added links to more image test suites.
9 years ago
luc
745e97a575
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc
2895ab552a
Made ViewImagePerfTest extend ViewImageTest to ease automated image
render tests
9 years ago
luc
4a03cf06e1
Corrected encoding extension arg parsing
9 years ago
reger
d223cf0ae4
adjust MediaWiki importer geo coordinate calculation
- allow lat/long 0.xxx
- south / west assignment
include test class
9 years ago
luc
8da20718aa
Created a class to test ViewImage rendering against multiple image
files.
9 years ago
luc
ec04d27473
Corrected APNG test suite link name.
9 years ago
luc
cbb84ba073
Detailed javadoc.
9 years ago
luc
70111876d2
Filled ViewImageTest.html with all remaining IANA image file formats.
Added some links to test suites and specifications.
9 years ago
luc
e093fb228d
Created a generic ViewImage performance render test.
9 years ago
luc
3ad564e2e4
Created a ViewImage rendering performance measurement test.
9 years ago
luc
b3f044072e
Updated table headers and SVG file url for case sensitive OS.
9 years ago
luc
f5746b5490
Added ico and bmp sample pictures
9 years ago
luc
baede48161
Added JPEG 2000 and FITS samples
9 years ago
luc
7c9d80c5d0
Added image formats and information for each format.
9 years ago
luc
0ae9297ca5
Created an HTML test page to check ViewImage rendering with different
file formats.
9 years ago
reger
bad34804fe
optimize parseInt for <img> tag attribute parsing
Performs better than using NumberFormat.parse or parseInt(substring())
10 years ago
reger
d2cc11ea8f
fix html parser taking <style> content as text.
Noticed some result descriptions contain css content from the style tag.
Added <style> to the tag list so that its content is not scraped as text
+ test case included
10 years ago
reger
e594130aec
add test case for partial update - to discover effect on YaCy for update of documents with multivalued date fields (like dates_in_content_dts)
current result: loss of fields/information in index document, see EmbeddedSolrConnectorTest.testUdate_withMultivaluedDateField()
10 years ago
reger
d5da9e5a38
fix test method (add throw for URIMetadataNode)
10 years ago
reger
4cf875336c
complete TODO: getFileExtension handle dot in query part
+ testcase
10 years ago
reger
c37dda8849
fix NPE on MultiProtocolURL on url with parameter value and '='
in getAttribute
- added test case for it
10 years ago
reger
71bf95af8a
upd parser calls in test cases
10 years ago
reger
f63fff9008
fix snippet containing number with comma as decimal point http://mantis.tokeek.de/view.php?id=344
to keep it as one word (by altering the split regex)
- added snippet test case with number
- regex for word split to match multiple split chars
10 years ago
reger
2ef8ffdb60
apply UTF-8 encoding
copied from escape()
10 years ago
reger
7120ea42f1
fix for path with char code > 255
(causing index out of bound exception)
+ test case for it
10 years ago
reger
1d81bd0687
fix url encoding for path see http://mantis.tokeek.de/view.php?id=559
So far we used same escape procedure for all parts of the url (which includes x-www-form-urlencoded for all url components)
Added capability to use different encoding rules for the different url components (through specific bitset for each component).
(this is inspired by org.apache.http.client and java.net.uri implementation).
- Added test case for http://mantis.tokeek.de/view.php?id=559
10 years ago
reger
f94e34058c
fix url (path) %-decoding http://mantis.tokeek.de/view.php?id=519
- add test case for this
10 years ago
reger
16bc267a32
add test case for snippet html encoding check
10 years ago
reger
77851fa53c
fix parser test cases
(Vocabulary parameter)
10 years ago
reger
df83fcc4fc
disable optimistic GC assumption in StandardMemoryStrategy
After several tests found that eom is not prevented. The major reason in testing was the assumption that a future GC will free the avg of the last 5 GC.
Disabling this check improved eom exceptions.
Added simplest testcase used for verification
10 years ago
Michael Peter Christen
68c605d637
replace with CommonPattern.SPACE for split
10 years ago
reger
9edc7308aa
update to metadata-extractor-2.7.0.jar
add 2 simple JUnit test cases for jpeg and tif parsing
10 years ago
reger
5d67e165d9
remove redundant null check in ResponseHeader.lastModified
added a JUnit testcase for ResponseHeader dates (using age()),
adjusted age() to pass all tests
10 years ago
reger
ea633a794c
including small junit test case for WordTokenizer
10 years ago
reger
aa2e15d846
allow url parameter in worktable apicall
allow url=wwwl?param=a&param=b (with ?, & encoded)
fix: http://mantis.tokeek.de/view.php?id=100
fix double adding of '&' in MultiProtocolURL.escape()
10 years ago
reger
e88537522d
allow single quote " ' " in query
see http://mantis.tokeek.de/view.php?id=379
-add QueryGoal test case for this
11 years ago
reger
e50b2b4d04
fix test case MultiProtocolURL.toString()
(only allowed on AnchorURL)
11 years ago
reger
b510b182d8
- update Maven pom
- add ppt parser test case
11 years ago
Michael Peter Christen
2de159719b
added an option to set 'obey nofollow' for links with rel="nofollow"
attribute in the <a> tag for each crawl. This introduces a lot of
changes because it extends the usage of the AnchorURL Object type which
now also has a different toString method than the underlying
DigestURL.toString. It is therefore not advised to use .toString at all
for urls, just use toNormalform(false) instead.
11 years ago
reger
1f2eba977d
add test case for Records (used in HostBalancer)
- simulating seek error (http://mantis.tokeek.de/view.php?id=411 )
11 years ago
reger
e94efd4d7c
update to JUnit 4.11
- fix build.xml -> parserTest error on Windows due to javac encoding
11 years ago
reger
3b77e41f1a
adding test for HostQueue crawl stack
- simulating problem with zero length stack file (but not fixing it)
- adding test data clean to maven pom
11 years ago
reger
431a5f9c4e
added test case for TextSnippet,
removed obsolete/unused parameter and reference to MediaSnippet
11 years ago
reger
7847a93558
fix AbstractParser.singleList not adding null strings
- prevents null titles in oo... parser (as detected by ParserTest)
- correct ParserTest dc_description check (dc_description allowed to return 0 length array)
11 years ago
reger
0b6db04e40
fix contentscraper img height/width parsing
prevent numberformat exception on common "100px" property
- include in test case
11 years ago
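A minimal sketch of tolerant parsing for img height/width values such as "100px", as described above. It is not the ContentScraper code; the class name, method name and the fallback parameter are hypothetical.

```java
// Hypothetical illustration only; the actual scraper code may parse attributes differently.
final class ImgDimensionSketch {

    /** Parses the leading digits of an attribute value, e.g. "100px" gives 100. */
    static int parseLeadingInt(String value, int fallback) {
        if (value == null) {
            return fallback;
        }
        String s = value.trim();
        int end = 0;
        while (end < s.length() && Character.isDigit(s.charAt(end))) {
            end++;
        }
        if (end == 0) {
            return fallback; // no leading digits, e.g. "auto"
        }
        try {
            return Integer.parseInt(s.substring(0, end));
        } catch (NumberFormatException e) {
            return fallback; // digit run too large for an int
        }
    }
}
```

A plain Integer.parseInt("100px") throws NumberFormatException, which is the case the commit guards against.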
reger
bb8181b2be
fix: resolve url without path but searchpart
e.g. http://yacy.net?q=test was resolved as host "yacy.net?q=test" now host="yacy.net" path="/"
fixes http://mantis.tokeek.de/view.php?id=47
added test case for getHost
11 years ago
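A minimal sketch of the expected resolution for a url without path but with a search part, using the JDK's java.net.URI rather than YaCy's MultiProtocolURL. The class and method names are hypothetical, and defaulting an empty path to "/" mirrors the behaviour described in the commit above.

```java
import java.net.URI;

// Hypothetical illustration only; MultiProtocolURL has its own parser.
final class HostPathSketch {

    /** Returns { host, path }, with an empty path defaulted to "/". */
    static String[] hostAndPath(String url) {
        URI uri = URI.create(url); // throws IllegalArgumentException on malformed input
        String path = uri.getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        return new String[] { uri.getHost(), path };
    }
}
```

For http://yacy.net?q=test this gives host "yacy.net" and path "/", with the query kept out of the host.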
reger
86f6975edc
exclude html tags in in/outboundlinks_anchortext_txt parsed text
- some outboundlinks_anchortext_txt in index contain e.g. <span>text</span> or more tags,
remove all tags for text property (inline img tags are still parsed)
- added test case for above (to htmlParserTest)
- fix solr test case
11 years ago
reger
71649bf22d
add test case htmlParser.parse - getCharset
(which fails)
11 years ago
reger
6878c90f99
fix: IPv6 INTRANET_PATTERNS for local ip (see http://bugs.yacy.net/view.php?id=378 )
requiring following ":" for fc and fd prefix and made pattern match case insensitive
- add some more ipv6 test cases to MultiProtocolURLTest.java
11 years ago
reger
c8d437b69a
clean up test sources
rename to current package names and move to default location
11 years ago
reger
18a56446ce
reorg URL test classes add isLocal test with some IPv6 examples
- putting in default location and clean old package names
- add some valid RFC IPv6 sample urls (which don't pass the isLocal test)
11 years ago