reger
ea633a794c
including small junit test case for WordTokenizer
10 years ago
reger
aa2e15d846
allow url parameter in worktable apicall
allow url=wwwl?param=a&param=b (with ?, & encoded)
fix: http://mantis.tokeek.de/view.php?id=100
fix double adding of '&' in MultiProtocolURL.escape()
10 years ago
reger
b510b182d8
- update Maven pom
- add ppt parser test case
10 years ago
reger
e94efd4d7c
update to JUnit 4.11
- fix build.xml -> parserTest error on Windows due to javac encoding
11 years ago
reger
7847a93558
fix AbstractParser.singleList not adding null strings
- prevents null titles in oo... parser (as detected by ParserTest)
- correct ParserTest dc_description check (dc_description allowed to return 0 length array)
11 years ago
reger
0b6db04e40
fix contentscraper img height/width parsing
prevent NumberFormatException on common "100px" property
- include in test case
11 years ago
reger
86f6975edc
exclude html tags in in/outboundlinks_anchortext_txt parsed text
- some outboundlinks_anchortext_txt in index contain e.g. <span>text</span> or more tags,
remove all tags for text property (inline img tags are still parsed)
- added test case for above (to htmlParserTest)
- fix solr test case
11 years ago
reger
71649bf22d
add test case htmlParser.parse - getCharset
(which fails)
11 years ago
reger
c8d437b69a
clean up test sources
rename to current package names and move to default location
11 years ago