luc
571bc55937
Refactoring: use StandardCharsets constants instead of hard-coded charset names.
9 years ago
reger
288acceac3
fix test htmlParserTest, charset parameter
+ upd maven templating-plugin version
9 years ago
reger
d2cc11ea8f
fix html parser taking <style> content as text.
Noticed some result descriptions contain CSS content from the style tag.
Added <style> to the tag list so that its content is not scraped as text
+ test case included
10 years ago
reger
71bf95af8a
upd parser calls in test cases
10 years ago
reger
77851fa53c
fix parser test cases
(Vocabulary parameter)
10 years ago
reger
e94efd4d7c
update to JUnit 4.11
- fix build.xml -> parserTest error on Windows due to javac encoding
11 years ago
reger
0b6db04e40
fix contentscraper img height/width parsing
prevent NumberFormatException on the common "100px" property value
- include in test case
11 years ago
reger
86f6975edc
exclude html tags in in/outboundlinks_anchortext_txt parsed text
- some outboundlinks_anchortext_txt in index contain e.g. <span>text</span> or more tags,
remove all tags for text property (inline img tags are still parsed)
- added test case for above (to htmlParserTest)
- fix solr test case
11 years ago
reger
71649bf22d
add test case htmlParser.parse - getCharset
(which fails)
11 years ago
reger
c8d437b69a
clean up test sources
rename to current package names and move to default location
11 years ago