reger
bad34804fe
optimize parseInt for <img> tag attribute parsing
...
Performs better than using NumberFormat.parse or parseInt(substring())
9 years ago
reger
d2cc11ea8f
fix html parser taking <style> content as text.
...
Noticed some result description contain css content from style tag.
Added <style> to the tag list so that its content is not scraped as text
+ test case included
9 years ago
reger
e594130aec
add test case for partial update - to discover effect on YaCy for update of documents with multivalued date fields (like dates_in_content_dts)
...
current result: loss of fields/information in index document, see EmbeddedSolrConnectorTest.testUdate_withMultivaluedDateField()
9 years ago
reger
d5da9e5a38
fix test method (add throw for URIMetadataNode)
9 years ago
reger
4cf875336c
complete TODO: getFileExtension handle dot in query part
...
+ testcase
9 years ago
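A minimal sketch of the idea behind this change, assuming the extension should be taken from the path only. The class and method names below are hypothetical and are not YaCy's actual getFileExtension implementation.

```java
import java.util.Locale;

// Hypothetical helper for illustration only.
final class FileExtensionSketch {

    /** Returns the lower-case extension of the last path segment, or "" if there is none. */
    static String fileExtension(String pathAndQuery) {
        String path = pathAndQuery;
        // Cut off fragment and query first, so dots inside them are ignored.
        int cut = path.indexOf('#');
        if (cut >= 0) path = path.substring(0, cut);
        cut = path.indexOf('?');
        if (cut >= 0) path = path.substring(0, cut);
        // Only the last path segment can carry the file extension.
        String name = path.substring(path.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) return "";
        return name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}
```

With this approach a dot in the query part, as in "/dir/page.html?file=archive.tar.gz", no longer influences the result.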
reger
c37dda8849
fix NPE on MultiProtocolURL on url with parameter value and '='
...
in getAttribute
- added test case for it
10 years ago
reger
71bf95af8a
upd parser calls in test cases
10 years ago
reger
f63fff9008
fix snippet containing number with comma as decimal point http://mantis.tokeek.de/view.php?id=344
...
to keep it as one word (by altering the split regex)
- added snippet test case with number
- regex for word split to match multiple split chars
10 years ago
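One way such a split regex could look, as a sketch only: the pattern below is an assumption made for illustration and is not the regex used in YaCy. It treats runs of separators as one split point and leaves a comma or dot between two digits untouched.

```java
import java.util.regex.Pattern;

// Hypothetical word splitter for illustration only.
final class WordSplitSketch {

    // A separator is whitespace or ;:!? or a comma/dot that is not
    // enclosed by digits on both sides. The trailing '+' merges runs
    // of separators into a single split point.
    private static final Pattern SPLIT =
            Pattern.compile("(?:[\\s;:!?]|[,.](?!\\d)|(?<!\\d)[,.])+");

    static String[] words(String text) {
        return SPLIT.split(text);
    }
}
```

For the input "costs 3,50 euro, ok" this keeps "3,50" together as one word while still splitting at the comma after "euro".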
reger
2ef8ffdb60
apply UTF-8 encoding
...
copied from escape()
10 years ago
reger
7120ea42f1
fix for path with char code > 255
...
(causing index out of bound exception)
+ test case for it
10 years ago
reger
1d81bd0687
fix url encoding for path see http://mantis.tokeek.de/view.php?id=559
...
So far we used same escape procedure for all parts of the url (which includes x-www-form-urlencoded for all url components)
Added capability to use different encoding rules for the different url components (through specific bitset for each component).
(this is inspired by org.apache.http.client and java.net.uri implementation).
- Added test case for http://mantis.tokeek.de/view.php?id=559
10 years ago
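The per-component approach described above can be sketched as follows. The class name, the bitset names and the exact character sets are assumptions for illustration (loosely following the RFC 3986 character classes) and do not reproduce the MultiProtocolURL code.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical sketch: one BitSet of "safe" characters per URL component.
final class ComponentEscapeSketch {

    static final BitSet UNRESERVED = new BitSet(128);
    static final BitSet PATH_SAFE = new BitSet(128);
    static final BitSet QUERY_VALUE_SAFE = new BitSet(128);

    static {
        for (char c = 'a'; c <= 'z'; c++) UNRESERVED.set(c);
        for (char c = 'A'; c <= 'Z'; c++) UNRESERVED.set(c);
        for (char c = '0'; c <= '9'; c++) UNRESERVED.set(c);
        for (char c : "-._~".toCharArray()) UNRESERVED.set(c);

        // In a path, '/' and most sub-delimiters (including '+') stay as they are.
        PATH_SAFE.or(UNRESERVED);
        for (char c : "/!$&'()*+,;=:@".toCharArray()) PATH_SAFE.set(c);

        // In a query value, '&', '=' and '+' carry meaning and must be escaped.
        QUERY_VALUE_SAFE.or(UNRESERVED);
        for (char c : "!$'()*,;:@/?".toCharArray()) QUERY_VALUE_SAFE.set(c);
    }

    /** Percent-encodes every UTF-8 byte of s that is not marked safe. */
    static String escape(String s, BitSet safe) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 128 && safe.get(v)) {
                sb.append((char) v);
            } else {
                sb.append('%');
                sb.append(Character.toUpperCase(Character.forDigit(v >> 4, 16)));
                sb.append(Character.toUpperCase(Character.forDigit(v & 0xF, 16)));
            }
        }
        return sb.toString();
    }
}
```

The same escape routine then serves every component; only the bitset passed in changes, for example escape(path, PATH_SAFE) versus escape(value, QUERY_VALUE_SAFE).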
reger
f94e34058c
fix url (path) %-decoding http://mantis.tokeek.de/view.php?id=519
...
- add test case for this
10 years ago
reger
16bc267a32
add test case for snippet html encoding check
10 years ago
reger
77851fa53c
fix parser test cases
...
(Vocabulary parameter)
10 years ago
reger
df83fcc4fc
disable optimistic GC assumption in StandardMemoryStrategy
...
After several tests, found that eom is not prevented. The major reason found in testing was the assumption that a future GC will free the average of the last 5 GCs.
Disabling this check reduced eom exceptions.
Added simplest testcase used for verification
10 years ago
Michael Peter Christen
68c605d637
replace with CommonPattern.SPACE for split
10 years ago
reger
9edc7308aa
update to metadata-extractor-2.7.0.jar
...
add 2 simple JUnit test cases for jpeg and tif parsing
10 years ago
reger
5d67e165d9
remove redundant null check in ResponseHeader.lastModified
...
added a JUnit testcase for ResponseHeader dates (using age()),
adjusted age() to pass all tests
10 years ago
reger
ea633a794c
including small junit test case for WordTokenizer
10 years ago
reger
aa2e15d846
allow url parameter in worktable apicall
...
allow url=wwwl?param=a&param=b (with ?, & encoded)
fix: http://mantis.tokeek.de/view.php?id=100
fix double adding of '&' in MultiProtocolURL.escape()
10 years ago
reger
e88537522d
allow single quote " ' " in query
...
see http://mantis.tokeek.de/view.php?id=379
-add QueryGoal test case for this
10 years ago
reger
e50b2b4d04
fix test case MultiProtocolURL.toString()
...
(only allowed on AnchorURL)
10 years ago
reger
b510b182d8
- update Maven pom
...
- add ppt parser test case
10 years ago
Michael Peter Christen
2de159719b
added an option to set 'obey nofollow' for links with rel="nofollow"
...
attribute in the <a> tag for each crawl. This introduces a lot of
changes because it extends the usage of the AnchorURL Object type which
now also has a different toString method than the underlying
DigestURL.toString. It is therefore not advised to use .toString at all
for urls, just use toNormalform(false) instead.
10 years ago
reger
1f2eba977d
add test case for Records (used in HostBalancer)
...
- simulating seek error (http://mantis.tokeek.de/view.php?id=411 )
11 years ago
reger
e94efd4d7c
update to JUnit 4.11
...
- fix build.xml -> parserTest error on Windows due to javac encoding
11 years ago
reger
3b77e41f1a
adding test for HostQueue crawl stack
...
- simulating problem with zero length stack file (but not fixing it)
- adding test data clean to maven pom
11 years ago
reger
431a5f9c4e
added test case for TextSnippet,
...
removed obsolete/unused parameter and reference to MediaSnippet
11 years ago
reger
7847a93558
fix AbstractParser.singleList not adding null strings
...
- prevents null titles in oo... parser (as detected by ParserTest)
- correct ParserTest dc_description check (dc_description allowed to return 0 length array)
11 years ago
reger
0b6db04e40
fix contentscraper img height/width parsing
...
prevent numberformat exception on common "100px" property
- include in test case
11 years ago
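A minimal sketch of tolerant dimension parsing, assuming that only the leading digits of a value such as "100px" matter. The helper name and the -1 return convention are hypothetical and are not taken from the YaCy ContentScraper.

```java
// Hypothetical helper for illustration only.
final class DimensionParseSketch {

    /** Parses the leading decimal digits of s; returns -1 if there are none. */
    static int leadingInt(String s) {
        if (s == null) return -1;
        int i = 0;
        final int n = s.length();
        // Skip leading whitespace.
        while (i < n && Character.isWhitespace(s.charAt(i))) i++;
        final int start = i;
        long value = 0;
        // Accumulate digits by hand, so no substring is created and no exception is thrown.
        while (i < n) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') break;
            value = value * 10 + (c - '0');
            if (value > Integer.MAX_VALUE) return -1; // does not fit into an int
            i++;
        }
        return i == start ? -1 : (int) value;
    }
}
```

Scanning the digits directly avoids the NumberFormatException that Integer.parseInt throws on input such as "100px".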
reger
bb8181b2be
fix: resolve url without path but searchpart
...
e.g. http://yacy.net?q=test was resolved as host "yacy.net?q=test" now host="yacy.net" path="/"
fixes http://mantis.tokeek.de/view.php?id=47
added test case for getHost
11 years ago
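The behaviour described in this fix can be sketched as below, assuming a simple URL without userinfo or port. The class and method are hypothetical and only illustrate ending the host at the first '/', '?' or '#'.

```java
// Hypothetical splitter for illustration only.
final class AuthoritySplitSketch {

    /** Returns {host, path, query}; an empty path is normalized to "/". */
    static String[] split(String url) {
        int start = url.indexOf("://");
        start = start < 0 ? 0 : start + 3;
        // The host ends at the first '/', '?' or '#', not only at '/'.
        int end = url.length();
        for (int i = start; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == '/' || c == '?' || c == '#') {
                end = i;
                break;
            }
        }
        String host = url.substring(start, end);
        String rest = url.substring(end);
        int fragment = rest.indexOf('#');
        if (fragment >= 0) rest = rest.substring(0, fragment);
        int q = rest.indexOf('?');
        String path = q < 0 ? rest : rest.substring(0, q);
        String query = q < 0 ? "" : rest.substring(q + 1);
        if (path.isEmpty()) path = "/";
        return new String[] { host, path, query };
    }
}
```

For http://yacy.net?q=test this yields the host "yacy.net", the path "/" and the query "q=test".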
reger
86f6975edc
exclude html tags in in/outboundlinks_anchortext_txt parsed text
...
- some outboundlinks_anchortext_txt in index contain e.g. <span>text</span> or more tags,
remove all tags for text property (inline img tags are still parsed)
- added test case for above (to htmlParserTest)
- fix solr test case
11 years ago
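A rough sketch of removing tags from anchor text, assuming a plain regex is sufficient for short inline markup. The class is hypothetical; the actual change works inside the YaCy html parser rather than on strings like this.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class AnchorTextSketch {

    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    /** Removes all tags and collapses whitespace in the remaining text. */
    static String plainText(String anchorHtml) {
        String noTags = TAG.matcher(anchorHtml).replaceAll("");
        return SPACES.matcher(noTags).replaceAll(" ").trim();
    }
}
```

Tags are replaced by the empty string here, so text split by inline tags inside a word stays joined; replacing them with a space instead would be the other reasonable choice.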
reger
71649bf22d
add test case htmlParser.parse - getCharset
...
(which fails)
11 years ago
reger
6878c90f99
fix: IPv6 INTRANET_PATTERNS for local ip (see http://bugs.yacy.net/view.php?id=378 )
...
requiring following ":" for fc and fd prefix and made pattern match case insensitive
- add some more ipv6 test cases to MultiProtocolURLTest.java
11 years ago
reger
c8d437b69a
clean up test sources
...
rename to current package names and move to default location
11 years ago
reger
18a56446ce
reorg URL test classes, add isLocal test with some IPv6 examples
...
- putting in default location and clean old package names
- add some valid RFC IPv6 sample urls (which don't pass the isLocal test)
11 years ago
reger
10a6346056
clean-up test cases
...
to work with current source
11 years ago
reger
b4fdb8c887
cleanup test directory from Jetty 9 implementation samples
...
- the current Jetty implementation has advanced so far that keeping the code no longer seems beneficial,
as it makes the test unusable, and use of Jetty 9 is not in sight due to its Java 1.7 dependency.
11 years ago
reger
71d2655c02
downgrade to Jetty 8 to assure support of JRE 1.6
...
- introduce a YaCyHttp interface to modularize/separate http server
- adjust the Jetty version specific implementation part (in package net.yacy.http)
- putting the version specific code in classes starting with Jetty8xxxx
- moved existing Jetty9xxx implementation into a test class (to keep the code)
- adjust build to the changed jars
- make use of the introduced YaCyHttpServer interface in related htroot servlets
- adjust other test cases/classes
11 years ago
reger
f7f86d8a5d
update to Jetty 9 jars
...
- include javax.servlet 3.0
11 years ago
Roland Haeder
841a28ae76
Added 'final' for all exception blocks as this helps the Java compiler
...
to optimize memory usage
Conflicts:
source/net/yacy/search/Switchboard.java
11 years ago
reger
4fec35a665
adjust Test case EmbeddedSolrConnector
12 years ago
reger
160ce568b3
move testing SolrServlet.main to test, making include of jetty*.jar in distribution and classpath obsolete
...
- move jetty*.jar to test library
- move SolrServlet.main as is to test, add also a junit test simulating main
- add build.xml cleanup for EmbeddedSolrConnectorTest created test/DATA
- adjust some test compile errors
12 years ago
orbiter
cd19d0517e
added dns resolve to HTTPClient POST using a dns cache to prevent the not-thread-safe built-in dns cache inside apache http client from being used
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7513 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
fd74bc388c
* fix small bug in sessionid-removal
...
* add testcase for sessionid-removal
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7333 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago