78 Commits (2d67f2924412747d9de8ee634dc98ceeba947c20)

Author SHA1 Message Date
reger 7847a93558 fix AbstractParser.singleList not adding null strings
11 years ago
reger 0b6db04e40 fix contentscraper img height/width parsing
11 years ago
reger bb8181b2be fix: resolve url without path but searchpart
11 years ago
reger 86f6975edc exclude html tags in in/outboundlinks_anchortext_txt parsed text
11 years ago
reger 71649bf22d add test case htmlParser.parse - getCharset
11 years ago
reger 6878c90f99 fix: IPv6 INTRANET_PATTERNS for local ip (see http://bugs.yacy.net/view.php?id=378)
11 years ago
reger c8d437b69a clean up test sources
11 years ago
reger 18a56446ce reorg URL test classes, add isLocal test with some IPv6 examples
11 years ago
reger 10a6346056 clean-up test cases
11 years ago
reger b4fdb8c887 cleanup test directory from Jetty 9 implementation samples
11 years ago
reger 71d2655c02 downgrade to Jetty 8 to assure support of JRE 1.6
11 years ago
reger f7f86d8a5d update to Jetty 9 jars
11 years ago
reger fe87fb638a adjust test/ParserTest to dc_description data type
11 years ago
Roland Haeder 841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler
11 years ago
reger 97ab5b90e8 - odt & ooxml (office document) parser correction to add content to fulltext index
12 years ago
reger 4fec35a665 adjust Test case EmbeddedSolrConnector
12 years ago
reger 160ce568b3 move testing SolrServlet.main to test, making inclusion of jetty*.jar in distribution and classpath obsolete
12 years ago
orbiter d2ea250d99 refactoring:
13 years ago
orbiter 49e5ca579f added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (this is the default now), then all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This will increase the search index size dramatically but will also enable us to create a much faster image, audio and video search. If the flag is switched on, the index entries are also stored in a solr index, if that is also enabled.
13 years ago
orbiter cb1f49d0f2 replaced all 'new String' calls that used the default encoding (encoding argument missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache lookup for the Charset object using String hashing of the String 'UTF-8'.
14 years ago
orbiter cd19d0517e added dns resolve to HTTPClient POST using a dns cache to prevent the not-thread-safe built-in dns cache inside apache http client from being used
14 years ago
f1ori 01cb3bbaec * fix patchCharsetEncoding-test (patchCharsetEncoding now returns null on input null)
14 years ago
f1ori fd74bc388c * fix small bug in sessionid-removal
14 years ago
orbiter 3197ca42ed preparations to move the HTCache into cora:
14 years ago
orbiter 844f158686 - removed dependencies in header framework:
14 years ago
orbiter b6fb239e74 redesign of parser interface:
15 years ago
orbiter 11639aef35 - added new protocol loader for 'file'-type URLs
15 years ago
orbiter b68deb407a - moved test data from /bin to /test/words
15 years ago
orbiter 3528b970d6 - refactoring
15 years ago
orbiter b79f4f062f refactoring of yacy documents and parsers: they depend now only on the kelondro classes
15 years ago
f1ori 34c71b22e8 fix and enable parser unit tests (tested with eclipse)
15 years ago
orbiter ce8dc575ca refactoring
15 years ago
orbiter bea3b99aff moved table and util classes
15 years ago
orbiter ce7924d712 better concurrency for rwi entry parsing during search processing
15 years ago
orbiter 72ac5bd80f refactoring of search process.
15 years ago
f1ori d515bc11e2 added ooxmlparser
15 years ago
f1ori 8c1b02af04 * fix warning in testcase
15 years ago
orbiter 65b1d51e70 added xml version of windows office test files
16 years ago
f1ori 67da20647f * add new odf parser based on sax-xml-parser
16 years ago
f1ori 06557485f5 * added parser unittest!
16 years ago
f1ori 69dfd03985 reactivate unittests
16 years ago
orbiter d553e4ff39 added visio test files and mime types
16 years ago
lotus bb570716e6 added more testfiles
16 years ago
orbiter 84185baa81 added more test files for windows from lulabad
16 years ago
orbiter 3246358485 mistake -> rename
16 years ago
orbiter 55ec57d27f added linux umlaute test files from low012
16 years ago
orbiter e9262b3890 re-named old test files
16 years ago
orbiter ff2a54da68 added more umlaute test files: mac
16 years ago
orbiter 204220ecd5 added test files for UTF-8 / Umlaute - Testing:
16 years ago
orbiter daf0f74361 joined anomic.net.URL, plasmaURL and url hash computation:
17 years ago