290 Commits (fe4c0aa890599cbd60250d9476f527c24da38bb6)

Author SHA1 Message Date
reger cb95b7339a include html5 <time> tag in content scraper,
8 years ago
luccioman aa9ddf3c23 Added control over Robots.txt active threads maximum number.
8 years ago
reger fdcf33f08f fix Domain.stripToHostName for some IPv6 cases
8 years ago
reger ac6e198bd1 add unit test for Domains.stripToPort,
8 years ago
luccioman a0dfbaca6a FileUtils : added some JavaDocs and unit test cases
8 years ago
reger 395f2e8946 Make ServletRequest implement the standardized HttpServletRequest interface,
8 years ago
luccioman 7296e3884f Switched even more URLs to pure relative ones.
8 years ago
luccioman 731684105a Improved absolute URLs rendering in OpenSearch desc and RSS feeds.
8 years ago
reger c9e81d2fa0 fix Column parsing from celldefinition string, without cellwidth def.
8 years ago
reger af39a76bf6 Reduce number of default max. search navigator lines (from 10000)
8 years ago
reger 20a1b29ed3 add simple test case for ReferenceContainer helpful for debugging
8 years ago
reger 3c7220bc7b Refactor rwi reference word position and word distance calculation
8 years ago
luccioman c3c4a52408 Added more examples in Blacklist JUnit test.
8 years ago
reger 8b74a6bf57 fix min/max calculation of WordReferenceVars.distance()
8 years ago
luccioman 93ea366778 Updated license header file name
8 years ago
luccioman 4c0be4d5d4 Fixed maven compilation error
8 years ago
luccioman 7717a3d43d Fixed license headers on files created to improve favicon management.
8 years ago
luccioman 6e1959f469 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
8 years ago
luccioman 7136b1ad60 HTML validation : fixed URL encoding of Pictures link.
8 years ago
luccioman 3ccd89e274 Fixed MultiProtocolURL.resolveBackpath to handle remaining '..' segments
8 years ago
luccioman f1f4459f88 Added some unit tests for Blacklist.isListed()
8 years ago
reger e68b00678e prevent negative score on URIMetadataNode - in the special case where no
8 years ago
reger b752bcfecb adjust date in text detection to ignore some program version strings
8 years ago
reger b017e97421 optimize condenser language detection a little.
8 years ago
reger ae3717d087 adjust Tokenizer sentence count to ignore repeated punctuation (like !!!!)
8 years ago
reger 474f0476c6 adjust Tokenizer sentence count on trailing text after last recognized sentence
8 years ago
reger 1a79c64495 generalize DateDetection with holiday date rules readily available in icu
8 years ago
reger 32a2e3a22a have RSSFeed.getChannel return empty message on missing channel element,
8 years ago
luccioman 4585a60d7e Made use of the constant corresponding to the hard-coded value.
8 years ago
luccioman 1bb0b135ac Avoid duplication of various MS Windows file URLs flavors
8 years ago
reger 6f8c3ccea4 improve url hash computation for file path with mixed java & windows
8 years ago
reger 330768c8a2 fix for solr write.lock after mode change http://mantis.tokeek.de/view.php?id=686
8 years ago
reger 11786457b7 add test case for EmbeddedSolrConnector close()
8 years ago
reger 585d2a6441 test case: for NewsPool to check the id modificator (for unique id)
8 years ago
reger ff6589fc0f test case: simulating multi word query for local rwi index
8 years ago
reger 7f63fc50f3 prepare a IndexSegment test case for RWI index testing
8 years ago
reger 272cdd496a reactivate sentence counter in WordTokenizer for phrasepos ranking,
8 years ago
Michael Peter Christen 5e165a8150 removed unused imports
8 years ago
reger e310ec5f70 fix posInText ranking calculation to score 0 on no position info
8 years ago
reger 39dd244693 fix ConcurrentScoreMap.set() calculation of totalCount()
8 years ago
reger ebde21079a refactor xlsParser to include Excel file attribute (like author) in parser result doc.
8 years ago
reger 5e335b32da fix Blacklist.contains() matching path pattern to string
8 years ago
reger f89d4eb51d fix MultiProtocolURL init (assign of host) for urls with '/' in query part
8 years ago
reger 87fcfc6d78 Adjusted hash computation and toNormalform for file:// protocol to deliver
8 years ago
reger 7b226afc33 fix HostQueueTest - changed open parameter
8 years ago
luccioman 893a40995a Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
8 years ago
reger fcc29c36f0 test case for HostBalancer issue in intranet mode
8 years ago
luccioman 6e96c7341a Merge remote-tracking branch 'origin/master'
8 years ago
reger a476d06aec wiki header code test string add "closing" tag
9 years ago
reger d4da4805a8 internal wiki code, require header line to start with markup
9 years ago
reger 223071337b Translator to take caution of word boundaries to identify text portion to
9 years ago
reger a6ba1faa80 introduce a translation edit servlet Translator_p.html YaCy's UI text translation
9 years ago
reger b74cddc49c upd to Jetty v9.2.16.v20160414
9 years ago
reger 24b0fa2a38 extend snapshot Html2Image.pdf2image to use PDFBox image export capability
9 years ago
reger 902e79e261 Introduce a TranslatorXliff which can read/write xliff from/to internal translation map.
9 years ago
reger ec24a0c85a add test case for optimized toTokens()
9 years ago
luc 26f1ead57c Created ViewFavicon class specialized in favicon viewing.
9 years ago
luc 07222b3e1a Added favicon url transmission in RWI chunks.
9 years ago
luc 53781299d8 Extracted intranet and filtype related rules from getFaviconURL func
9 years ago
luc 3cc5619d93 Improved HTML icons indexing and rendering in search results.
9 years ago
luc ef83e34b8a Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger 84c970eaec move test classes to test/java (subdirectory as in Maven standard subdir layout)
9 years ago
luc cfdbc2b487 Improved URLLicence reliability for use by concurrent non-authorized
9 years ago
luc 571bc55937 Refactoring : use StandardCharsets constants instead of hard-coded
9 years ago
reger 1af0e9ef74 remove workaround for Solr bug regarding multivalued date fields
9 years ago
reger 4d2b934487 prevent mailto links getting into parser result document's in/outbound link collection
9 years ago
reger 288acceac3 fix test htmlParserTest, charset parameter
9 years ago
luc f01d49c37a Process large or local file images dealing directly with content
9 years ago
luc 0de6988604 Added links to more image test suites.
9 years ago
luc 745e97a575 Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc 2895ab552a Made ViewImagePerfTest extend ViewImageTest to ease automated image
9 years ago
luc 4a03cf06e1 Corrected encoding extension arg parsing
9 years ago
reger d223cf0ae4 adjust MediaWiki importer geo coordinate calculation
9 years ago
luc 8da20718aa Created a class to test ViewImage rendering against multiple image
9 years ago
luc ec04d27473 Corrected APNG test suite link name.
9 years ago
luc cbb84ba073 Detailed javadoc.
9 years ago
luc 70111876d2 Filled ViewImageTest.html with all remaining IANA image file formats.
9 years ago
luc e093fb228d Created a generic ViewImage performance render test.
9 years ago
luc 3ad564e2e4 Created a ViewImage rendering performance measurement test.
9 years ago
luc b3f044072e Updated table headers and SVG file url for case sensitive OS.
9 years ago
luc f5746b5490 Added ico and bmp sample pictures
9 years ago
luc baede48161 Added JPEG 2000 and FITS samples
9 years ago
luc 7c9d80c5d0 Added image formats and informations for each format.
9 years ago
luc 0ae9297ca5 Created a html test page to check ViewImage rendering with different
9 years ago
reger bad34804fe optimize parseInt for <img> tag attribute parsing
9 years ago
reger d2cc11ea8f fix html parser taking <style> content as text.
9 years ago
reger e594130aec add test case for partial update - to discover effect on YaCy for update of documents with multivalued date fields (like dates_in_content_dts)
9 years ago
reger d5da9e5a38 fix test method (add throw for URIMetadataNode)
9 years ago
reger 4cf875336c complete TODO: getFileExtension handle dot in query part
9 years ago
reger c37dda8849 fix NPE on MultiProtocolURL on url with parameter value and '='
10 years ago
reger 71bf95af8a upd parser calls in test cases
10 years ago
reger f63fff9008 fix snippet containing number with comma as decimal point http://mantis.tokeek.de/view.php?id=344
10 years ago
reger 2ef8ffdb60 apply UTF-8 encoding
10 years ago
reger 7120ea42f1 fix for path with char code > 255
10 years ago
reger 1d81bd0687 fix url encoding for path see http://mantis.tokeek.de/view.php?id=559
10 years ago
reger f94e34058c fix url (path) %-decoding http://mantis.tokeek.de/view.php?id=519
10 years ago
reger 16bc267a32 add test case for snippet html encoding check
10 years ago
reger 77851fa53c fix parser test cases
10 years ago
reger df83fcc4fc disable optimistic GC assumption in StandardMemoryStrategy
10 years ago
Michael Peter Christen 68c605d637 replace with CommonPattern.SPACE for split
10 years ago
reger 9edc7308aa update to metadata-extractor-2.7.0.jar
10 years ago
reger 5d67e165d9 remove redundant null check in ResponseHeader.lastModified
10 years ago
reger ea633a794c including small junit test case for WordTokenizer
10 years ago
reger aa2e15d846 allow url parameter in worktable apicall
10 years ago
reger e88537522d allow single quote " ' " in query
10 years ago
reger e50b2b4d04 fix test case MultiProtocolURL.toString()
10 years ago
reger b510b182d8 - update Maven pom
10 years ago
Michael Peter Christen 2de159719b added an option to set 'obey nofollow' for links with rel="nofollow"
10 years ago
reger 1f2eba977d add test case for Records (used in HostBalancer)
11 years ago
reger e94efd4d7c update to JUnit 4.11
11 years ago
reger 3b77e41f1a adding test for HostQueue crawl stack
11 years ago
reger 431a5f9c4e added test case for TextSnippet,
11 years ago
reger 7847a93558 fix AbstractParser.singleList not adding null strings
11 years ago
reger 0b6db04e40 fix contentscraper img height/width parsing
11 years ago
reger bb8181b2be fix: resolve url without path but searchpart
11 years ago
reger 86f6975edc exclude html tags in in/outboundlinks_anchortext_txt parsed text
11 years ago
reger 71649bf22d add test case htmlParser.parse - getCharset
11 years ago
reger 6878c90f99 fix: IPv6 INTRANET_PATTERNS for local ip (see http://bugs.yacy.net/view.php?id=378)
11 years ago
reger c8d437b69a clean up test sources
11 years ago
reger 18a56446ce reorg URL test classes add isLocal test with some IPv6 examples
11 years ago
reger 10a6346056 clean-up test cases
11 years ago
reger b4fdb8c887 cleanup test directory from Jetty 9 implementation samples
11 years ago
reger 71d2655c02 downgrade to Jetty 8 to assure support of JRE 1.6
11 years ago
reger f7f86d8a5d update to Jetty 9 jars
11 years ago
reger fe87fb638a adjust test/ParserTest to dc_description data type
11 years ago
Roland Haeder 841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler
11 years ago
reger 97ab5b90e8 - odt & ooxml (office document) parser correction to add content to fulltext index
12 years ago
reger 4fec35a665 adjust Test case EmbeddedSolrConnector
12 years ago
reger 160ce568b3 move testing SolrServlet.main to test, making include of jetty*.jar in distribution and classpath obsolete
12 years ago
orbiter d2ea250d99 refactoring:
13 years ago
orbiter 49e5ca579f added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (this is default now), then all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This will increase the search index size dramatically but will also enable us to create a much faster image, audio and video search. If the flag is switched on, the index entries are also stored to a solr index, if this is also enabled.
13 years ago
orbiter cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
14 years ago
orbiter cd19d0517e added dns resolve to HTTPClient POST using a dns cache to prevent the not-thread-safe built-in dns cache inside apache http client from being used
14 years ago
f1ori 01cb3bbaec * fix patchCharsetEncoding-test (patchCharsetEncoding now returns null on input null)
14 years ago
f1ori fd74bc388c * fix small bug in sessionid-removal
14 years ago
orbiter 3197ca42ed preparations to move the HTCache into cora:
14 years ago
orbiter 844f158686 - removed dependencies in header framework:
14 years ago
orbiter b6fb239e74 redesign of parser interface:
15 years ago
orbiter 11639aef35 - added new protocol loader for 'file'-type URLs
15 years ago
orbiter b68deb407a - moved test data from /bin to /test/words
15 years ago
orbiter 3528b970d6 - refactoring
15 years ago
orbiter b79f4f062f refactoring of yacy documents and parsers: they depend now only on the kelondro classes
15 years ago
f1ori 34c71b22e8 fix and enable parser unit tests (tested with eclipse)
15 years ago
orbiter ce8dc575ca refactoring
15 years ago
orbiter bea3b99aff moved table and util classes
15 years ago
orbiter ce7924d712 better concurrency for rwi entry parsing during search processing
15 years ago
orbiter 72ac5bd80f refactoring of search process.
15 years ago
f1ori d515bc11e2 added ooxmlparser
15 years ago
f1ori 8c1b02af04 * fix warning in testcase
15 years ago
orbiter 65b1d51e70 added xml version of windows office test files
16 years ago