Commit Graph

283 Commits (e039a797d28d90ef5af3a7e1695fc0bf60867507)

Author SHA1 Message Date
luccioman e41d046a9d Improved parsing support for OOXML spreadsheets (.xlsx)
7 years ago
luccioman 780173008e Implemented partial stream parsing of tar archives.
7 years ago
luccioman acab6a6def Also handle text content when parsing XML within limits.
7 years ago
reger f38fb7f02c Add junit test for AbstractOperations.addOperand()
7 years ago
luccioman ed678186a8 Updated XML parser limited parsing test to use the latest JDK.
7 years ago
luccioman f369679d1c Fixed read/copy on input streams reading sometimes less than expected.
7 years ago
luccioman bf55f1d6e5 Started support of partial parsing on large streamed resources.
7 years ago
luccioman 2a87b08cea Removed temporary html parser test code
7 years ago
luccioman 90a7c1affa HTML parser : removed unnecessary remaining recursive processing
7 years ago
luccioman 9b1bb2545e Refactored plain-text URLs detection implementation.
7 years ago
luccioman 8da3174867 Ensure lower case conversion consistency with any default locale.
7 years ago
luccioman 286f3018bd Made mime type and extension normalization locale independent.
7 years ago
luccioman 319231a458 Added a generic XML parser, able to parse elements text and URLs.
7 years ago
luccioman 64cec2790d Improved character encoding detection from Content-Type header
8 years ago
luccioman 1acb7005d0 Added a basic JUnit test with test gz files for the gzip parser
8 years ago
luccioman 1e2fb76720 Properly close test files in htmlParser unit test
8 years ago
luccioman 9dd790087d Added HT Cache basic statistics (hit rate)
8 years ago
luccioman 28b451a0b3 Made Cache compression level and lock timeout user configurable
8 years ago
luccioman a7394b479b Limit the synchronization blocking time on some Cache operations.
8 years ago
Michael Peter Christen 6fe735945d migrated Solr 5.5 -> Solr 6.6 and from Java 1.7 -> 1.8
8 years ago
luccioman a04feac064 Ensure file input streams proper closing in both success and failures
8 years ago
luccioman d98c04853d Ensure proper closing of file input streams.
8 years ago
luccioman c226ded799 Fix unescape of URLs having some '%' chars but not percent-encoded
8 years ago
reger 077d062be3 Adjust mergeDocuments to keep youngest last-modified date of document
8 years ago
luccioman 522a268305 Improved new blacklist entries URL scheme detection.
8 years ago
luccioman 31fff2c986 Extended WikiCode template inclusion syntax support.
8 years ago
reger 7a7da698d4 fix unit test MultiProtocolURL(file) assertion for Windows path with
8 years ago
luccioman 23775e76e2 Fixed endless loop case in wikicode processing.
8 years ago
luccioman 0bc868a819 Improved support for non ASCII chars in local file system URLs
8 years ago
reger 777cb5b812 remove test case for Standard_MemoryControl which will always fail
8 years ago
reger 1ccc44e681 fix default/httpd.mime Z file extension to lower case
8 years ago
reger 18c7563dbe Extend DCEntry.getLanguage convert to ISO639-1 codes for more languages
8 years ago
reger 275c0cddd1 Adjust DefaultServlet test case to recent change,
8 years ago
reger 41e2ee0eca Fix call parameter for ConnectionInfo in MonitorHandler
8 years ago
reger f254fcfc67 fix htmlParser <script> text extraction on code containing expression
8 years ago
luccioman 2f191e0e1c Improved MultiProtocolURL non ASCII characters support.
8 years ago
luccioman 5c8958bcea Updated Javadoc and JUnit tests for the WebStructureGraph class.
8 years ago
luccioman d9766ca981 Fixed WatchWebStructure_p.html render to include https URLs.
8 years ago
luccioman ed3dd5e31a Fixed webstructure.xml API used with a domain name 'about' parameter.
8 years ago
luccioman 0da1e6ba16 Factored code re-implementing DigestURL.hosthash() method.
8 years ago
luccioman 86adfef30f Added automated unit tests and perfs test for WebStructureGraph class.
8 years ago
luccioman c9889991b9 Fixed 2 failing JUnit tests.
8 years ago
reger 083df255e4 fix html tag attribute parsing containing attribute w/o value
8 years ago
reger cb95b7339a include html5 <time> tag in content scraper,
8 years ago
luccioman aa9ddf3c23 Added control over Robots.txt active threads maximum number.
8 years ago
reger fdcf33f08f fix Domain.stripToHostName for some IPv6 cases
8 years ago
reger ac6e198bd1 add unit test for Domains.stripToPort,
8 years ago
luccioman a0dfbaca6a FileUtils : added some JavaDocs and unit test cases
8 years ago
reger 395f2e8946 Make ServletRequest implement the standardized HttpServletRequest interface,
8 years ago
luccioman 7296e3884f Switched even more URLs to pure relative ones.
8 years ago
luccioman 731684105a Improved absolute URLs rendering in OpenSearch desc and RSS feeds.
8 years ago
reger c9e81d2fa0 fix Column parsing from celldefinition string, without cellwidth def.
8 years ago
reger af39a76bf6 Reduce number of default max. search navigator lines (from 10000)
8 years ago
reger 20a1b29ed3 add simple test case for ReferenceContainer helpful for debugging
8 years ago
reger 3c7220bc7b Refactor rwi reference word position and word distance calculation
8 years ago
luccioman c3c4a52408 Added more examples in Blacklist JUnit test.
8 years ago
reger 8b74a6bf57 fix min/max calculation of WordReferenceVars.distance()
8 years ago
luccioman 93ea366778 Updated license header file name
8 years ago
luccioman 4c0be4d5d4 Fixed maven compilation error
8 years ago
luccioman 7717a3d43d Fixed license headers on files created to improve favicon management.
8 years ago
luccioman 6e1959f469 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
8 years ago
luccioman 7136b1ad60 HTML validation : fixed URL encoding of Pictures link.
8 years ago
luccioman 3ccd89e274 Fixed MultiProtocolURL.resolveBackpath to handle remaining '..' segments
8 years ago
luccioman f1f4459f88 Added some unit tests for Blacklist.isListed()
8 years ago
reger e68b00678e prevent negative score on URIMetadataNode - in the special case where no
8 years ago
reger b752bcfecb adjust date in text detection to ignore some program version strings
8 years ago
reger b017e97421 optimize condenser language detection a little.
8 years ago
reger ae3717d087 adjust Tokenizer sentence count to ignore repeated punctuation (like !!!! )
8 years ago
reger 474f0476c6 adjust Tokenizer sentence count on trailing text after last recognized sentence
8 years ago
reger 1a79c64495 generalize DateDetection with holiday date rules readily available in icu
8 years ago
reger 32a2e3a22a have RSSFeed.getChannel return empty message on missing channel element,
8 years ago
luccioman 4585a60d7e Made use of the constant corresponding to the hard-coded value.
8 years ago
luccioman 1bb0b135ac Avoid duplication of various MS Windows file URLs flavors
8 years ago
reger 6f8c3ccea4 improve url hash computation for file path with mixed java & windows
8 years ago
reger 330768c8a2 fix for solr write.lock after mode change http://mantis.tokeek.de/view.php?id=686
8 years ago
reger 11786457b7 add test case for EmbeddedSolrConnector close()
8 years ago
reger 585d2a6441 test case: for NewsPool to check the id modificator (for unique id)
8 years ago
reger ff6589fc0f test case: simulating multi word query for local rwi index
8 years ago
reger 7f63fc50f3 prepare a IndexSegment test case for RWI index testing
8 years ago
reger 272cdd496a reactivate sentence counter in WordTokenizer for phrasepos ranking,
8 years ago
Michael Peter Christen 5e165a8150 removed unused imports
8 years ago
reger e310ec5f70 fix posInText ranking calculation to score 0 on no position info
8 years ago
reger 39dd244693 fix ConcurrentScoreMap.set() calculation of totalCount()
8 years ago
reger ebde21079a refactor xlsParser to include Excel file attribute (like author) in parser result doc.
8 years ago
reger 5e335b32da fix Blacklist.contains() matching path pattern to string
8 years ago
reger f89d4eb51d fix MultiProtocolURL init (assign of host) for urls with '/' in query part
8 years ago
reger 87fcfc6d78 Adjusted hash computation and toNormalform for file:// protocol to deliver
8 years ago
reger 7b226afc33 fix HostQueueTest - changed open parameter
8 years ago
luccioman 893a40995a Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
8 years ago
reger fcc29c36f0 test case for HostBalancer issue in intranet mode
8 years ago
luccioman 6e96c7341a Merge remote-tracking branch 'origin/master'
8 years ago
reger a476d06aec wiki header code test string add "closing" tag
9 years ago
reger d4da4805a8 internal wiki code, require header line to start with markup
9 years ago
reger 223071337b Translator to take caution of word boundaries to identify text portion to
9 years ago
reger a6ba1faa80 introduce a translation edit servlet Translator_p.html YaCy's UI text translation
9 years ago
reger b74cddc49c upd to Jetty v9.2.16.v20160414
9 years ago
reger 24b0fa2a38 extend snapshot Html2Image.pdf2image to use PDFBox image export capability
9 years ago
reger 902e79e261 Introduce a TranslatorXliff which can read/write xliff from/to internal translation map.
9 years ago
reger ec24a0c85a add test case for optimized toTokens()
9 years ago
luc 26f1ead57c Created ViewFavicon class specialized in favicon viewing.
9 years ago