Commit Graph

269 Commits (495ca57f61d7980200d2ac13bbd8286eee8c2178)

Author SHA1 Message Date
luccioman e357ade47d Reduced memory footprint of text snippet extraction
7 years ago
luccioman e115e57cc7 Reduced text snippet extraction processing time.
7 years ago
luccioman 3b89c232db Easier tracking of longest text snippets initializations
7 years ago
luccioman 8d7099a081 Handle escaped line breaks and separators in vocabulary import from CSV
7 years ago
luccioman eb20589e29 Fixed issue #158 : completed div CSS class ignore in crawl
7 years ago
luccioman 33593c22e9 Fixed loss of other modifiers on keywords/tags search navigation links
7 years ago
luccioman 9412881230 Added basic support for autotagging microdata annotated item types.
7 years ago
luccioman 5a14d34a7d Refactoring : documented and extracted autotagging processing functions.
7 years ago
luccioman 58b9834729 Added HTML microdata typed items parsing capability.
7 years ago
luccioman fa6d030b0b Moved dbtest to the test source folder.
7 years ago
luccioman 098ee63911 Added a manual performance test for the HostBalancer.
7 years ago
luccioman 46b5249c20 Removed time condition on HostBalancer initialization in JUnit test.
7 years ago
luccioman 36e9b1c5b3 Fixed SegmentTest test case time-dependent occasional failures
7 years ago
Michael Peter Christen b907819cb4 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
7 years ago
Michael Peter Christen 25573bd5ab added a crawl filter based on <div> tag class names
7 years ago
luccioman d95b288f19 Removed use of deprecated Jetty IPAccessHandler for client filtering.
7 years ago
luccioman 0a120787e3 Improved accuracy of URLs search filters : protocol, tld, host, file ext
7 years ago
luccioman d1c7dfd852 Fixed URL parsing with fragment and empty path
7 years ago
luccioman e2f6427a63 Added a basic JUnit test for the Visio parser (vsdParser)
7 years ago
luccioman d41ad7af6f Restore initial locale at the end of a JUnit test case which modifies it.
7 years ago
luccioman 7206f1ed71 Do locale neutral case conversions on domain names.
7 years ago
luccioman 398c66f06c Do locale neutral case conversions in MultiProtocolURL
7 years ago
luccioman 9531b83598 Do locale neutral case conversions in Classification
7 years ago
luccioman ac209cac2e Updated the generic top-level known domains list.
7 years ago
luccioman fcd57e2d0f Improved some JUnit tests isolation and resources release
7 years ago
luccioman e0eda84c24 Remove old hard-coded holiday dates from DateDection class.
7 years ago
luccioman 73977ec0fe Added a html parser charset detection unit test
7 years ago
luccioman 285f0d6a39 Consistently encode snapshot image with format requested on the API.
7 years ago
luccioman 7c319c841e Fixed pdf2image conversion with imagemagick on PDFs having transparency
7 years ago
luccioman fe75f326d8 Fixed ProfilingGraph calculation integer overflows and added test class.
7 years ago
luccioman 5bf76f058a Adjusted ResponseHeaderTest to succeed on slow or highly loaded CPU
7 years ago
luccioman 32c9dfa768 Added partial bzip2 stream parsing support and bzipParser Junit test
7 years ago
luccioman dd9cb06d25 Fixed RWI distance calculation on multi words search queries.
7 years ago
luccioman c6ae87168a Added unit tests on the gzip parser.
7 years ago
luccioman 169ffdd1c7 Finer control on max links to parse in the html parser.
7 years ago
luccioman 4743a104b5 Added some unit tests on FileUtils.
7 years ago
luccioman e41d046a9d Improved parsing support for OOXML spreadsheets (.xlsx)
7 years ago
luccioman 780173008e Implemented partial stream parsing of tar archives.
7 years ago
luccioman acab6a6def Also handle text content when parsing XML within limits.
7 years ago
reger f38fb7f02c Add junit test for AbstractOperations.addOperand()
7 years ago
luccioman ed678186a8 Updated xml parser limited parsing test for use latest jdk.
7 years ago
luccioman f369679d1c Fixed read/copy on input streams reading sometimes less than expected.
7 years ago
luccioman bf55f1d6e5 Started support of partial parsing on large streamed resources.
7 years ago
luccioman 2a87b08cea Removed temporary html parser test code
7 years ago
luccioman 90a7c1affa HTML parser : removed unnecessary remaining recursive processing
7 years ago
luccioman 9b1bb2545e Refactored plain-text URLs detection implementation.
7 years ago
luccioman 8da3174867 Ensure lower case conversion consistency with any default locale.
7 years ago
luccioman 286f3018bd Made mime type and extension normalization locale independent.
7 years ago
luccioman 319231a458 Added a generic XML parser, able to parse elements text and URLs.
7 years ago
luccioman 64cec2790d Improved character encoding detection from Content-Type header
8 years ago