Commit Graph

10 Commits (24b44b8568f9ba6653e5774d9c13488403a1a8cc)

Author                  SHA1        Message                                                                Date
Michael Christen        3a46b07603  fixed many links to old forum, now https://searchlab.eu                6 years ago
luccioman               3fb449b3b6  Properly resolve relative URLs against document URL in html base tags  6 years ago
luccioman               2c155ece77  Fixed JUnit test after removal of unused Transformer                   7 years ago
luccioman               58b9834729  Added HTML microdata typed items parsing capability.                   7 years ago
Michael Peter Christen  25573bd5ab  added a crawl filter based on <div> tag class names                    7 years ago
luccioman               bf55f1d6e5  Started support of partial parsing on large streamed resources.        8 years ago
luccioman               9b1bb2545e  Refactored plain-text URLs detection implementation.                   8 years ago
reger                   cb95b7339a  include html5 <time> tag in content scraper,                           8 years ago
luccioman               7717a3d43d  Fixed license headers on files created to improve favicon management.  8 years ago
luc                     3cc5619d93  Improved HTML icons indexing and rendering in search results.          9 years ago