yacy_search_server/source/net/yacy/document/parser
Michael Peter Christen 25573bd5ab
added a crawl filter based on <div> tag class names
7 years ago
html added a crawl filter based on <div> tag class names 7 years ago
images Ensure file input streams proper closing in both success and failures 8 years ago
rdfa Ensure lower case conversion consistency with any default locale. 8 years ago
xml Improved parsing support for OOXML spreadsheets (.xlsx) 8 years ago
GenericXMLParser.java Also handle text content when parsing XML within limits. 8 years ago
apkParser.java Properly close file output streams even on exceptions scenarios. 8 years ago
audioTagParser.java result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader methode. 9 years ago
bzipParser.java added a crawl filter based on <div> tag class names 7 years ago
csvParser.java result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader methode. 9 years ago
docParser.java Cleaned up some Javadoc warnings. 8 years ago
dwgParser.java result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader methode. 9 years ago
genericParser.java Added parsing within bounds implementation to the generic parser. 8 years ago
gzipParser.java added a crawl filter based on <div> tag class names 7 years ago
htmlParser.java added a crawl filter based on <div> tag class names 7 years ago
linkScraperParser.java result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader methode. 9 years ago
mmParser.java result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader methode. 9 years ago
odtParser.java fix delete of temp file after odt % ooxml parser 9 years ago
ooxmlParser.java Improved parsing support for OOXML spreadsheets (.xlsx) 8 years ago
pdfParser.java Ensure proper closing of file input streams. 8 years ago
pptParser.java result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader methode. 9 years ago
psParser.java result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader methode. 9 years ago
rdfParser.java result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader methode. 9 years ago
rssParser.java Added RSS parser support for maximum content bytes parsing limit 8 years ago
rtfParser.java result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader methode. 9 years ago
sevenzipParser.java added a crawl filter based on <div> tag class names 7 years ago
sidAudioParser.java result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader methode. 9 years ago
sitemapParser.java reduce creation of empty legacy RequestHeader() in situation where null 8 years ago
tarParser.java added a crawl filter based on <div> tag class names 7 years ago
torrentParser.java result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader methode. 9 years ago
vcfParser.java result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader methode. 9 years ago
vsdParser.java Added a basic JUnit test for the Visio parser (vsdParser) 7 years ago
xlsParser.java refactor xlsParser to include Excel file attribute (like author) in parser result doc. 9 years ago
zipParser.java added a crawl filter based on <div> tag class names 7 years ago