yacy_search_server/source/net/yacy/document/parser
orbiter 610b01e1c3
- added an 'add every media object linked in an html document as a new document' feature to the html parser. As a result, every image, app, video, or audio file linked in an html file is added as a document. In effect, parsing a single html document may insert several documents into the search index.
14 years ago
html bugfixes in html parser 14 years ago
images hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources: 14 years ago
xml more UTF8 getBytes() performance hacks 14 years ago
bzipParser.java *) minor changes 14 years ago
csvParser.java - enhanced html parser: recognized much more details in the content 14 years ago
docParser.java - enhanced html parser: recognized much more details in the content 14 years ago
genericParser.java - enhanced html parser: recognized much more details in the content 14 years ago
gzipParser.java fixed bugs in parser and ftp client 14 years ago
htmlParser.java - added an 'add every media object linked in an html document as a new document' feature to the html parser. As a result, every image, app, video, or audio file linked in an html file is added as a document. In effect, parsing a single html document may insert several documents into the search index. 14 years ago
mmParser.java - enhanced html parser: recognized much more details in the content 14 years ago
odtParser.java - enhanced html parser: recognized much more details in the content 14 years ago
ooxmlParser.java - enhanced html parser: recognized much more details in the content 14 years ago
pdfParser.java added jempbox-1.5.0.jar which is required by pdfbox-1.5 as stated in http://pdfbox.apache.org/dependencies.html 14 years ago
pptParser.java - enhanced html parser: recognized much more details in the content 14 years ago
psParser.java - enhanced html parser: recognized much more details in the content 14 years ago
rssParser.java - enhanced html parser: recognized much more details in the content 14 years ago
rtfParser.java - enhanced html parser: recognized much more details in the content 14 years ago
sevenzipParser.java - enhanced html parser: recognized much more details in the content 14 years ago
sidAudioParser.java - enhanced html parser: recognized much more details in the content 14 years ago
sitemapParser.java better abstraction of http client identification 14 years ago
swfParser.java - enhanced html parser: recognized much more details in the content 14 years ago
tarParser.java *) minor changes 14 years ago
torrentParser.java - enhanced html parser: recognized much more details in the content 14 years ago
vcfParser.java - enhanced html parser: recognized much more details in the content 14 years ago
vsdParser.java - enhanced html parser: recognized much more details in the content 14 years ago
xlsParser.java - enhanced html parser: recognized much more details in the content 14 years ago
zipParser.java - applied many small performance hacks 14 years ago
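
The commit message on htmlParser.java says that parsing one html document can now yield several index documents, one for the page and one for each linked media file. The sketch below only illustrates that fan-out under stated assumptions: `MediaLinkExpansion`, `ParsedDoc`, and `expand` are hypothetical names invented for this example and are not taken from the YaCy source, and the media links are assumed to have already been extracted as a URL-to-type map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from the YaCy code base.
public class MediaLinkExpansion {

    /** Minimal stand-in for a parsed document: a URL plus a coarse type label. */
    public static final class ParsedDoc {
        public final String url;
        public final String type;

        public ParsedDoc(String url, String type) {
            this.url = url;
            this.type = type;
        }
    }

    /**
     * Returns the html page itself followed by one extra document for every
     * media link found in it (assumed input: media URL -> type label).
     */
    public static List<ParsedDoc> expand(String pageUrl, Map<String, String> mediaLinks) {
        List<ParsedDoc> docs = new ArrayList<>();
        docs.add(new ParsedDoc(pageUrl, "html"));
        for (Map.Entry<String, String> link : mediaLinks.entrySet()) {
            docs.add(new ParsedDoc(link.getKey(), link.getValue()));
        }
        return docs;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the links in insertion order.
        Map<String, String> links = new LinkedHashMap<>();
        links.put("http://example.org/logo.png", "image");
        links.put("http://example.org/intro.mp3", "audio");
        for (ParsedDoc doc : expand("http://example.org/index.html", links)) {
            System.out.println(doc.type + " " + doc.url);
        }
    }
}
```

With two media links, `expand` returns three documents, which is the effect the commit message describes: a single parsed page produces several entries for the search index.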