37 Commits (daa04f5db942d293c3d269405421617137ef0d4c)

Author SHA1 Message Date
orbiter efd0b8371a - added parsing of Dublin Core-compliant metadata (see RFC 5013 and ISO 15836) to html parser
17 years ago
low012 b08f877e97 *) tried to get rid of warnings when compiling parsers (http://forum.yacy-websuche.de/viewtopic.php?t=660)
17 years ago
fuchsi 69521d92e5 Add another external dependency from PDFBox package ("Bouncy Castle"). This is necessary for parsing of some encrypted PDF files.
17 years ago
fuchsi ca83f5a8d9 Add external lib FontBox which is part of the PDFBox (they extracted the font handling code into this package in 0.7.3).
17 years ago
fuchsi e77aec8c9d fix handling of encrypted PDF-Documents (with default user password "")
17 years ago
orbiter daf0f74361 joined anomic.net.URL, plasmaURL and url hash computation:
17 years ago
orbiter 6b9eea3932 - removed differentiation between longTitle and shortTitle; this cannot be used for search results,
18 years ago
orbiter a738b57b31 added author tag to indexing content
18 years ago
theli f17ce28b6d *) plasmaHTCache:
18 years ago
theli cd5f349666 *) Better handling of large files during parsing
18 years ago
orbiter df1629b05a - code cleanup
18 years ago
theli b6c7b91582 *) Parser now throws a ParserException instead of returning null on parsing errors (e.g. needed by snippet fetcher)
18 years ago
orbiter 3aac5b26da - added automatic tag generation when a web page from the search results is added
18 years ago
theli 74c3e7cf29 *) storing document charset into plasmaParserDocument object (is needed later by the condenser)
18 years ago
theli d0a5a53789 *) changes needed for multi-language support
18 years ago
theli f3ac4dbbb9 *) better handling of server shutdown
18 years ago
orbiter 3879a0ecd0 replaced java.net.URL usage by use of new class de.anomic.net.URL
19 years ago
theli 45b39ee1be *) solving unpacking problems with too long filename by
19 years ago
orbiter 9544c47684 added some UTF-8 handling.
19 years ago
orbiter 3d8a5ae652 code cleanup
19 years ago
theli bdf30117c1 *) Redesign of parser configuration
19 years ago
theli b990dc1ad1 *) Replacing jsch 0.1.19 lib with newer version 0.1.21
19 years ago
theli 9a98988c3c *) Bugfix for SSL/NIO Bug
20 years ago
theli 890e3f4d4a *) adding missing calls for function close() to avoid "too many open file" bug*) adding
20 years ago
theli 6dd3ec0dc4 *) Adding debug="true" debuglevel="lines,vars,source" to ant build files
20 years ago
theli ef6851798b *) changing thread priority while parsing a pdf file to avoid 100% CPU usage.
20 years ago
theli 84f9d8f7f0 *) migrating ant build files to generate a single extension tar per default
20 years ago
theli 8bd49ba535 *) setting root dir for all tar files properly
20 years ago
theli 7994c485f1 *) Trying to set the document title properly
20 years ago
theli 9ee3e69021 *) Solving "Warning: You did not close the PDF Document" problem when an OutOfMemory Exception occurred ...
20 years ago
theli 361f05978d Multiple updates regarding the yacy seedUpload facility,
20 years ago
theli 1dad015b0b *) Migration of Ant build files
20 years ago
theli 2aa5fe8f50 *) Import statements reorganized
20 years ago
theli 351c86d5d9 *) Migration of optional Content Parser integration
20 years ago
theli f44b219e44 *) Eclipse has accidentally copied the wrong file header into the new files (because these headers were accidentally set as default for the whole workspace instead of the project)
20 years ago
theli 081ebd5517 *) I've accidentally used Java 5.0 syntax for enumerations
20 years ago
theli 58b1a0ba40 *) adding a new package for extra content parsers
20 years ago