103 Commits (f597185026b2350954992f6523f1a21685d36a27)

Author SHA1 Message Date
danielr 74b1a60043 fixed "java.lang.NoClassDefFoundError: org/a"
17 years ago
danielr ae03a54d23 pdfParser: updated lib, fixed ClassNotFoundException: CMSError
17 years ago
orbiter 1689030ee8 refactoring: moved all crawler classes into their own package
17 years ago
orbiter 724bbdf9b2 refactoring of RSS reader
17 years ago
danielr 5c3c1fdf41 replaced httpc with Apache Jakarta Commons HttpClient (includes some refactoring ;)
17 years ago
orbiter fa1090113d - next try to fix the networking problem:
17 years ago
orbiter 87a8747ce3 - enhanced recognition, parsing, management and double-occurrence-handling of image tags
17 years ago
orbiter 0f5c4abaca more generics
17 years ago
orbiter efd0b8371a - added parsing of Dublin Core - compliant metadata (see RFC 5013 and ISO 15836) to html parser
17 years ago
orbiter f4e9ff6ce9 more generics
17 years ago
borg-0300 3cab85158c update for last commit
17 years ago
orbiter ecd7f8ba4e - added NEAR operator (must be written in UPPERCASE in search query)
17 years ago
low012 b08f877e97 *) tried to get rid of warnings when compiling parsers (http://forum.yacy-websuche.de/viewtopic.php?t=660)
17 years ago
orbiter e22014dc83 some memory enhancements when generating and displaying ymage objects
17 years ago
fuchsi 69521d92e5 Add another external dependency from PDFBox package ("Bouncy Castle"). This is necessary for parsing of some encrypted PDF files.
17 years ago
fuchsi ca83f5a8d9 Add external lib FontBox which is part of the PDFBox (they extracted the font handling code into this package in 0.7.3).
17 years ago
fuchsi e77aec8c9d fix handling of encrypted PDF-Documents (with default user password "")
17 years ago
orbiter daf0f74361 joined anomic.net.URL, plasmaURL and url hash computation:
17 years ago
orbiter 367fc28928 corrected Brausse->Brausze
18 years ago
orbiter e76fe1c078 - replaced unicode characters in copyright holder name ('Brausse')
18 years ago
orbiter 40b0547611 - documentation changes (removed old forum links)
18 years ago
orbiter dcb8687904 fix to update cycle
18 years ago
orbiter f323e1813d added commons.logging again (is used by mimeTypeParser)
18 years ago
orbiter 6071668c3b better error message in case that a mime type cannot be found.
18 years ago
orbiter 03847bebc1 removed unused libs
18 years ago
orbiter 9da0e53fe8 repaired rss feed reader
18 years ago
orbiter 36a37f758b fix for oom exception during release download
18 years ago
orbiter 1782ef57e5 - added SSI parser and include directive for <!--# include virtual="<file>" -->
18 years ago
karlchenofhell 0a64047081 - plasmaParserDocument can process subdocuments now (other archive-parsers may want to use this method)
18 years ago
rramthun e12e934ade *) Fixed broken compile process.
18 years ago
theli 24ea4ca631 *) adding first version of postscript parser
18 years ago
theli 75d90834a2 *) adding additional file extension for powerpoint
18 years ago
theli 1f61c13697 *) RSS-parser extracts the author tags now
18 years ago
theli b374812f01 *) adding rpm packager as author
18 years ago
orbiter 6b9eea3932 - removed differentiation between longTitle and shortTitle; this cannot be used for search results,
18 years ago
orbiter a738b57b31 added author tag to indexing content
18 years ago
orbiter 052f28312a removed assortments from indexing data structures
18 years ago
orbiter 8fdefd5c68 generalization of payload definition of index storage
18 years ago
low012 4feaa91890 *) Added additional MIME-Type.
18 years ago
low012 89af433879 *) Deleted parts of WebCat that were not needed for parsing SWFs.
18 years ago
low012 8c9bc7e341 *) extracting urls works now
18 years ago
low012 493391e42d *) new flash parser, still experimental
18 years ago
octoate e4a3574b77 StringBuffer now resets every time the parser is called
18 years ago
octoate cc24dde5e0 First version of a MS Excel parser based on Apache POI
18 years ago
octoate 1c4076da8a First version of the MS Powerpoint parser based on Apache POI
18 years ago
theli 5b75d64d7d *) bugfix for last commit
18 years ago
theli 71ed104bc7 *) adding additional rpm mimetype (used by packman)
18 years ago
theli 1586d57187 *) odtParser: better handling of large files
18 years ago
theli f17ce28b6d *) plasmaHTCache:
18 years ago
theli cd5f349666 *) Better handling of large files during parsing
18 years ago