75 Commits (dfd5e823c3fa8ea7d693ae881a9efe18409002a5)

Author SHA1 Message Date
karlchenofhell 0a64047081 - plasmaParserDocument can process subdocuments now (other archive-parsers may want to use this method)
18 years ago
rramthun e12e934ade *) Fixed broken compile process.
18 years ago
theli 24ea4ca631 *) adding first version of postscript parser
18 years ago
theli 75d90834a2 *) adding additional file extension for powerpoint
18 years ago
theli 1f61c13697 *) RSS-parser extracts the author tags now
18 years ago
theli b374812f01 *) adding rpm packager as author
18 years ago
orbiter 6b9eea3932 - removed differentiation between longTitle and shortTitle; this cannot be used for search results,
18 years ago
orbiter a738b57b31 added author tag to indexing content
18 years ago
orbiter 052f28312a removed assortments from indexing data structures
18 years ago
orbiter 8fdefd5c68 generalization of payload definition of index storage
18 years ago
low012 4feaa91890 *) Added additional MIME-Type.
18 years ago
low012 89af433879 *) Deleted parts of WebCat that were not needed for parsing SWFs.
18 years ago
low012 8c9bc7e341 *) extracting urls works now
18 years ago
low012 493391e42d *) new flash parser, still experimental
18 years ago
octoate e4a3574b77 StringBuffer now resets every time the parser is called
18 years ago
octoate cc24dde5e0 First version of a MS Excel parser based on Apache POI
18 years ago
octoate 1c4076da8a First version of the MS Powerpoint parser based on Apache POI
18 years ago
theli 5b75d64d7d *) bugfix for last commit
18 years ago
theli 71ed104bc7 *) adding additional rpm mimetype (used by packman)
18 years ago
theli 1586d57187 *) odtParser: better handling of large files
18 years ago
theli f17ce28b6d *) plasmaHTCache:
18 years ago
theli cd5f349666 *) Better handling of large files during parsing
18 years ago
orbiter df1629b05a - code cleanup
18 years ago
theli b73efd5565 *) missing changes needed because of last commit
18 years ago
theli 813a8a8179 *) migration of mimeTypeParser to jmimemagic 0.1
18 years ago
theli b6c7b91582 *) Parser now throws a ParserException instead of returning null on parsing errors (e.g. needed by snippet fetcher)
18 years ago
theli 97d2a08ef1 *) restructuring needed to support parsing of documents using various charsets
18 years ago
orbiter 3aac5b26da - added automatic tag generation when a web page from the search results is added
18 years ago
theli 74c3e7cf29 *) storing document charset into plasmaParserDocument object (is needed later by the condenser)
18 years ago
theli d0a5a53789 *) changes needed for multi-language support
18 years ago
theli b0e8ff6eda *) some TODO markers for UTF-8 problem
18 years ago
theli f3ac4dbbb9 *) better handling of server shutdown
18 years ago
theli 9d13aeca13 *) removing class. does not work so far
19 years ago
theli 95a84ae469 *) adding missing classes
19 years ago
orbiter 3879a0ecd0 replaced java.net.URL usage by use of new class de.anomic.net.URL
19 years ago
theli 45b39ee1be *) solving unpacking problems with too long filename by
19 years ago
orbiter 015d044c25 tried to fix some problems with latest changes to httpc
19 years ago
orbiter 83e0e765ec redesigned some parts of the html scanner & parser
19 years ago
orbiter b21b9df2d0 added section headlines generation to html parser
19 years ago
orbiter 9544c47684 added some UTF-8 handling.
19 years ago
orbiter 9086261476 refactoring of base64 encoding:
19 years ago
theli 44fa94ac52 *) Modifications for dbImport functionality
19 years ago
orbiter 3d8a5ae652 code cleanup
19 years ago
orbiter a04930f025 code cleanup
19 years ago
theli 8ed0aaae8d *) Adding content Parser for RPM Files
19 years ago
theli 818d37ce44 *) Removing getSimpleName
19 years ago
theli bdf30117c1 *) Redesign of parser configuration
19 years ago
theli 90d6c6223b *) Adding color codes to network graphic legend
19 years ago
theli c2fe3a1670 *) Updating jMimeMagic Ruleset
19 years ago
theli ca26aab9b1 *) More debugging output for migrateWords
19 years ago