38 Commits (d6a5c98080240771f3877b49c400f60004e646db)

Author SHA1 Message Date
orbiter 2f49666908 integrated the character decoding into the parser, removed old code
16 years ago
orbiter bfcf9b7aa3 - added language detection using metadata from documents: html and odt documents provide this information
16 years ago
orbiter 0cd0fee546 fixed bug with wrong proxy result enqueueing. See:
16 years ago
danielr 8422ee5ec4 - fixed UnsupportedEncoding (in proxy) using defaultCharset if no characterEncoding can be determined
17 years ago
danielr 621b473b18 * removed some warnings of findbugs (http://findbugs.sf.net)
17 years ago
danielr 3bb870bfcd added final where possible
17 years ago
orbiter c3d461d191 - removed superfluous copyright statement
17 years ago
danielr 7feae906aa - organize imports
17 years ago
orbiter 724bbdf9b2 refactoring of RSS reader
17 years ago
orbiter 87a8747ce3 - enhanced recognition, parsing, management and double-occurrence-handling of image tags
17 years ago
orbiter efd0b8371a - added parsing of Dublin Core - compliant metadata (see RFC 5013 and ISO 15836) to html parser
17 years ago
low012 b08f877e97 *) tried to get rid of warnings when compiling parsers (http://forum.yacy-websuche.de/viewtopic.php?t=660)
17 years ago
orbiter daf0f74361 joined anomic.net.URL, plasmaURL and url hash computation:
17 years ago
orbiter 03847bebc1 removed unused libs
18 years ago
orbiter 9da0e53fe8 repaired rss feed reader
18 years ago
theli 1f61c13697 *) RSS-parser extracts the author tags now
18 years ago
orbiter 6b9eea3932 - removed differentiation between longTitle and shortTitle; this cannot be used for search results,
18 years ago
orbiter a738b57b31 added author tag to indexing content
18 years ago
theli f17ce28b6d *) plasmaHTCache:
18 years ago
theli b6c7b91582 *) Parser now throws a ParserException instead of returning null on parsing errors (e.g. needed by snippet fetcher)
18 years ago
theli 97d2a08ef1 *) restructuring needed to support parsing of documents using various charsets
18 years ago
theli 74c3e7cf29 *) storing document charset into plasmaParserDocument object (is needed later by the condenser)
18 years ago
theli d0a5a53789 *) changes needed for multi-language support
18 years ago
theli b0e8ff6eda *) some TODO markers for UTF-8 problem
18 years ago
theli f3ac4dbbb9 *) better handling of server shutdown
18 years ago
orbiter 3879a0ecd0 replaced java.net.URL usage by use of new class de.anomic.net.URL
19 years ago
theli 45b39ee1be *) solving unpacking problems with too long filename by
19 years ago
orbiter 83e0e765ec redesigned some parts of the html scanner & parser
19 years ago
orbiter b21b9df2d0 added section headlines generation to html parser
19 years ago
orbiter 3d8a5ae652 code cleanup
19 years ago
theli bdf30117c1 *) Redesign of parser configuration
19 years ago
hydrox 56b9f34411 *) removed unused imports
19 years ago
theli 6dd3ec0dc4 *) Adding debug="true" debuglevel="lines,vars,source" to ant build files
20 years ago
theli 84f9d8f7f0 *) migrating ant build files to generate a single extension tar per default
20 years ago
theli 8bd49ba535 *) setting root dir for all tar files properly
20 years ago
theli 361f05978d Multiple updates regarding the yacy seedUpload facility,
20 years ago
theli 1dad015b0b *) Migration of Ant build files
20 years ago
theli 351c86d5d9 *) Migration of optional Content Parser integration
20 years ago