38 Commits (9d64693cfb698c35704d9f0764f77ce455a3ccb3)

Author SHA1 Message Date
orbiter 2f49666908 integrated the character decoding into the parser, removed old code 17 years ago
orbiter bfcf9b7aa3 - added language detection using metadata from documents: html and odt documents provide this information 17 years ago
orbiter 0cd0fee546 fixed bug with wrong proxy result enqueueing. See: 17 years ago
danielr 8422ee5ec4 - fixed UnsupportedEncoding (in proxy) using defaultCharset if no characterEncoding can be determined 17 years ago
danielr 621b473b18 * removed some warnings of findbugs (http://findbugs.sf.net) 17 years ago
danielr 3bb870bfcd added final where possible 17 years ago
orbiter c3d461d191 - removed superfluous copyright statement 17 years ago
danielr 7feae906aa - organize imports 17 years ago
orbiter 724bbdf9b2 refactoring of RSS reader 17 years ago
orbiter 87a8747ce3 - enhanced recognition, parsing, management and double-occurrence-handling of image tags 17 years ago
orbiter efd0b8371a - added parsing of Dublin Core - compliant metadata (see RFC 5013 and ISO 15836) to html parser 17 years ago
low012 b08f877e97 *) tried to get rid of warnings when compiling parsers (http://forum.yacy-websuche.de/viewtopic.php?t=660) 17 years ago
orbiter daf0f74361 joined anomic.net.URL, plasmaURL and url hash computation: 18 years ago
orbiter 03847bebc1 removed unused libs 18 years ago
orbiter 9da0e53fe8 repaired rss feed reader 18 years ago
theli 1f61c13697 *) RSS-parser extracts the author tags now 18 years ago
orbiter 6b9eea3932 - removed differentiation between longTitle and shortTitle; this cannot be used for search results, 18 years ago
orbiter a738b57b31 added author tag to indexing content 18 years ago
theli f17ce28b6d *) plasmaHTCache: 19 years ago
theli b6c7b91582 *) Parser now throws an ParserException instead of returning null on parsing errors (e.g. needed by snippet fetcher) 19 years ago
theli 97d2a08ef1 *) restructuring needed to support parsing of documents using various charsets 19 years ago
theli 74c3e7cf29 *) storing document charset into plasmaParserDocument object (is needed later by the condenser) 19 years ago
theli d0a5a53789 *) changes needed for multi-language support 19 years ago
theli b0e8ff6eda *) some TODO makers for UTF-8 problem 19 years ago
theli f3ac4dbbb9 *) better handling of server shutdown 19 years ago
orbiter 3879a0ecd0 replaced java.net.URL usage by use of new class de.anomic.net.URL 19 years ago
theli 45b39ee1be *) solving unpacking problems with to long filename by 19 years ago
orbiter 83e0e765ec redesigned some parts of the html scanner & parser 19 years ago
orbiter b21b9df2d0 added section headlines generation to html parser 19 years ago
orbiter 3d8a5ae652 code cleanup 20 years ago
theli bdf30117c1 *) Redesign of parser configuration 20 years ago
hydrox 56b9f34411 *)removed unused imports 20 years ago
theli 6dd3ec0dc4 *) Adding debug="true" debuglevel="lines,vars,source" to ant build files 20 years ago
theli 84f9d8f7f0 *) migrating ant build files to generate a single extension tar per default 20 years ago
theli 8bd49ba535 *) setting root dir for all tar files properly 20 years ago
theli 361f05978d Multiple updates regarding the yacy seedUpload facility, 20 years ago
theli 1dad015b0b *) Migration of Ant build files 20 years ago
theli 351c86d5d9 *) Migration of optional Content Parser integration 20 years ago