Commit Graph

88 Commits (54ddb3262ca5f83ced4510c80c33563c16a94bcc)

Author SHA1 Message Date
karlchenofhell 0a64047081 - plasmaParserDocument can process subdocuments now (other archive-parsers may want to use this method)
18 years ago
orbiter 871ee1ce0f one step closer to automatic updates:
18 years ago
orbiter 6b9eea3932 - removed differentiation between longTitle and shortTitle; this cannot be used for search results,
18 years ago
orbiter a738b57b31 added author tag to indexing content
18 years ago
karlchenofhell 6fbe31425a - some code-cleanup (no more syntax-warnings here)
18 years ago
orbiter f25c0e98d1 - replaced String by StringBuffer in condenser
18 years ago
allo 782db9099d version independent name for commons-pool lib
18 years ago
orbiter e4570bffaf -implemented a specialized snippet-fetch for media content
18 years ago
orbiter 937ccd4e76 fix for snippet-generation
18 years ago
orbiter ad1e4aa88e added selection of audio, video, image and application resources
18 years ago
orbiter ceb9e3aa17 - enhanced parser: collection of audio, video, image and application links
18 years ago
theli 92f774edd1 *) Better charset encoding detection
18 years ago
theli decb09df6d *) Trying to be more tolerant against wrong charset names
18 years ago
theli e9afe39cbb *) Trying to be more tolerant against wrong charset names
18 years ago
theli 7526c831a8 *) Suppressing stracktrace
18 years ago
theli 22649408ad *) Better errorhandling for charset encoding problem during content parsing
18 years ago
orbiter 1969522dc1 removed lowercase of snippets (and other things):
18 years ago
theli f17ce28b6d *) plasmaHTCache:
18 years ago
theli a2e3095044 *) Bugfix. Add missing plasmaParserDocument.close() calls
18 years ago
theli cd5f349666 *) Better handling of large files during parsing
18 years ago
theli 813a8a8179 *) migration of mimeTypeParser to jmimemagic 0.1
18 years ago
theli b6c7b91582 *) Parser now throws an ParserException instead of returning null on parsing errors (e.g. needed by snippet fetcher)
18 years ago
theli 5c6251bced *) some improvements for extended html document charset support
18 years ago
orbiter f453c14b5d removed unreacheable catch blocks and unused imports
18 years ago
theli ad7f600f25 *) Bugfix. re-enabling inheritance of serverCharBuffer from writer class
18 years ago
theli 97d2a08ef1 *) restructuring needed to support parsing of documents using various charsets
18 years ago
orbiter 3aac5b26da - added automatic tag generation when a web page from the search results is added
18 years ago
allo 2fd610b556 http://www.yacy-forum.de/viewtopic.php?p=25611#25611
18 years ago
theli 06fa891152 *) htmlFilterContentScraper.java: using proper charset for document title
18 years ago
theli 74c3e7cf29 *) storing document charset into plasmaParserDocument object (is needed later by the condenser)
18 years ago
theli c5d3020941 *) better errorhandling for last commit
18 years ago
theli d0a5a53789 *) changes needed for multi-language support
18 years ago
theli eb9b138986 *) next step of restructuring for new crawlers
18 years ago
theli f3ac4dbbb9 *) better handling of server shutdown
18 years ago
orbiter abf22f6e60 removed url normalform computation from htmlFilterContentScraper.
19 years ago
orbiter 3879a0ecd0 replaced java.net.URL usage by use of new class de.anomic.net.URL
19 years ago
orbiter 015d044c25 tried to fix some problems with latest changes to httpc
19 years ago
orbiter 47b541b2d1 added better option handling in yacysearch
19 years ago
orbiter 22de954a57 added some log output to parser
19 years ago
orbiter 83e0e765ec redesigned some parts of the html scanner & parser
19 years ago
orbiter b21b9df2d0 added section headlines generation to html parser
19 years ago
theli 79667a172e *) Bugfix for additional parser problem
19 years ago
theli e7d16ef831 *) Corrections in jMimeMagic MagicRule-file to detect some special rss feeds
19 years ago
theli 5a1d45715d *) Bugfix for parser configuration bug
19 years ago
orbiter ec2b39c1ce code cleanup
19 years ago
theli 44fa94ac52 *) Modifications for dbImport functionality
19 years ago
orbiter 3d8a5ae652 code cleanup
19 years ago
theli 8ed0aaae8d *) Adding content Parser for RPM Files
19 years ago
theli bdf30117c1 *) Redesign of parser configuration
19 years ago
orbiter 40621a5663 anhancements in ranking preparation and fixed problem with parser/mime recognition
19 years ago