105 Commits (e2e7f065e934c266e1a49b0792b69e68118dc784)

Author SHA1 Message Date
orbiter efd0b8371a - added parsing of Dublin Core - compliant metadata (see RFC 5013 and ISO 15836) to html parser
17 years ago
orbiter f4e9ff6ce9 more generics
17 years ago
orbiter ecd7f8ba4e - added NEAR operator (must be written in UPPERCASE in search query)
17 years ago
low012 b08f877e97 *) tried to get rid of warnings when compiling parsers (http://forum.yacy-websuche.de/viewtopic.php?t=660)
17 years ago
orbiter af10f729df fixed image search and favicon loading
17 years ago
orbiter 4fefa53135 removed parser object pool, see also svn 4106
18 years ago
orbiter 842308ea97 - redesigned crawl start menu, integrated monitoring pages
18 years ago
orbiter 11b4f80bde - fixed non-closing client connections
18 years ago
orbiter 1488769e1f cleanup of unmaintained and outdated performance methods:
18 years ago
orbiter daf0f74361 joined anomic.net.URL, plasmaURL and url hash computation:
18 years ago
orbiter 57a5b6fa71 some generalization of remote proxy configuration and setting handling in httpc
18 years ago
orbiter 40b0547611 - documentation changes (removed old forum links)
18 years ago
orbiter 557f8d80e4 - better logging
18 years ago
orbiter 26ddf797eb added bmp and ico image format to all parser/viewing methods
18 years ago
orbiter 6518bb6c08 changed release strategy:
18 years ago
theli 339153d40e *) favicons that are specified in the document content via html link-tags
18 years ago
rramthun 18a5380ee3 *) situation-dependent lock-buttons for search-page
18 years ago
karlchenofhell 0a64047081 - plasmaParserDocument can process subdocuments now (other archive-parsers may want to use this method)
18 years ago
orbiter 871ee1ce0f one step closer to automatic updates:
18 years ago
orbiter 6b9eea3932 - removed differentiation between longTitle and shortTitle; this cannot be used for search results,
18 years ago
orbiter a738b57b31 added author tag to indexing content
18 years ago
karlchenofhell 6fbe31425a - some code-cleanup (no more syntax-warnings here)
18 years ago
orbiter f25c0e98d1 - replaced String by StringBuffer in condenser
18 years ago
allo 782db9099d version independent name for commons-pool lib
18 years ago
orbiter e4570bffaf -implemented a specialized snippet-fetch for media content
18 years ago
orbiter 937ccd4e76 fix for snippet-generation
18 years ago
orbiter ad1e4aa88e added selection of audio, video, image and application resources
18 years ago
orbiter ceb9e3aa17 - enhanced parser: collection of audio, video, image and application links
18 years ago
theli 92f774edd1 *) Better charset encoding detection
19 years ago
theli decb09df6d *) Trying to be more tolerant against wrong charset names
19 years ago
theli e9afe39cbb *) Trying to be more tolerant against wrong charset names
19 years ago
theli 7526c831a8 *) Suppressing stacktrace
19 years ago
theli 22649408ad *) Better errorhandling for charset encoding problem during content parsing
19 years ago
orbiter 1969522dc1 removed lowercase of snippets (and other things):
19 years ago
theli f17ce28b6d *) plasmaHTCache:
19 years ago
theli a2e3095044 *) Bugfix. Add missing plasmaParserDocument.close() calls
19 years ago
theli cd5f349666 *) Better handling of large files during parsing
19 years ago
theli 813a8a8179 *) migration of mimeTypeParser to jmimemagic 0.1
19 years ago
theli b6c7b91582 *) Parser now throws a ParserException instead of returning null on parsing errors (e.g. needed by snippet fetcher)
19 years ago
theli 5c6251bced *) some improvements for extended html document charset support
19 years ago
orbiter f453c14b5d removed unreachable catch blocks and unused imports
19 years ago
theli ad7f600f25 *) Bugfix. re-enabling inheritance of serverCharBuffer from writer class
19 years ago
theli 97d2a08ef1 *) restructuring needed to support parsing of documents using various charsets
19 years ago
orbiter 3aac5b26da - added automatic tag generation when a web page from the search results is added
19 years ago
allo 2fd610b556 http://www.yacy-forum.de/viewtopic.php?p=25611#25611
19 years ago
theli 06fa891152 *) htmlFilterContentScraper.java: using proper charset for document title
19 years ago
theli 74c3e7cf29 *) storing document charset into plasmaParserDocument object (is needed later by the condenser)
19 years ago
theli c5d3020941 *) better errorhandling for last commit
19 years ago
theli d0a5a53789 *) changes needed for multi-language support
19 years ago
theli eb9b138986 *) next step of restructuring for new crawlers
19 years ago