134 Commits (4f6658b1153e51c80fe3449a7656e52ecafbcb33)

Author SHA1 Message Date
karlchenofhell 6fbe31425a - some code-cleanup (no more syntax-warnings here)
18 years ago
orbiter f25c0e98d1 - replaced String by StringBuffer in condenser
18 years ago
allo 782db9099d version independent name for commons-pool lib
18 years ago
orbiter e4570bffaf -implemented a specialized snippet-fetch for media content
18 years ago
orbiter 937ccd4e76 fix for snippet-generation
18 years ago
orbiter ad1e4aa88e added selection of audio, video, image and application resources
18 years ago
orbiter ceb9e3aa17 - enhanced parser: collection of audio, video, image and application links
18 years ago
theli 92f774edd1 *) Better charset encoding detection
18 years ago
theli decb09df6d *) Trying to be more tolerant against wrong charset names
18 years ago
theli e9afe39cbb *) Trying to be more tolerant against wrong charset names
18 years ago
theli 7526c831a8 *) Suppressing stack trace
18 years ago
theli 22649408ad *) Better errorhandling for charset encoding problem during content parsing
18 years ago
orbiter 1969522dc1 removed lowercase of snippets (and other things):
18 years ago
theli f17ce28b6d *) plasmaHTCache:
18 years ago
theli a2e3095044 *) Bugfix. Add missing plasmaParserDocument.close() calls
18 years ago
theli cd5f349666 *) Better handling of large files during parsing
18 years ago
theli 813a8a8179 *) migration of mimeTypeParser to jmimemagic 0.1
18 years ago
theli b6c7b91582 *) Parser now throws a ParserException instead of returning null on parsing errors (e.g. needed by snippet fetcher)
18 years ago
theli 5c6251bced *) some improvements for extended html document charset support
18 years ago
orbiter f453c14b5d removed unreachable catch blocks and unused imports
18 years ago
theli ad7f600f25 *) Bugfix. re-enabling inheritance of serverCharBuffer from writer class
18 years ago
theli 97d2a08ef1 *) restructuring needed to support parsing of documents using various charsets
18 years ago
orbiter 3aac5b26da - added automatic tag generation when a web page from the search results is added
18 years ago
allo 2fd610b556 http://www.yacy-forum.de/viewtopic.php?p=25611#25611
18 years ago
theli 06fa891152 *) htmlFilterContentScraper.java: using proper charset for document title
18 years ago
theli 74c3e7cf29 *) storing document charset into plasmaParserDocument object (is needed later by the condenser)
18 years ago
theli c5d3020941 *) better errorhandling for last commit
18 years ago
theli d0a5a53789 *) changes needed for multi-language support
18 years ago
theli eb9b138986 *) next step of restructuring for new crawlers
18 years ago
theli f3ac4dbbb9 *) better handling of server shutdown
18 years ago
orbiter abf22f6e60 removed url normalform computation from htmlFilterContentScraper.
19 years ago
orbiter 3879a0ecd0 replaced java.net.URL usage by use of new class de.anomic.net.URL
19 years ago
orbiter 015d044c25 tried to fix some problems with latest changes to httpc
19 years ago
orbiter 47b541b2d1 added better option handling in yacysearch
19 years ago
orbiter 22de954a57 added some log output to parser
19 years ago
orbiter 83e0e765ec redesigned some parts of the html scanner & parser
19 years ago
orbiter b21b9df2d0 added section headlines generation to html parser
19 years ago
theli 79667a172e *) Bugfix for additional parser problem
19 years ago
theli e7d16ef831 *) Corrections in jMimeMagic MagicRule-file to detect some special rss feeds
19 years ago
theli 5a1d45715d *) Bugfix for parser configuration bug
19 years ago
orbiter ec2b39c1ce code cleanup
19 years ago
theli 44fa94ac52 *) Modifications for dbImport functionality
19 years ago
orbiter 3d8a5ae652 code cleanup
19 years ago
theli 8ed0aaae8d *) Adding content Parser for RPM Files
19 years ago
theli bdf30117c1 *) Redesign of parser configuration
19 years ago
orbiter 40621a5663 enhancements in ranking preparation and fixed problem with parser/mime recognition
19 years ago
theli c2fe3a1670 *) Updating jMimeMagic Ruleset
19 years ago
theli 445e3a620f *) Avoid rejecting of html content by the crawler when the file extension is not set properly
19 years ago
orbiter d2731418bf added creation of global ranking files and changed url normal form usage
19 years ago
theli b8ceb1ffde *) Adding better https support for crawler
19 years ago
hydrox cb69047b91 *)cleanup access static methods and fields
19 years ago
theli a2fa75e688 *) Asynchronous queuing of crawl job URLs (stackCrawl)
19 years ago
theli 0fd9aa6c6e *) Bugfix: supportedFileExt Function didn't detect the file extension correctly because of missing conversion to lower case
19 years ago
theli 8a33c9b309 *) Bugfix: supportedFileExt Function didn't detect the file extension correctly if there was a dot
19 years ago
theli 2b3f964037 *) Bugfix: supportedFileExt Function didn't chop http parameters before trying to detect the file extension
19 years ago
theli b990dc1ad1 *) Replacing jsch 0.1.19 lib with newer version 0.1.21
19 years ago
theli 4fd5b95b1f *) Renaming Logger function names to reflect the proper Java Logging API Loglevels
19 years ago
theli 6adf8a4bde *) Renaming Logger function names to reflect the proper Java Logging API Loglevels
19 years ago
rramthun 4cb382decb Adding changes by borg-0300 from http://www.yacy-forum.de/viewtopic.php?t=997
20 years ago
orbiter ba0a486328 moved printStackTrace() to logging
20 years ago
theli 470839a16a *) Crawler/Session pool settings will now be stored properly into configfile
20 years ago
orbiter 858cd94299 replaced indexing ram-queue by file-based stack-queue
20 years ago
orbiter 712fe9ef18 bugfixed utf-8 decoding and parser
20 years ago
theli 6697d5e52e *) correcting fkt. mediaExtContains
20 years ago
theli aae9a433a6 *) correcting usage of supportedFileExt-List
20 years ago
orbiter 1e7f062350 many bugfixes, memory leak fixes, performance enhancements; new kelondroHashtable; activated snippets
20 years ago
orbiter a25b5b4986 fixed possible memory leak in htmlScraper: be aware that now links can get lost; further work necessary
20 years ago
theli 9e47ba5ad6 *) adding missing calls for function close() to avoid "too many open file" bug
20 years ago
theli 9a98988c3c *) Bugfix for SSL/NIO Bug
20 years ago
theli 890e3f4d4a *) adding missing calls for function close() to avoid "too many open file" bug*) adding
20 years ago
theli 1b5ae054f8 *) changing reference to logger
20 years ago
theli 0484c41a84 *) replacing system.xxx.println with logging statements
20 years ago
theli 893a662329 *) Adding missing cast statement
20 years ago
theli 361f05978d Multiple updates regarding the yacy seedUpload facility,
20 years ago
theli 2aa5fe8f50 *) Import statements reorganized
20 years ago
theli 351c86d5d9 *) Migration of optional Content Parser integration
20 years ago
orbiter 48650c082c fixed 100%-CPU-Bug in plasmaCondenser
20 years ago
orbiter 995673d795 several bugfixes
20 years ago
theli f44b219e44 *) Eclipse has accidentally copied in the wrong file header into the new files (because these headers were accidentally set as default for the whole workspace instead of the project)
20 years ago
theli 58b1a0ba40 *) adding an new package for extra content parsers
20 years ago
orbiter 8b31f9e202 enhanced shut-down behaviour & added experimental nio-wrapper for kelondroRA (not active yet)
20 years ago
orbiter 00f223cfc1 fixed post-parsing (a case when the bluelist is empty)
20 years ago
orbiter e7d055b98e very experimental integration of the new generic parser and optional disabling of bluelist filtering in proxy. Does not yet work properly. To disable the disable-feature, the presence of a non-empty bluelist is necessary
20 years ago
orbiter a87a17a3c8 prepared generic text parser environment
20 years ago