Commit Graph

14 Commits (49ffedfd8b80ae1ac8d720cfb280bf8ecaa19c66)

Author | SHA1 | Message | Date
orbiter | 61798f0ae6 | added option to distinguish between text crawl and media crawl | 18 years ago
orbiter | bb7d4b5d5e | refactoring to prepare new RWI entry object | 18 years ago
orbiter | b79e06615d | - added new LURL.Entry class for next database migration | 18 years ago
orbiter | a5dd0d41af | - refactoring of plasmaCrawlLURL.Entry to prepare new Entry format | 18 years ago
theli | f17ce28b6d | *) plasmaHTCache: | 18 years ago
theli | a2e3095044 | *) Bugfix. Add missing plasmaParserDocument.close() calls | 18 years ago
orbiter | df1629b05a | - code cleanup | 18 years ago
theli | b6c7b91582 | *) Parser now throws an ParserException instead of returning null on parsing errors (e.g. needed by snippet fetcher) | 18 years ago
orbiter | abf22f6e60 | removed url normalform computation from htmlFilterContentScraper. | 19 years ago
orbiter | 3879a0ecd0 | replaced java.net.URL usage by use of new class de.anomic.net.URL | 19 years ago
orbiter | d0dd8b14d2 | fixed picture tag and presentation | 19 years ago
orbiter | 47b541b2d1 | added better option handling in yacysearch | 19 years ago
orbiter | c9e16bfd48 | first try to insert image search (does not work yet) | 19 years ago
orbiter | 83e0e765ec | redesigned some parts of the html scanner & parser | 19 years ago