42 Commits (9fb5d661f2863fb26c45c8bb14241383269b8205)

Author | SHA1 | Message | Date
orbiter | 9b0e20fb06 | next refactoring step in document indexing to prepare concurrency environment for document parsing | 17 years ago
orbiter | 8d6a13bc07 | refactoring of parsing-condensing-indexing process: | 17 years ago
orbiter | 87a8747ce3 | - enhanced recognition, parsing, management and double-occurrence-handling of image tags | 17 years ago
orbiter | efd0b8371a | - added parsing of Dublin Core - compliant metadata (see RFC 5013 and ISO 15836) to html parser | 17 years ago
orbiter | 45339c3db5 | more generics | 17 years ago
low012 | b08f877e97 | *) tried to get rid of warnings when compiling parsers (http://forum.yacy-websuche.de/viewtopic.php?t=660) | 17 years ago
orbiter | daf0f74361 | joined anomic.net.URL, plasmaURL and url hash computation: | 17 years ago
orbiter | 40b0547611 | - documentaton changes (removed old forum links) | 18 years ago
theli | 339153d40e | *) favicons that are specified in the document content via html link-tags | 18 years ago
karlchenofhell | 0a64047081 | - plasmaParserDocument can process subdocuments now (other archive-parsers may want to use this method) | 18 years ago
orbiter | 6b9eea3932 | - removed differentiation between longTitle and shortTitle; this cannot be used for search results, | 18 years ago
orbiter | a738b57b31 | added author tag to indexing content | 18 years ago
orbiter | 937ccd4e76 | fix for snippet-generation | 18 years ago
orbiter | bf0d820659 | - added correct flagging of word properties | 18 years ago
orbiter | ad1e4aa88e | added selection of audio, video, image and application resources | 18 years ago
orbiter | ceb9e3aa17 | - enhanced parser: collection of audio, video, image and application links | 18 years ago
orbiter | ba967c4875 | - bugfixes and debug code | 18 years ago
orbiter | 1969522dc1 | removed lowercase of snippets (and other things): | 18 years ago
theli | a2e3095044 | *) Bugfix. Add missing plasmaParserDocument.close() calls | 18 years ago
theli | cd5f349666 | *) Better handling of large files during parsing | 18 years ago
orbiter | df1629b05a | - code cleanup | 18 years ago
orbiter | 3aac5b26da | - added automatic tag generation when a web page from the search results is added | 18 years ago
theli | 74c3e7cf29 | *) storing document charset into plasmaParserDocument object (is needed later by the condenser) | 18 years ago
orbiter | abf22f6e60 | removed url normalform computation from htmlFilterContentScraper. | 19 years ago
orbiter | 3879a0ecd0 | replaced java.net.URL usage by use of new class de.anomic.net.URL | 19 years ago
orbiter | f77775220b | fixed parser error | 19 years ago
orbiter | 83e0e765ec | redesigned some parts of the html scanner & parser | 19 years ago
orbiter | 6b63e26cbb | - removed search function from index.html/java, only imput left | 19 years ago
orbiter | 37f88b4017 | code cleanup | 19 years ago
theli | 44fa94ac52 | *) Modifications for dbImport functionality | 19 years ago
orbiter | 3d8a5ae652 | code cleanup | 19 years ago
orbiter | d2731418bf | added creation of global ranking files and changed url normal form usage | 19 years ago
theli | 9b7f37fc37 | *) Minor changes | 19 years ago
theli | 6c722706b7 | *) Moving yacyDebugMode intialization to switchboard | 19 years ago
orbiter | d8fdc2526e | added experimental snipplet-generation (to be disabled for 0.38) | 20 years ago
theli | 361f05978d | Multiple updates regarding the yacy seedUpload facility, | 20 years ago
orbiter | 1d7fed87dc | redesign of index caching - removed indexCache.db | 20 years ago
theli | 351c86d5d9 | *) Migration of optional Content Parser integration | 20 years ago
orbiter | 995673d795 | several bugfixes | 20 years ago
theli | fd584c113c | *) some minor changes | 20 years ago
theli | f44b219e44 | *) Eclipse has accidentally copied in the wrong file header into the new files (because these headers were accidentally set as default for the whole workspace instead of the project) | 20 years ago
theli | 58b1a0ba40 | *) adding an new package for extra content parsers | 20 years ago