Commit Graph

15 Commits (5b3acc12cd4b4343c4e7d7f0a20a1da8ea8d5f6a)

Author SHA1 Message Date
Michael Peter Christen a1a5b015d8 refactoring: moved document Classification to cora package
13 years ago
orbiter d2ea250d99 refactoring:
14 years ago
orbiter 610b01e1c3 - added an option 'add every media object linked in an html document as a new document' to the html parser. With it, every image, app, video or audio file linked in an html file is added as a document of its own. In effect, parsing a single html document may insert several documents into the search index.
14 years ago
low012 3b40b98256 *) set SVN properties
14 years ago
orbiter 4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
14 years ago
orbiter 56264dcc17 - added CamelCase parser to MultiProtocolURI: generate better to-be-indexed words from urls
14 years ago
orbiter 091dd3f6ec - enhanced intranet search speed
15 years ago
orbiter 48c0d508ac fixes for crawling of smb links (file length not always available)
15 years ago
orbiter 65eaf30f77 redesign of crawl profiles data structure. target will be:
15 years ago
orbiter 3197ca42ed preparations to move the HTCache into cora:
15 years ago
orbiter 5e7081cd19 refactoring towards a unified loading mechanism for MultiProtocolURIs
15 years ago
orbiter 90531f78ff refactoring of the cora package to get subpackages for http and ftp (smb to come)
15 years ago
orbiter a82a93f2fc - better url double check in crawler
15 years ago
orbiter 150cf42a1b migrated all my LGPL 3-licensed files to the LGPL 2.1 because the LGPL 3 is not compatible with the GPL 2
15 years ago
orbiter 11639aef35 - added new protocol loader for 'file'-type URLs
15 years ago