Commit Graph

9 Commits (d871812621554588abfc82a3d4087db0f1d4374c)

Author SHA1 Message Date
orbiter b77b8cac0c - enhanced html parser: recognized much more details in the content
14 years ago
orbiter 9b25d07295 - added geo information parsing to html parser
14 years ago
low012 2a6499364d *) minor changes
14 years ago
low012 c0274bd123 *) minor changes
14 years ago
orbiter 59b70a5a92 another fix to the ftp crawler: now correct directory listings according to rfc2640 (path with spaces) and better title names for such files
14 years ago
orbiter 7bdb13bf7f more fixes to smb crawling: better file names
14 years ago
orbiter c288fcf634 redesigned CrawlStartScanner user interface and added more features:
14 years ago
orbiter 4e2c14efbb fixed bugs in parser and ftp client
14 years ago
orbiter b769cce433 - added a catch-all parser for all documents that cannot be parsed: they will contributed with their document url for the search index only
14 years ago