36 Commits (a564df3984087c18cc3cdb49da92b93d0b874237)

Author SHA1 Message Date
orbiter 43c8defd79 enhanced parser with more extension + mime attributes
16 years ago
orbiter b2263bc720 enhanced document type recognition
16 years ago
lotus aa38eb5a20 * maxfilesize -1 for infinite filesize
16 years ago
lotus 9cfe89c8fc * process content-length as soon as it is received
16 years ago
lotus 9f083bb6b2 check filetype before loading (no more mp4 loading)
16 years ago
orbiter 57a88d435b redesign of parser mime type detection and parser steering
16 years ago
orbiter 21b8704fb4 refactoring of the ParserDispatcher and ParserConfig: resulted into Idiom, Parser and Classification classes
16 years ago
orbiter dafffd0153 refactoring of parsers and document processing
16 years ago
orbiter 154bbc3364 code cleanup: call of static methods directly to the class
16 years ago
orbiter ce1adf9955 serialized all logging using concurrency:
16 years ago
orbiter 99bf0b8e41 refactoring of plasmaWordIndex:
16 years ago
orbiter c0e8ed5461 fixed problem with not http client
16 years ago
orbiter c2359f20dd refactoring: better abstraction of reference and metadata prototypes.
16 years ago
orbiter 67aaffc0a2 - added Latency control to the crawler:
16 years ago
orbiter 14a1c33823 refactoring of wordIndex class
16 years ago
orbiter aa44d9bad9 more refactoring of kelondro.text / deleted de.anomic.index
16 years ago
orbiter c12bb8a6d0 - refactoring of the http client
16 years ago
orbiter 024da2916b refactoring of logging
16 years ago
orbiter 1918a0173e added more exception handling during crawling
16 years ago
orbiter 674ad2d55b different handling of error cases that occur during loading files with http or ftp:
16 years ago
orbiter 826ca79735 refactoring and new architecture to store the files of the web cache:
17 years ago
orbiter 536e77e8b7 modifications towards a single database operation to read/write http header and cached file at once:
17 years ago
orbiter 7989335ed6 Preparations to replace the HTCache with a new storage data structure:
17 years ago
danielr be28af50f5 - fixed "yacy2yacy no proxy"-problem
17 years ago
danielr a087090bbb fixed starting crawl results in "No parser available to parse mimetype 'application/octet-stream'"
17 years ago
danielr 17b7845eb5 * refactoring
17 years ago
danielr 3bb870bfcd added final where possible
17 years ago
orbiter c3d461d191 - removed superfluous copyright statement
17 years ago
orbiter 3ca98fee42 removed superfluous copyright statement
17 years ago
orbiter 1e6d12f146 Major update to BLOB data structures:
17 years ago
orbiter e81be7d4f2 added many missing user-agent declarations for yacy http client connections.
17 years ago
danielr 63eadfdf84 fixed unlimited FileSizeLimit
17 years ago
danielr 7feae906aa - organize imports
17 years ago
orbiter 8be462986e fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1174&p=7841#p7841
17 years ago
orbiter cfe6790498 - added option to switch between yacy networks, especially between the two default networks (freeworld and intranet),
17 years ago
orbiter 1689030ee8 refactoring: moved all crawler classes into their own package
17 years ago