664 Commits (b2263bc720c473854b22c2fccb8c124cff0f7a81)

Author SHA1 Message Date
orbiter b2263bc720 enhanced document type recognition
16 years ago
lotus aa38eb5a20 * maxfilesize -1 for infinite filesize
16 years ago
lotus 9cfe89c8fc * process content-length as soon as it is received
16 years ago
f1ori f814e0fa81 enable warnings and fix most of it
16 years ago
f1ori 8931c8d6b4 improvements to debianpackage:
16 years ago
orbiter 57a88d435b redesign of parser mime type detection and parser steering
16 years ago
orbiter 21b8704fb4 refactoring of the ParserDispatcher and ParserConfig: resulted into Idiom, Parser and Classification classes
16 years ago
orbiter 8ca1f5d400 - some work to integrate the html parser the same way as the other parsers are integrated (not finished)
16 years ago
orbiter dafffd0153 refactoring of parsers and document processing
16 years ago
orbiter 409538e17a code cleanup and code simplification
16 years ago
orbiter 1f1399e5c5 extending visibility of objects and methods to avoid synthetic accessor methods and increase performance
16 years ago
orbiter 154bbc3364 code cleanup: call of static methods directly to the class
16 years ago
orbiter 222850414e simplification of the code: removed unused classes, methods and variables
16 years ago
orbiter 93dfb51fd4 problems with code style
16 years ago
orbiter ce1adf9955 serialized all logging using concurrency:
16 years ago
orbiter b8e738a7be a collection of
16 years ago
orbiter db3a06dd81 removed cookie handling in httpc:
16 years ago
orbiter 99bf0b8e41 refactoring of plasmaWordIndex:
16 years ago
orbiter 4b4bddca00 added new submenu to crawler menu: import of phpbb3 forum postings from mysql
16 years ago
orbiter 709bfc2cd4 added a memory check in http post protocol
16 years ago
orbiter c097531e3d added a catch Exception to all thread to check if any of them silently dies without any other notification
16 years ago
orbiter 057ce14c8e more fixes (character encoding, parser exceptions, http client failure, blob writing)
16 years ago
orbiter d2ac0aa682 - fixed possible bugs in Stack (may affect Crawler reset) and RandomAccess handling
16 years ago
orbiter 16baa7ad24 To translate a mediawiki dump into the YaCy surrogate format do the following:
16 years ago
orbiter 8ffb9889e1 some fixes and performance hacks
16 years ago
orbiter d7cbf4cdd4 more performance hacks: less overhead in word hash computation
16 years ago
f1ori dd6b5005ff * fix missing charset handling in getpageinfo_p
16 years ago
orbiter c0e8ed5461 fixed problem with not http client
16 years ago
orbiter 57c00dd8c9 fix for bad filtering of common http error
16 years ago
orbiter c2359f20dd refactoring: better abstraction of reference and metadata prototypes.
16 years ago
orbiter 9bfb2641db - removed deprecated threads
16 years ago
orbiter b6c2167143 - patch for bad web structure dumps
16 years ago
orbiter 9a90ea05e0 added a merge operation for IndexCell data structures
16 years ago
orbiter 14a1c33823 refactoring of wordIndex class
16 years ago
borg-0300 0a2fabeef3 static TMPDIR
16 years ago
lotus f35dc11dc4 allow crawl start from pages with script tags
16 years ago
orbiter 858f800a07 more logging in httpd to detect shutdown cause. See also:
16 years ago
orbiter b80db04667 - refactoring of IntegerHandleIndex and LongHandleIndex (better method names)
16 years ago
orbiter efcd95dc37 simplification of (internal) query process / refactoring
16 years ago
orbiter aa44d9bad9 more refactoring of kelondro.text / deleted de.anomic.index
16 years ago
orbiter aca973e2d9 catch more exceptions
16 years ago
orbiter c12bb8a6d0 - refactoring of the http client
16 years ago
orbiter 6b450d09ca some fixes recommended by findbugs
16 years ago
orbiter f887fc159f try to reduce the large number of unclosed incoming connections
16 years ago
orbiter 333489420b - fix for NPE when loading the cytag image
16 years ago
orbiter e9a4182e6a using a concurrent hash map for the template cache
16 years ago
orbiter 01b97ef3f8 added new cybertag-tracking feature that was inspired by itgrl
16 years ago
orbiter b57c9da1f8 - fixes to doc, ppt, xls parser: better title
16 years ago
orbiter db510b5d52 more exception logging
16 years ago
low012 f136ddcfd4 *) this change is supposed to prevent the creation of temporary files by Apache Commons Fileupload library in cases where it is not necessary (as proposed by thq in http://forum.yacy-websuche.de/viewtopic.php?f=8&t=1806)
16 years ago