717 Commits (ac492fa2a57a64db1a7300523d7bf1eceaffac6f)

Author SHA1 Message Date
orbiter 5bb8074150 removed the indexing queue. This queue has been superfluous since the introduction of the blocking queues last year, in which documents are parsed, analysed and stored in the index concurrently.
16 years ago
orbiter b332dfad67 - inserted request object into response object, which now carries it instead of generating new objects
16 years ago
orbiter ca72ed7526 -removed superfluous crawl cache
16 years ago
orbiter b2263bc720 enhanced document type recognition
16 years ago
lotus aa38eb5a20 * maxfilesize -1 for infinite filesize
16 years ago
lotus 9cfe89c8fc * process content-length as soon as it is received
16 years ago
f1ori f814e0fa81 enable warnings and fix most of it
16 years ago
f1ori 8931c8d6b4 improvements to debian package:
16 years ago
orbiter 57a88d435b redesign of parser mime type detection and parser steering
16 years ago
orbiter 21b8704fb4 refactoring of the ParserDispatcher and ParserConfig: resulted in Idiom, Parser and Classification classes
16 years ago
orbiter 8ca1f5d400 - some work to integrate the html parser the same way as the other parsers are integrated (not finished)
16 years ago
orbiter dafffd0153 refactoring of parsers and document processing
16 years ago
orbiter 409538e17a code cleanup and code simplification
16 years ago
orbiter 1f1399e5c5 extending visibility of objects and methods to avoid synthetic accessor methods and increase performance
16 years ago
orbiter 154bbc3364 code cleanup: call of static methods directly to the class
16 years ago
orbiter 222850414e simplification of the code: removed unused classes, methods and variables
16 years ago
orbiter 93dfb51fd4 problems with code style
16 years ago
orbiter ce1adf9955 serialized all logging using concurrency:
16 years ago
orbiter b8e738a7be a collection of
16 years ago
orbiter db3a06dd81 removed cookie handling in httpc:
16 years ago
orbiter 99bf0b8e41 refactoring of plasmaWordIndex:
16 years ago
orbiter 4b4bddca00 added new submenu to crawler menu: import of phpbb3 forum postings from mysql
16 years ago
orbiter 709bfc2cd4 added a memory check in http post protocol
16 years ago
orbiter c097531e3d added a catch Exception to all threads to check whether any of them dies silently without any other notification
16 years ago
orbiter 057ce14c8e more fixes (character encoding, parser exceptions, http client failure, blob writing)
16 years ago
orbiter d2ac0aa682 - fixed possible bugs in Stack (may affect Crawler reset) and RandomAccess handling
16 years ago
orbiter 16baa7ad24 To translate a mediawiki dump into the YaCy surrogate format do the following:
16 years ago
orbiter 8ffb9889e1 some fixes and performance hacks
16 years ago
orbiter d7cbf4cdd4 more performance hacks: less overhead in word hash computation
16 years ago
f1ori dd6b5005ff * fix missing charset handling in getpageinfo_p
16 years ago
orbiter c0e8ed5461 fixed problem with not http client
16 years ago
orbiter 57c00dd8c9 fix for bad filtering of common http error
16 years ago
orbiter c2359f20dd refactoring: better abstraction of reference and metadata prototypes.
16 years ago
orbiter 9bfb2641db - removed deprecated threads
16 years ago
orbiter b6c2167143 - patch for bad web structure dumps
16 years ago
orbiter 9a90ea05e0 added a merge operation for IndexCell data structures
16 years ago
orbiter 14a1c33823 refactoring of wordIndex class
16 years ago
borg-0300 0a2fabeef3 static TMPDIR
16 years ago
lotus f35dc11dc4 allow crawl start from pages with script tags
16 years ago
orbiter 858f800a07 more logging in httpd to detect shutdown cause. See also:
16 years ago
orbiter b80db04667 - refactoring of IntegerHandleIndex and LongHandleIndex (better method names)
16 years ago
orbiter efcd95dc37 simplification of (internal) query process / refactoring
16 years ago
orbiter aa44d9bad9 more refactoring of kelondro.text / deleted de.anomic.index
16 years ago
orbiter aca973e2d9 catch more exceptions
16 years ago
orbiter c12bb8a6d0 - refactoring of the http client
16 years ago
orbiter 6b450d09ca some fixes recommended by findbugs
16 years ago
orbiter f887fc159f try to reduce the large number of unclosed incoming connections
16 years ago
orbiter 333489420b - fix for NPE when loading the cytag image
16 years ago
orbiter e9a4182e6a using a concurrent hash map for the template cache
16 years ago
orbiter 01b97ef3f8 added new cybertag-tracking feature that was inspired by itgrl
16 years ago
orbiter b57c9da1f8 - fixes to doc, ppt, xls parser: better title
16 years ago
orbiter db510b5d52 more exception logging
16 years ago
low012 f136ddcfd4 *) this change is supposed to prevent the creation of temporary files by Apache Commons Fileupload library in cases where it is not necessary (as proposed by thq in http://forum.yacy-websuche.de/viewtopic.php?f=8&t=1806)
16 years ago
orbiter 94110df85a moved logging partially to kelondro
16 years ago
orbiter 024da2916b refactoring of logging
16 years ago
orbiter 83ce65707a (almost) completed partition of classes in kelondro
16 years ago
orbiter 7ee494fde5 more refactoring of kelondro:
16 years ago
orbiter bf93767ec6 refactoring of kelondro database classes
16 years ago
orbiter fc27bf8c4c refactoring of kelondro classes:
16 years ago
orbiter fe77fc3d62 - added new property setting 'repositoryPath'
16 years ago
f1ori aaafe05c02 * revert debug change
16 years ago
orbiter 335d6ce8fc fix for class loading problem
16 years ago
orbiter b423d0a036 moved all servlets from htroot/xml to htroot/api
16 years ago
orbiter 814a28775f removed thread dump writing in case of invocation target exception in httpd (looked bad, not serious)
16 years ago
low012 7608944081 *) bugfix for REMOTE_HOST environment variable in CGI code (shows hostname of client instead of hostname of YaCy peer now)
16 years ago
low012 c1330f5743 *) added environment variable DOCUMENT_ROOT
16 years ago
orbiter c6880ce28b removed the permanent cache flush and replaced it with a periodic cache flush
16 years ago
low012 afe98bc11c *) added changes as proposed by Halborinda in http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1674
16 years ago
low012 bb5c2cd12e *) ISINDEX parameters will not be put on the command line anymore to prevent possible security hazards (better safe than sorry). Parameters will have to be read from QUERY_STRING in the ISINDEX case too, which does not seem to be uncommon behaviour for web servers: http://vms.pdv-systeme.de/users/martinv/cgi_basics/cgi_basics.html#Datenuebergabe
16 years ago
low012 db1cfae3e7 *) cleaning up after myself
16 years ago
low012 f547f9a78c *) added CGI capabilities (run Perl scripts and other software via HTTP GET and POST)
16 years ago
f1ori bdc380cd84 * add lastModified to templateCache
16 years ago
orbiter e004da48d3 - added fast fingerprint computation for files (any). Will be used in new index dump method
16 years ago
f1ori 2d2ce24011 * remove all encoding-stuff from proxy
16 years ago
f1ori 73c8a0839c * abort download, when proxy connection is closed
16 years ago
f1ori 4907697cfa * make fileuploads through proxy bigger than 65500 bytes possible
16 years ago
orbiter db6b3bf5a3 speed enhancement for integrated http server:
16 years ago
orbiter 47292e696a more performance hacks
16 years ago
orbiter d39d420b39 performance hacks
16 years ago
orbiter 0b4808ba3d added new interactive search feature:
16 years ago
orbiter 74a3d86114 fixed an error response that might present classified information
16 years ago
danielr 2e63f03ca5 forgot copy&paste :/
16 years ago
danielr cd8082b4e3 fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1111#p11166
16 years ago
f1ori d18c18971e * dirlisting in UTF-8 encoding
16 years ago
f1ori d49ffcd818 * files distributed by yacy are utf-8, files from repository use the system default charset
16 years ago
f1ori 90e78b2cf6 * improve encoding detection of http service
16 years ago
f1ori 7e1fe05e3c * added utf8-encoding to many getBytes-calls
16 years ago
f1ori 4b4ce75396 * http-server: submit charset from html metatags
16 years ago
f1ori d0543a7c39 * fix the debug ant-target
16 years ago
orbiter 0edec2b760 FULL redesign of algorithms in htmlTools to encode/decode strings from/to unicode and html.
16 years ago
orbiter 6941bf42b1 performance hacks
16 years ago
orbiter 1778fb420d - added some performance tweaks to the new BLOB buffer
16 years ago
orbiter 826ca79735 refactoring and new architecture to store the files of the web cache:
16 years ago
lotus fe2792e9ce use accept-language header instead of user agent for language detection
16 years ago
orbiter ce2a7ed116 integrated language detection classes into condenser environment
16 years ago
orbiter 0cd0fee546 fixed bug with wrong proxy result enqueueing. See:
16 years ago
f1ori ba76995d2c * fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1415
16 years ago
danielr d60b2b198d proxy fixed 'not modified' http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1419
16 years ago
f1ori bd0318ba81 * YaCy only supports gzip-encoding, so remove any other encoding from request
16 years ago
danielr cf29ca19d4 possible fix for POST character encoding http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1374
16 years ago