86 Commits (c659310e89e8a4b4b2d1de0b13c67f604843ad1e)

Author SHA1 Message Date
orbiter e4a82ddd8b produce a bookmark entry from every crawl start. these bookmarks are always private.
13 years ago
orbiter 97d1347adb added also a default accept field to robots.txt downloads
13 years ago
orbiter 06352b8d6b more logging
13 years ago
orbiter a99934226e more logging for debugging of robots.txt
13 years ago
orbiter 458c20ff72 fix for robot parser
13 years ago
orbiter 017a01714d - enhanced logging in robots.txt parser for remote debugging
13 years ago
orbiter 775b44017e refactoring
13 years ago
orbiter 10e2f588f8 - enhanced ybr ranking computation
14 years ago
orbiter 6fa439c82b - refactoring of robots
14 years ago
orbiter d8e934c085 better abstraction of http client identification
14 years ago
orbiter b77b8cac0c - enhanced html parser: recognized much more details in the content
14 years ago
orbiter 96c32e87b0 fixes to crawler and new user-agent crawl-delay handling
14 years ago
orbiter b2fe4b7b1a added a handling of appearances of yacy bot entries in robots.txt if this entry addresses the yacy peer
14 years ago
f1ori efcf37a953 * show info in log, if robots.txt is rejected due to wrong mime-type
14 years ago
orbiter cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
14 years ago
orbiter d2fd93135c - moved yacybot user agent string definition to MultiProtocolURI since there are basic access mechanisms where the bot string is needed
14 years ago
sixcooler 17eebd4ef8 counting crawler traffic again:
14 years ago
orbiter 22047ffad5 enhanced computation speed of many replaceAll string operations
14 years ago
orbiter c60d0282fd more abstraction for tables stored in heaps:
14 years ago
orbiter 3197ca42ed preparations to move the HTCache into cora:
14 years ago
orbiter 90531f78ff refactoring of the cora package to get subpackages for http and ftp (smb to come)
14 years ago
orbiter 7aa860c505 - more logging
14 years ago
sixcooler a6ed6e8cb9 ... migrating to HttpComponents-Client-4.x ...
14 years ago
sixcooler 15e8c13526 ... migrating to HttpComponents-Client-4.x ...
15 years ago
orbiter b03caaa57a better handling of OOM situations
15 years ago
orbiter 3f93a0cc8f redesign of remote proxy settings
15 years ago
orbiter 11639aef35 - added new protocol loader for 'file'-type URLs
15 years ago
orbiter c855fc48c6 only load robots.txt for http and https protocol
15 years ago
orbiter 727dd9b193 - fixed a bug in robots.txt parser
15 years ago
orbiter 5df628a2a4 - added BEncoder class
15 years ago
orbiter bc96d74813 - clean-up of robots.txt parser
15 years ago
orbiter 362b7a929b added extensive memory protection logic to avoid out of memory errors that may be caused by the RowCollection memory allocation function
15 years ago
orbiter 4431b9767e added about 450 replacements for printStackTrace() methods to pipe such traces into the log at DATA/LOG/
15 years ago
orbiter e7f18ba24b refactoring
15 years ago
orbiter ce8dc575ca refactoring
15 years ago
orbiter bea3b99aff moved table and util classes
15 years ago
orbiter c0e0e1f422 moved blob classes
15 years ago
orbiter 4446acc8cd moved kelondro order
15 years ago
orbiter f677d534b1 start of a really extensive refactoring which will produce a hierarchical package structure with the domain yacy.net as package root
15 years ago
orbiter 735e2737e3 * added index segments
15 years ago
orbiter c0e17de2fb - fixes for some problems with the new crawling/caching strategies
16 years ago
orbiter 1d8d51075c refactoring:
16 years ago
orbiter ca72ed7526 -removed superfluous crawl cache
16 years ago
f1ori f814e0fa81 enable warnings and fix most of it
16 years ago
orbiter 154bbc3364 code cleanup: call of static methods directly to the class
16 years ago
orbiter ae015e8e98 refactoring of blob package classes
16 years ago
orbiter ce1adf9955 serialized all logging using concurrency:
16 years ago
orbiter b8e738a7be a collection of
16 years ago
orbiter 3d4b826ca5 migration of all databases that use the deprecated BLOBTree format into the BLOBHeap format. Old databases are migrated automatically.
16 years ago
shostakovich 1f37cc6107 Robots.txt is now reused after one day. See forum-topic:
16 years ago