- balancer writes cause of robots.txt in log file for crawl delay
- removed log output for forced GC
- smaller RAM flush for RWI cache, should cause more usage of cache and faster crawling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5228 6c8d7289-2bf4-0310-a012-ef5d649a1542
- fixed a bug with merge method
- patched wrong output of language identification (not fixed, only patched!)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5181 6c8d7289-2bf4-0310-a012-ef5d649a1542
- reduced maximum number of error url references to save some memory (this was actually a small memory leak)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5174 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added automatic time delay in case that a large number of urls come from the same domain
- added additional time delay in case that an url is a dynamic (CGI) url. This shall cause less IO on targets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5128 6c8d7289-2bf4-0310-a012-ef5d649a1542
- fixed parsing of crawl-delay statements when seconds were given with float numbers
- enhanced performance of profiling (not too many loggings; not more than one per second)
- removed some debug output
- fixed wrong return type in logging
- added a logging condition in httpd to prevent that logging statements are generated when they are not written (should be added everywhere!)
- fixed wrong word distance computation in RWI management
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5101 6c8d7289-2bf4-0310-a012-ef5d649a1542
- removed unnecessary code (unused variables, String.toString)
- corrected some calculations (cast int to double or long ;)
- improved little performance (using Integer.valueOf() instead of new Integer)
- log if some File-actions fail (mkdir(), delete(), ...) and some ignored exceptions
- finalized some (more) fields
- finally close some streams
- made inner classes static if not using environment
- generalized some equals (from specificClass to Object)
- fixed some potential nullpointer accesses
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5039 6c8d7289-2bf4-0310-a012-ef5d649a1542
- moved constants from plasmaSwitchboard to own class (all 232 ;)
- moved remoteProxy-Methods to httpRemoteProxyConfig, better names
- removed some unnecessary code (else-statements)
* formatting (correct indentation)
* minor bugfixes (due to findbugs.sf.net)
* hopefully fixed "missing quote" (announcing StringParts as UTF-8)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5031 6c8d7289-2bf4-0310-a012-ef5d649a1542
- no more keep-order parameter in remove (it was not possible to make this strict, and not useful)
- some small enhancements in balancer
- robots parser without references in switchboard
- changes synchronization in robots
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4969 6c8d7289-2bf4-0310-a012-ef5d649a1542
the skin menue. Additionally an example is given there how to integrate a search page with an iframe.
Please see the skin menu.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4967 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added option to set minimum crawl delta for domains in balancer
- added default values to crawl deltas in yacy.init
- added configuration for these deltas in performance queues
- enhanced performance setting computation (more time for indexing queue for a faster flush
- remote crawling is now enabled during local crawling if indexer has space and time for more links
- added database stub for new distributed file system
- refactoring of time computation to get an abstraction level that will be used by a TTL rule in new distributed file system
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4966 6c8d7289-2bf4-0310-a012-ef5d649a1542