Following the report in mantis 776
(http://mantis.tokeek.de/view.php?id=776):
Running the performance test with different control parameters seems to
reveal that YaCy's RowHandleMap, as used in the balancer depthCache, is
ultimately more efficient than, for example, the ConcurrentHashMap from
JDK 8.
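For illustration, a minimal timing-harness sketch of the kind of
comparison such a test makes, shown here with only the JDK
ConcurrentHashMap side (entry count, key type and workload are
assumptions, not the actual test parameters; the RowHandleMap variant
would plug into the same loop):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    /** Sketch: measure put/get throughput on a ConcurrentHashMap as an
     *  assumed stand-in for the balancer depthCache access pattern. */
    public class DepthCachePerfSketch {

        public static void main(String[] args) {
            final int entries = 1_000_000; // assumed size, not the real test parameter
            Map<Long, Integer> depthCache = new ConcurrentHashMap<>(entries);

            long start = System.nanoTime();
            for (long i = 0; i < entries; i++) {
                depthCache.put(i, (int) (i % 256)); // store a crawl depth per key
            }
            long sum = 0;
            for (long i = 0; i < entries; i++) {
                sum += depthCache.get(i); // read back every depth
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("put+get of " + entries + " entries: "
                    + elapsedMs + " ms (checksum " + sum + ")");
        }
    }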
When starting a crawl from a file containing thousands of links, the
"crawler.MaxActiveThreads" configuration setting effectively prevents
saturating the system with too many outgoing HTTP connection threads
launched by the crawler.
But robots.txt fetching was not limited by this setting and kept
increasing the number of concurrent loading threads indefinitely until
most of the connections timed out.
To improve performance control, a thread pool was added for Robots.txt,
consistently used in its ensureExist() and massCrawlCheck() methods.
The maximum size of the Robots.txt thread pool can now be configured on
the /PerformanceQueues_p.html page, or with the new
"robots.txt.MaxActiveThreads" setting, initialized with the same default
value as the crawler.
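A minimal sketch of the pattern, assuming a fixed-size pool sized from
the setting (the class, helper names, the default of 200 and the
system-property lookup below are illustrative assumptions, not the
actual YaCy code):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    /** Sketch: bound the number of concurrent robots.txt fetches with a
     *  fixed-size thread pool instead of spawning one thread per host. */
    public class RobotsTxtPoolSketch {

        // assumed default, mirroring the crawler default as described above
        private static final int DEFAULT_MAX_ACTIVE_THREADS = 200;

        private final ExecutorService pool;

        public RobotsTxtPoolSketch(int maxActiveThreads) {
            // fetches submitted here queue up once maxActiveThreads are busy,
            // so the number of outgoing robots.txt connections stays bounded
            this.pool = Executors.newFixedThreadPool(maxActiveThreads);
        }

        /** Submit one robots.txt fetch; fetchRobotsTxt() is a hypothetical
         *  stand-in for the loading done by ensureExist()/massCrawlCheck(). */
        public void submitFetch(final String hostPort) {
            this.pool.submit(() -> fetchRobotsTxt(hostPort));
        }

        private void fetchRobotsTxt(String hostPort) {
            // placeholder: the real code performs the HTTP request and caches it
            System.out.println("would fetch http://" + hostPort + "/robots.txt");
        }

        public static void main(String[] args) {
            int max = Integer.getInteger("robots.txt.MaxActiveThreads",
                    DEFAULT_MAX_ACTIVE_THREADS); // assumed way to read the setting
            RobotsTxtPoolSketch sketch = new RobotsTxtPoolSketch(max);
            for (int i = 0; i < 1000; i++) {
                sketch.submitFetch("host" + i + ".example:80"); // bounded threads
            }
            sketch.pool.shutdown();
        }
    }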
With the file:// protocol, 2 host queues can access the same cache file
concurrently (http://mantis.tokeek.de/view.php?id=668).
The reason seems to be a differing hosthash key of the host queues on
reopen.
The internal queue key and the external representation (the directory
name, currently hostname.port) must be adjusted to fix it (not done
yet).
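A hedged sketch of the suspected mismatch, with purely hypothetical
key-derivation helpers (YaCy's real hosthash computation is different;
the point is only that a key recomputed from the directory name can
diverge from the key computed from the original URL):

    import java.io.File;

    /** Sketch of the suspected hosthash mismatch: the key computed from
     *  the original URL differs from the key recomputed from the on-disk
     *  directory name (hostname.port) when the queue is reopened. */
    public class HostQueueKeySketch {

        // hypothetical stand-in for the hosthash computed from a live URL
        static String hashFromUrl(String protocol, String host, int port) {
            return Integer.toHexString((protocol + "/" + host + ":" + port).hashCode());
        }

        // hypothetical stand-in for the key recovered from the directory name
        static String hashFromDirectory(File queueDir) {
            // directory name is "hostname.port"; for file:// URLs the hostname
            // may be empty, and the protocol is not recorded at all
            String name = queueDir.getName();
            int dot = name.lastIndexOf('.');
            String host = name.substring(0, dot);
            int port = Integer.parseInt(name.substring(dot + 1));
            return Integer.toHexString(("/" + host + ":" + port).hashCode()); // protocol lost
        }

        public static void main(String[] args) {
            String original = hashFromUrl("file", "", -1);
            String reopened = hashFromDirectory(new File(".-1"));
            // differing keys -> two host queues end up on the same cache file
            System.out.println("original=" + original + " reopened=" + reopened
                    + " equal=" + original.equals(reopened));
        }
    }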