It seems improbable, but it might happen, that during a crawl all queues (indexing, crawling, ...) except the crawl URL stacker ran empty. This commit adds an additional check for an empty crawl stacker queue before executing the profile cleaner.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4151 6c8d7289-2bf4-0310-a012-ef5d649a1542
- start of new development cycle.
in case your don't know: commits until 0.553 will not automatically be used be the auto-update funktion
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4146 6c8d7289-2bf4-0310-a012-ef5d649a1542
in extreme situations this will cause that no remote crawls are send out any more
this is bad, but it protects the case where failing remote crawls fill up the local queue too much,
which is even worse
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4141 6c8d7289-2bf4-0310-a012-ef5d649a1542
- URL encoding for search terms where required
- removed "ugly" CDATA escaping
- UTF-8 encoding for the XML
- no HTML style escaping for XML/RSS element values
Note: some unicode characters might still be encooded in a wrong way.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4140 6c8d7289-2bf4-0310-a012-ef5d649a1542
there are now two time-outs, one for the complete connection time, and one for an idle time
connections that are idle for more than 2 minutes are closed, and connections that are alive since more than one hour are also closed
if the complete number of connections exceeds 64, all connections more than 64 and have most idle time are also closed
During normal operation of peers these forced closings should never appear,
but the existence of the idle connection check ensures the availability of the peer and the usability of the host.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4134 6c8d7289-2bf4-0310-a012-ef5d649a1542
Note: the new DateFormatter822 in the plasmaSwitchboard is just a copy of the DateFormatter that always uses the US locale to allow formatting of a loocale independent date String.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4124 6c8d7289-2bf4-0310-a012-ef5d649a1542
and replaced old fist hash computation by new method that tries to find a gap in the current dht
to do this, it is necessary that the network bootstraping is done before the own hash is computed
this made further redesigns in peer initialization order necessary
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4117 6c8d7289-2bf4-0310-a012-ef5d649a1542
there is no waiting until the local search terminates to show the result page.
the local search appear like all other results from remote peers using a separated thread.
This has especially a stron effect, if the local index for a specific word is large.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4114 6c8d7289-2bf4-0310-a012-ef5d649a1542
- removed web structure picture from indexing menu and grouped it together with htcache monitor
- added a database for terminated crawls, when a crawl is finished it is automatically moved to the new database
- extended crawl profile edit servlet, shows now also terminated crawls
- option that was used to delete profiles is now redesigned to a function that moves the current crawl to the terminated crawls and removes all urls from the current queues!
- fixed here and there problems with indexing queues
- enhances indexing speed by changing cache flush sizes.
- changed behaviour of crawl result servlet: the list of crawled urls is shown if there is one, othevise the overview window is shown
attention: the new profile databases are not compatible with the old one. current crawls will be lost! the web index is not touched.
next steps: the database of terminated crawls can be used to start with them a new crawl. This is useful if one wants to re-crawl specific pages and wants to use a old crawl profile.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4113 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added a file size limitation, that disallows parsing of large documents during (offline-) remote search
- added profiling information to search result computation, visible at search access tracker. this info shows used time for URL fetch and snippet computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4112 6c8d7289-2bf4-0310-a012-ef5d649a1542