and allow injection of recrawl urls before queue is empty
During a recrawl, the balancer often hangs on the very last URLs, typically those on hosts with a long crawl delay.
Allowing injection earlier keeps progress more balanced. The maximum number of crawl URLs injected by the recrawl job is 2 * max loader.
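The injection rule above can be sketched as a top-up step: rather than waiting for the crawl queue to drain completely, the recrawl job refills it so that at most 2 * max loader URLs are pending. This is a minimal, hypothetical sketch, not the actual RecrawlBusyThread code. It assumes the current crawl queue size is a usable proxy for "URLs injected by the recrawl job", and it models the `urlstack` buffer as a `Deque<String>` instead of the real `Set<DigestURL>`. The class and method names are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch: top up the crawl queue to at most 2 * maxLoader entries
 * instead of injecting recrawl URLs only once the queue is empty.
 */
public class RecrawlInjectionSketch {

    /** How many URLs may be injected now; never negative. */
    static int injectionBudget(int queuedUrls, int maxLoader) {
        int limit = 2 * maxLoader; // cap taken from the commit message
        return Math.max(0, limit - queuedUrls);
    }

    /**
     * Moves up to the current budget of URLs from the recrawl buffer into the
     * crawl queue (assumption: queue size approximates recrawl-injected URLs).
     * Returns the number of URLs moved.
     */
    static int inject(Deque<String> buffer, Deque<String> crawlQueue, int maxLoader) {
        int budget = injectionBudget(crawlQueue.size(), maxLoader);
        int injected = 0;
        while (injected < budget && !buffer.isEmpty()) {
            crawlQueue.addLast(buffer.pollFirst());
            injected++;
        }
        return injected;
    }

    public static void main(String[] args) {
        Deque<String> buffer = new ArrayDeque<>();
        for (int i = 0; i < 10; i++) {
            buffer.add("http://example.org/page" + i);
        }
        // One URL from a slow host is still pending; the old behaviour would inject nothing yet.
        Deque<String> queue = new ArrayDeque<>(List.of("http://slow-host.example/a"));
        int n = inject(buffer, queue, 3);
        System.out.println(n + " injected, queue size " + queue.size()); // prints "5 injected, queue size 6"
    }
}
```

With a cap tied to the loader count, the queue never grows unbounded, yet the loaders keep receiving work from fast hosts while one slow host still holds its last URLs.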
@@ -53,7 +54,7 @@ public class RecrawlBusyThread extends AbstractBusyThread {
     private String currentQuery = CollectionSchema.fresh_date_dt.getSolrFieldName() + ":[* TO NOW/DAY-1DAY]"; // current query
     private boolean includefailed = false; // flag if docs with httpstatus_i <> 200 shall be recrawled
     private int chunkstart = 0;
-    private int chunksize = 200;
+    private final int chunksize;
     final Switchboard sb;
     private final Set<DigestURL> urlstack; // buffer of urls to recrawl
     public long urlsfound = 0;
@@ -70,6 +71,7 @@ public class RecrawlBusyThread extends AbstractBusyThread {
// workaround to prevent solr exception on existing index (not fully reindexed) since intro of schema with docvalues
// org.apache.solr.core.SolrCore java.lang.IllegalStateException: unexpected docvalues type NONE for field 'load_date_dt' (expected=NUMERIC). Use UninvertingReader or index with docvalues.