0-values and empty strings are not written). This may save a lot of
memory (in RAM and on disk) if many 0-values or empty strings appear.
- do not allow default boolean values for checkboxes because that does
not make sense: browsers omit the checkbox attribute name if the box is
not checked, so a default value of 'true' would not match the semantics
of the browser's response.
- add a checkbox in IndexFederated_p for the lazy initialization of solr
fields (see the sketch below).
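A minimal sketch of the lazy-write idea, assuming SolrJ's
SolrInputDocument; the helper names are hypothetical, not YaCy's actual
code:

import org.apache.solr.common.SolrInputDocument;

public class LazySolrFields {

    // Write a string field only if it carries information.
    public static void setStringLazy(SolrInputDocument doc, String name, String value) {
        if (value != null && !value.isEmpty()) doc.setField(name, value);
    }

    // Write a numeric field only if it differs from the 0 default.
    public static void setLongLazy(SolrInputDocument doc, String name, long value) {
        if (value != 0L) doc.setField(name, value);
    }
}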
computation. This error was there before the latest IPv6 hack but did
not cause a NPE. The IPv6 hack was not the cause of this bug, but it
exposed the misconfigured referrer ('referer') handling.
- do not replace malformed or invalid URLs in urlproxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7835 6c8d7289-2bf4-0310-a012-ef5d649a1542
Conflicts:
source/de/anomic/http/server/HTTPDFileHandler.java
rewrite engine to customize existing webpages. Originally implemented
by Florian Richter.
Conflicts:
source/de/anomic/http/server/HTTPDProxyHandler.java
not equal. It may happen that this data differs because one of the
caches is cleaned after a while or when it grows too big. The metadata
was then never cleaned; it is now wiped by a check-up process at every
application start. This should reduce memory usage a bit.
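A minimal sketch of such a start-up check-up, assuming both stores
expose their keys; the names are illustrative, not YaCy's actual API:

import java.util.Map;
import java.util.Set;

public class MetadataCheckup {
    // Drop every metadata entry whose cached document is gone, so both
    // stores agree again after start-up.
    public static void wipeStaleMetadata(Map<String, Object> metadata, Set<String> cacheKeys) {
        metadata.keySet().removeIf(key -> !cacheKeys.contains(key));
    }
}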
- added log warnings in case search processes run into time-out
situations
- better concurrency for the Integer formatter (a non-synchronized
formatter was used before)
- bugfix for search termination (a poison pill was missing; see the
sketch after this list)
- added timeout parameters for search (again) -> the target is that
they are never reached.
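A minimal sketch of the poison-pill pattern behind that termination
bugfix, assuming a BlockingQueue-based result feeder; the classes are
illustrative, not YaCy's actual ones:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PoisonPillDemo {
    // A unique sentinel object marking the end of the result stream.
    static final String POISON = new String("POISON");

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> results = new LinkedBlockingQueue<>();
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String r = results.take();
                    if (r == POISON) break; // identity check: terminate cleanly
                    System.out.println("result: " + r);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        results.put("url1");
        results.put("url2");
        results.put(POISON); // without this pill the consumer blocks forever
        consumer.join();
    }
}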
- search requests are now made using a map boundary
- search results are only computed for the map boundary
- the number of results is adapted to the results in the visible range
- added a double-buffering for the search result markers
- added a search query option for the search results:
/radius/<lat>/<lon>/<radius>
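A hypothetical usage example, assuming the modifier is appended to the
normal query string and takes decimal degrees plus a radius (the unit
is an assumption here):

pizza /radius/52.52/13.40/5

This would restrict the results to entries geolocated within a 5-unit
radius around lat 52.52 / lon 13.40.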
queue and not from virtual documents generated by the parser.
- The parser now generates descriptive texts for NOLOAD entries, which
should make it possible to find media content using the search index
and not the media-prefetch algorithm during search (which was costly)
- Removed the media-search prefetch process from image search
only links where the content can be parsed. All non-parseable links are
placed into the noload queue. The search process must therefore be able
to filter out non-text search results.
- This fixes the problem that image search results appeared in the text
search.
- The interactive search can now retrieve ALL types of links
- The p2p interface is now extended to retrieve only certain types of
links (text, image, video, apps)
- The search process has an extension to filter the right document type
according to the search query
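A minimal sketch of such a document-type filter, assuming a simple
content-type enum; the names are illustrative, not YaCy's actual
classes:

import java.util.ArrayList;
import java.util.List;

public class ContentDomainFilter {

    // The four link types named above; TEXT covers parseable documents.
    public enum ContentDomain { TEXT, IMAGE, VIDEO, APP }

    public static class Result {
        final String url;
        final ContentDomain domain;
        Result(String url, ContentDomain domain) { this.url = url; this.domain = domain; }
    }

    // Keep only results whose type matches the one requested by the query.
    public static List<Result> filter(List<Result> results, ContentDomain wanted) {
        List<Result> filtered = new ArrayList<Result>();
        for (Result r : results) {
            if (r.domain == wanted) filtered.add(r);
        }
        return filtered;
    }
}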
check semantically wrong, but a trick that should have prevented an IP
lookup when the filter was not used did not work. This bugfix gives
crawling a huge speed boost for noload URLs!
See
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html
and the following test program:
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueLengthTimeTest {

    // Adds c elements and calls size() on every add, so a slow size()
    // implementation dominates the measured time.
    public static long countTest(Queue<Integer> q, int c) {
        long t = System.currentTimeMillis();
        for (int i = 0; i < c; i++) {
            q.add(q.size());
        }
        return System.currentTimeMillis() - t;
    }

    public static void main(String[] args) {
        int c = 1;
        for (int i = 0; i < 24; i++) { // bounded so the queues fit in memory
            Runtime.getRuntime().gc();
            long t1 = countTest(new ArrayBlockingQueue<Integer>(c), c);
            Runtime.getRuntime().gc();
            long t2 = countTest(new LinkedBlockingQueue<Integer>(), c);
            Runtime.getRuntime().gc();
            long t3 = countTest(new ConcurrentLinkedQueue<Integer>(), c);
            System.out.println("count = " + c + ": ArrayBlockingQueue = " + t1
                    + ", LinkedBlockingQueue = " + t2
                    + ", ConcurrentLinkedQueue = " + t3);
            c = c * 2;
        }
    }
}
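Note from the linked Javadoc: ConcurrentLinkedQueue.size() is not a
constant-time operation; it traverses the whole queue, so calling
size() inside the add loop makes that case quadratic in c. That
traversal cost is exactly what this test exposes.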
ready-prepared crawl list but at the stacks of the domains that are
stored for balanced crawling. This also affects the balancer, since it
no longer needs to prepare the pre-selected crawl list for monitoring.
As an effect:
- it is no longer possible to see the correct order of the next
to-be-crawled links, since that depends on the actual state of the
balancer stack the next time a URL is requested for loading
- the balancer works better since the next URL can be selected according
to the current situation and not according to a pre-selected order (see
the sketch below).
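A minimal sketch of per-domain stacks with on-demand selection,
assuming a politeness delay per host; all names are hypothetical, not
YaCy's actual Balancer API:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class DomainBalancer {
    private final Map<String, Deque<String>> stacks = new HashMap<>(); // host -> URLs
    private final Map<String, Long> lastAccess = new HashMap<>();      // host -> last fetch time
    private final long politenessMillis;

    public DomainBalancer(long politenessMillis) {
        this.politenessMillis = politenessMillis;
    }

    public void push(String host, String url) {
        stacks.computeIfAbsent(host, h -> new ArrayDeque<>()).push(url);
    }

    // Select the next URL at request time: the first host whose
    // politeness delay has expired, judged by the current situation.
    public String pop() {
        long now = System.currentTimeMillis();
        for (Map.Entry<String, Deque<String>> e : stacks.entrySet()) {
            long last = lastAccess.getOrDefault(e.getKey(), 0L);
            if (now - last >= politenessMillis && !e.getValue().isEmpty()) {
                lastAccess.put(e.getKey(), now);
                return e.getValue().pop();
            }
        }
        return null; // no host is ready yet
    }
}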
Hashtable is an obsolete Java 1.0 collection; since the Java 2
collections framework, HashMap offers the same or better functionality.
Please review; almost all code was already moved, so only a few changes
remain. That is not the issue, but I found notes that some (ugly, big)
helper classes had to be created in the past to compensate for missing
Hashtable functionality. I'd like input on whether we can remove some
of them; a sketch of the usual replacement follows below.
Look for //FIX: in these commits.
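A minimal sketch of the usual replacement pattern, shown as a general
idiom rather than a specific YaCy change: HashMap where access is
single-threaded, ConcurrentHashMap where Hashtable's synchronization
was actually relied on:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HashtableMigration {
    // Before: Hashtable<String, Integer> counts = new Hashtable<>();

    // Single-threaded (or externally synchronized) use: plain HashMap.
    private final Map<String, Integer> counts = new HashMap<>();

    // Shared use: ConcurrentHashMap keeps thread safety without
    // Hashtable's single global lock (like Hashtable, it rejects
    // null keys and values).
    private final Map<String, Integer> sharedCounts = new ConcurrentHashMap<>();

    public void count(String key) {
        // merge() covers the check-then-update helpers that had to be
        // hand-written around Hashtable in older code.
        sharedCounts.merge(key, 1, Integer::sum);
    }
}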
Signed-off-by: Marek Otahal <markotahal@gmail.com>
- config string for Chinese
- do not copy the language files to DATA/LOCALE any more (and do not
use them there; this is really confusing for new translators)