jdk-based logger tend to block
at java.util.logging.Logger.log(Logger.java:476) in concurrent
environments. This makes logging a main performance issue. To overcome
this problem, this is a add-on to jdk logging to put log entries on a
concurrent message queue and log the messages one by one using a
separate process.
- FTPClient uses the concurrent logging instead of the log4j logger
- textarea input field can now be used to paste in a large list of hosts
- /31er subnet is possible (only one host)
- auto-detect subdomains for ftp and www subdomains
without the file extension. This part of the file path is removed from
the multi-field url_paths_sxt, which has now not the file name as last
part of the path list.
The same applies to the new fields source_file_name_s and
target_file_name_s in the webgraph schema.
references_internal_url_sxt because they had been shown to be
superfluous. The citation of referrer in the host browser is possible
without them. Therefore now the host browser does not only show
internal, but also external referrer to each link.
explored for citations within other documents. A click on the
'Citations' link shows an analysis with all text lines in the document
each with a complete list of documents which contain the same line. A
second section shows the linking documents in ascending order of number
of citations from the original document. Because documents from
different hosts are most interesting here, they are listed at the top of
the page as possible 'copypasta' source.
yacy will load linked web pages from search results until the total
number of web pages reaches 15000. This shall give fresh peers a 'boost'
to get faster a personalized search index.
texts, separated by sentences. Then, these sentences can be used to
search again in the index for the same sentence. This can be used to
provide a tool for plagiarism-search. (not finished yet).
Try the following:
http://localhost:8090/solr/select?q=text_t:flut&grep=wasser&defType=edismax&start=0&rows=3&core=collection1&wt=grephtml
.. to search for 'flut' and show only sentences in the result documents
which contain the word 'wasser'.
Consider this like using a grep-tool on documents: you select the
documents by a search query and you grep sentences inside the found
documents with the 'grep' attribute.
switch between p2p search and the 'stealth mode' which is simply a
non-p2p search within the p2p network. The functionality was there all
the time, but the switch to this was not very visible.
output writer. This made the creation of a html result writer necessary
which is included in this commit. The yacydoc servlet was used to
present all metadata to a document, but the solr interface can serve for
this purpose in a much better way. All usages (instead one) of yacydoc
were replaced by a solr call. This affects also the 'metadata' link
attached to search results.