yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	1c62fa7698	fix for bad snippets in gsa api	11 years ago
orbiter	252c525709	fixed feed api servlet and enhanced RSSReader class	11 years ago
orbiter	d38c3c14d8	fix for CGI test	11 years ago
Michael Peter Christen	f13df9dbb6	migration to solr 4.4.0	11 years ago
Michael Peter Christen	cf12835f20	replaced the single-text description solr field with a multi-value description_txt text field	11 years ago
Michael Peter Christen	83e2921b39	new test case for http://bugs.yacy.net/view.php?id=141	11 years ago
Michael Peter Christen	304aacb2cc	fix for http://bugs.yacy.net/view.php?id=267	11 years ago
orbiter	056b42f5aa	- added information about segment count to status_p.xml - also moved this information from the old index structure, which is still in use for the RWI/DHT index to that front-end	11 years ago
orbiter	6fb2811e68	fixes for problems with remote solr and non-activated webgraph index	11 years ago
Michael Peter Christen	336f86394c	replaced StringBuffer with StringBuilder	11 years ago
Michael Peter Christen	31483c47e1	fixed problem with remote luke requests	11 years ago
Michael Peter Christen	ac1aad5064	added a getSegmentCount method and use it to disable optimize if wanted current segment count is below optimization level	11 years ago
Michael Peter Christen	36035e0a0a	- used reger's LukeRequest to generalize the index info in SolrServerConnector - used the LukeRequest in SolrServerConnector to replace the index size method by a getNumDocs request to a LukeRequest result	11 years ago
Michael Peter Christen	39fceb5ccf	fix for NPE & bug #264	11 years ago
Roland Haeder	aaedc0405d	Fixes and avoid of catching bad exceptions (some): - Rewrote usage of HashMap/Map to concurrent versions (to avoid a CME=ConcurrentModificationException) - Rewrote ConnectionInfo (as an example) to use a synchronized iterator instead of synchronizing an already synced HashSet (see Collections call) - This avoids catching CMEs again - Commented out noisy ConcurrentLog.logException() call Conflicts: source/net/yacy/repository/LoaderDispatcher.java	11 years ago
Roland Haeder	841a28ae76	Added 'final' for all exception blocks as this helps the Java compiler to optimize memory usage Conflicts: source/net/yacy/search/Switchboard.java	11 years ago
orbiter	b71d13a014	added load and deadlock detector in Memory util	12 years ago
orbiter	5533fc8e01	fix for bug 260	12 years ago
Michael Peter Christen	bcc623a843	refactoring of load_delay: this is a matter of client identification	12 years ago
orbiter	dac88561ae	minimum access time has a tight connection to ClientIdentification, therefore it is defined there.	12 years ago
Michael Peter Christen	9a29ab469e	another patch to prevent CLOSE_WAIT status on solr connections	12 years ago
Michael Peter Christen	87e9052081	added Connection:close to all http requests in our http client to prevent CLOSE_WAIT states (as seen in lsof)	12 years ago
Michael Peter Christen	5c6946dd5f	replaced usage of log4j by ConcurrentLog where possible	12 years ago
Michael Peter Christen	5878c1d599	- refactoring of log to ConcurrentLog: jdk-based loggers tend to block at java.util.logging.Logger.log(Logger.java:476) in concurrent environments. This makes logging a main performance issue. To overcome this problem, this is an add-on to jdk logging to put log entries on a concurrent message queue and log the messages one by one using a separate process. - FTPClient uses the concurrent logging instead of the log4j logger	12 years ago
orbiter	f4f6551c66	better handling of time-out at solrj in case that a commit is done in a fail-over case during add	12 years ago
Michael Peter Christen	dea71851d2	- better concurrency for network scanner - network scanner can now start from the list of all hosts in the search index	12 years ago
orbiter	9f0cc9b401	enhanced network scanner - textarea input field can now be used to paste in a large list of hosts - /31er subnet is possible (only one host) - auto-detect subdomains for ftp and www subdomains	12 years ago
sixcooler	308d73f855	do not use remote proxy if not switched on - regardless of the proto	12 years ago
sixcooler	69906b1d2e	Revert "do not use remote proxy if not switched on - regardless of the proto" This reverts commit `20f452d228`.	12 years ago
sixcooler	20f452d228	do not use remote proxy if not switched on - regardless of the proto	12 years ago
sixcooler	d5d8936f9d	For indexes that are changing rapidly in NRT situations, fcs (stands for Field Cache per Segment) may be a better choice than the default fc. (saves memory) see: http://wiki.apache.org/solr/SimpleFacetParameters#facet.method	12 years ago
Michael Peter Christen	660a196989	refactoring	12 years ago
Michael Peter Christen	16d1d744fa	added url_file_name_s in default collection schema for the file name without the file extension. This part of the file path is removed from the multi-field url_paths_sxt, which has now not the file name as last part of the path list. The same applies to the new fields source_file_name_s and target_file_name_s in the webgraph schema.	12 years ago
Michael Peter Christen	f9d859f5dc	now writing image alt texts and (camelcase-)parsed urls into a text search field for a better image retrieval	12 years ago
orbiter	97f2ac9091	added hint to gsa response writer that the result comes from a yacy peer	12 years ago
Michael Peter Christen	570511f3c8	removed fields references_internal_id_sxt and references_internal_url_sxt because they had been shown to be superfluous. The citation of referrer in the host browser is possible without them. Therefore now the host browser does not only show internal, but also external referrer to each link.	12 years ago
Michael Peter Christen	fd1776a3b0	added a new 'Citations' function: each search result item can now be explored for citations within other documents. A click on the 'Citations' link shows an analysis with all text lines in the document each with a complete list of documents which contain the same line. A second section shows the linking documents in ascending order of number of citations from the original document. Because documents from different hosts are most interesting here, they are listed at the top of the page as possible 'copypasta' source.	12 years ago
Michael Peter Christen	fc3ff92c69	npe fix	12 years ago
Michael Peter Christen	1762911f57	added synchronizations and timeouts in solr api; missing synchronizations in index modification methods causes deadlocks inside solr.	12 years ago
Michael Peter Christen	b85db72a73	added another response writer which can present search result with texts, separated by sentences. Then, these sentences can be used to search again in the index for the same sentence. This can be used to provide a tool for plagiarism-search. (not finished yet). Try the following: http://localhost:8090/solr/select?q=text_t:flut&grep=wasser&defType=edismax&start=0&rows=3&core=collection1&wt=grephtml .. to search for 'flut' and show only sentences in the result documents which contain the word 'wasser'. Consider this like using a grep-tool on documents: you select the documents by a search query and you grep sentences inside the found documents with the 'grep' attribute.	12 years ago
orbiter	2b320313d9	replaced yacydoc servlet usage by a solr result output using an html output writer. This made the creation of a html result writer necessary which is included in this commit. The yacydoc servlet was used to present all metadata to a document, but the solr interface can serve for this purpose in a much better way. All usages (except one) of yacydoc were replaced by a solr call. This affects also the 'metadata' link attached to search results.	12 years ago
Michael Peter Christen	f7e77a21bf	Added a citation reference computation for intra-domain link structures. While the values for the reference evaluation are computed, also a backlink-structure can be discovered and written to the index as well. The host browser has been extended to show such backlinks to each presented links. The host browser therefore can now show an information where a document is linked. The new citation reference is computed as likelihood for a random click path with recursive usage of previously computed likelihood. This process is repeated until the likelihood converges to a specific number. This number is then normalized to a ranking value CRn, 0<=CRn<=1. The value CRn can therefore be used to rank popularity within intra-domain link structures.	12 years ago
reger	7480e87386	- fix stopword handling for RWI see example http://bugs.yacy.net/view.php?id=247 - append language setting specific stopword list - remove unused OVERHANG stack type	12 years ago
reger	9ef1fd9bac	fix: enable use of solrcore.properties for property substitution of solrconfig.xml	12 years ago
reger	8a7fcb391d	enable use of solrcore.properties for property substitution of solrconfig.xml - move setting of system property solr.directoryFactory=solr.MMapDirectoryFactory to solrcore.properties - add check of os.arch for 64bit system, if it fails use default/solrcore.x86.properties (if exists) as solrcore.properties reason: on 32bit MMapDirectoryFactory may fail with..... Caused by: java.io.IOException: Map failed at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:849) at org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:283)	12 years ago
Michael Peter Christen	ba793a32c0	added timeout for remote searches of 10 seconds	12 years ago
Michael Peter Christen	1c4c1c0345	try to commit in case of failure which hopefully frees up some RAM	12 years ago
Michael Peter Christen	0c1a018bbd	removed 'later' tactic because it used too much RAM, reduced number of soft commits, reduced caching size of search events, ensured that solr results are processed before connection is closed to keep that stuff not too long in RAM	12 years ago
Michael Peter Christen	67757b425a	use a retry handler with retryCount=0 because we usually expect requests to fail if we access non-permanently available resources (peers, web pages) and want to fail fast without repeating the same request which is doomed to fail. The previous appearance of http client connection had a 1-2-4-8-second timeout scheme, which caused that connection attempts lasted for 16 seconds.	12 years ago
Michael Peter Christen	8dbc80da70	redesign of index.exist-test: this shall now not be done using a single id to be tested, but with a collection of ids. This will cause only a single call to solr instead of many. The result is a much better performance when testing the existence of many urls. The effect should cause very much less IO during index transmission, both on sender and receiver side.	12 years ago

678 Commits (e6b423c4d99cb07ec88bb2c7e49de8ea5c8bfd5e)