yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	660a196989	refactoring	12 years ago
Michael Peter Christen	54024958ac	added url_file_name_s in query for live-search of urls	12 years ago
Michael Peter Christen	16d1d744fa	added url_file_name_s in default collection schema for the file name without the file extension. This part of the file path is removed from the multi-field url_paths_sxt, which has now not the file name as last part of the path list. The same applies to the new fields source_file_name_s and target_file_name_s in the webgraph schema.	12 years ago
Michael Peter Christen	f542cf7d9c	fix for daterange: the to-date is inclusive	12 years ago
Michael Peter Christen	c36720d45f	added daterange option to gsa api	12 years ago
Michael Peter Christen	4e3007f4a0	typo	12 years ago
Michael Peter Christen	2cb6b6bc21	added target="_blank" to shutdown links	12 years ago
orbiter	c8e94ad7c7	fix for citation search in case that the citation is very fresh	12 years ago
orbiter	57dcf68665	added a feed-back message inside the shutdown page	12 years ago
Michael Peter Christen	0600d510e1	show the citation report also in ViewFile	12 years ago
Michael Peter Christen	1a92b61d69	fixed usage of ViewFile which needs a commit before showing latest crawl result pages.	12 years ago
Michael Peter Christen	570511f3c8	removed fields references_internal_id_sxt and references_internal_url_sxt because they had been shown to be superfluous. The citation of referrer in the host browser is possible without them. Therefore now the host browser does not only show internal, but also external referrer to each link.	12 years ago
Michael Peter Christen	fd1776a3b0	added a new 'Citations' function: each search result item can now be explored for citations within other documents. A click on the 'Citations' link shows an analysis with all text lines in the document each with a complete list of documents which contain the same line. A second section shows the linking documents in ascending order of number of citations from the original document. Because documents from different hosts are most interesting here, they are listed at the top of the page as possible 'copypasta' source.	12 years ago
Michael Peter Christen	1762911f57	added synchronizations and timeouts in solr api; missing synchronizations in index modification methods cause deadlocks inside solr.	12 years ago
Michael Peter Christen	2fd7bbb450	reduced load on solr; no seed update in Status and no exists-check in HTTPLoader in case of redirects, that can be done using the htcache.	12 years ago
Michael Peter Christen	7ee71c2354	changed administration page headline to 'admnistration'	12 years ago
Michael Peter Christen	efd973d29d	changed p2p/stealth mode text and links a bit	12 years ago
Michael Peter Christen	6115bef335	added a 'greedy learning' mechanism which will cause a 'fresh' yacy to load linked web pages from search results until the total number of web pages reaches 15000. This shall give fresh peers a 'boost' to get a personalized search index faster.	12 years ago
Michael Peter Christen	a5e328d7c5	new icons	12 years ago
Michael Peter Christen	b85db72a73	added another response writer which can present search result with texts, separated by sentences. Then, these sentences can be used to search again in the index for the same sentence. This can be used to provide a tool for plagiarism-search. (not finished yet). Try the following: http://localhost:8090/solr/select?q=text_t:flut&grep=wasser&defType=edismax&start=0&rows=3&core=collection1&wt=grephtml .. to search for 'flut' and show only sentences in the result documents which contain the word 'wasser'. Consider this like using a grep-tool on documents: you select the documents by a search query and you grep sentences inside the found documents with the 'grep' attribute.	12 years ago
Michael Peter Christen	5132bf719c	added new buttons to search result page in p2p mode which show the switch between p2p search and the 'stealth mode' which is simply a non-p2p search within the p2p network. The functionality was there all the time, but the switch to this was not very visible.	12 years ago
orbiter	2b320313d9	replaced yacydoc servlet usage by a solr result output using an html output writer. This made the creation of a html result writer necessary which is included in this commit. The yacydoc servlet was used to present all metadata to a document, but the solr interface can serve for this purpose in a much better way. All usages (except one) of yacydoc were replaced by a solr call. This affects also the 'metadata' link attached to search results.	12 years ago
orbiter	200769d0c6	show the cache link in search results only if there is actually a cache entry stored in HTCACHE	12 years ago
Michael Peter Christen	f7e77a21bf	Added a citation reference computation for intra-domain link structures. While the values for the reference evaluation are computed, a backlink structure can also be discovered and written to the index. The host browser has been extended to show such backlinks for each presented link, so it can now show where a document is linked from. The new citation reference is computed as the likelihood for a random click path with recursive usage of previously computed likelihoods. This process is repeated until the likelihood converges to a specific number. This number is then normalized to a ranking value CRn, 0<=CRn<=1. The value CRn can therefore be used to rank popularity within intra-domain link structures.	12 years ago
Michael Peter Christen	fdcd4e6a6f	fixes to index deletion: quoting of host name (a '-' may be part of the url) and disabling the engage button when changing the url field at 'Delete by URL matching'	12 years ago
reger	7480e87386	- fix stopword handling for RWI see example http://bugs.yacy.net/view.php?id=247 - append language setting specific stopword list - remove unused OVERHANG stack type	12 years ago
orbiter	5c7ddc67fe	in GSA api enable usage of solr fq-attribute together with GSA site-attribute	12 years ago
Michael Peter Christen	eb9d0ba5b1	ranking and boost function update, small bugfixes, better default search field for solr	12 years ago
Michael Peter Christen	5f92c68f1f	removed block rank ranking and all YBR files in /ranking	12 years ago
Michael Peter Christen	164603b946	cleanup	12 years ago
Michael Peter Christen	0c1a018bbd	removed 'later' tactic because it used too much RAM, reduced number of soft commits, reduced caching size of search events, ensured that solr results are processed before connection is closed to keep that stuff not too long in RAM	12 years ago
Michael Peter Christen	709e9b8ce7	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	9e07447d47	added new link for SMW	12 years ago
Michael Peter Christen	3c04dd11de	removed dead link	12 years ago
Michael Peter Christen	281959a2d7	added option to re-boot the embedded solr during run-time. Added also API recording for this method so it can be repeated automatically. The index dump generation is now also available for API recording. Added some synchronization in backend which was necessary for this.	12 years ago
Michael Peter Christen	80a7989e8c	fixed ClassCastException: [Ljava.lang.Object; cannot be cast to [Ljava.util.List; in robots.txt servlet	12 years ago
orbiter	da621e827e	prevent NPE in case RWI is disabled	12 years ago
Michael Peter Christen	7300d81f40	include API Table deletion requests to the API recorder	12 years ago
Michael Peter Christen	d2ade87b49	fixed missing thisaddress in yacysearch.html which caused that the opensearch link was not working	12 years ago
Michael Peter Christen	179d032181	added a (badly formatted) delete button for process scheduler entries	12 years ago
reger	c03f75ebc3	fix DHT url receive see http://bugs.yacy.net/view.php?id=242	12 years ago
Marc Nause	8fb1b1e290	*) simplified banner creation code	12 years ago
Marc Nause	cd0b5f31b4	*) updated links to description of regex	12 years ago
Michael Peter Christen	8f2d3ce2f9	reduced locking situation in crawler: shifted synchronized location and reduced time-out of robots.txt load limit	12 years ago
Michael Peter Christen	f93501e6e0	nice crawl name if crawl is started with file:// (was: null)	12 years ago
Michael Peter Christen	b4f0cac102	added the reindexing job servlet to the submenu structure	12 years ago
Michael Peter Christen	8dbc80da70	redesign of index.exist-test: this shall now not be done using a single id to be tested, but with a collection of ids. This will cause only a single call to solr instead of many. The result is a much better performance when testing the existence of many urls. The effect should cause much less IO during index transmission, both on sender and receiver side.	12 years ago
Michael Peter Christen	c91c67c3cd	reject bad solr requests	12 years ago
Michael Peter Christen	44e363f37f	refactoring of WorkflowProcessor, added process counter, update of process counter if a blocking thread dies. Added also a new column in PerformanceConcurrency_p servlet to show the actual number of concurrent processes.	12 years ago
reger	79401cb938	added reindex option for documents with disabled or obsolete fields to Solr Schema Editor page (IndexSchema_p.html). This allows removing obsolete fields from the index (according to current schema config) by selecting all documents containing disabled fields.	12 years ago

4355 Commits (823ae4d6a770c55bd3ea6bb26c69536a3fe9f5cf)