yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	b9d36e45e0	removed the &amp explicit encoding of ampersand character since this is double-translated within the template replacement process.	11 years ago
orbiter	dcf46ce8f6	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	343d2ef49a	new data type for access tracker (unfinished)	11 years ago
reger	dd8ea0cdd6	fix "add to blacklist" button style in IndexControlRWIs_p - added default filename filter to select field (as only addition to *.black list is permanent) - modified Blacklist_p header/legend to show all active blacklists (to support understanding that all configured lists are active) - removed obsolete code in Blacklist_p servlet	11 years ago
reger	abbf487023	fix QueryGoal Image query (missing space) see query log example .. url_file_ext_s:(jpg OR png OR gif) ORcontent_type:(image/*)) ..	11 years ago
reger	26e9d7e066	fix NPE in IndexControlRWIs_p.html - metatags my be null Caused by: java.lang.NullPointerException at net.yacy.search.query.QueryParams.getFacets(QueryParams.java:445) at net.yacy.search.query.QueryParams.getBasicParams(QueryParams.java:400) at net.yacy.search.query.QueryParams.solrTextQuery(QueryParams.java:345) at net.yacy.search.query.QueryParams.solrQuery(QueryParams.java:334) at net.yacy.search.query.SearchEvent.<init>(SearchEvent.java:290) at net.yacy.search.query.SearchEventCache.getEvent(SearchEventCache.java:176) at IndexControlRWIs_p.genSearchresult(IndexControlRWIs_p.java:641) at IndexControlRWIs_p.respond(IndexControlRWIs_p.java:141)	11 years ago
orbiter	3961b643a3	write solr searches to search log	11 years ago
Michael Peter Christen	25f9c35033	add patch which shall prevent that naive search mistakes like usage of regular expressions cause no results. Usage of '*' followed by a dot or any expression will now cause that this expression is used as a filetype search.	11 years ago
Michael Peter Christen	25250405f1	solr servlet preparation for join with jetty branch	11 years ago
Michael Peter Christen	09412ea3a4	counting search requests in solr interface	11 years ago
Michael Peter Christen	78eac85161	better calibration of caches and queue maximum sizes	11 years ago
Michael Peter Christen	c8af19bd37	removed unnecessary check which causes a NPE when searching with empty search string	11 years ago
Michael Peter Christen	6f3a923691	fixed urlmask which was not able to combine several constraints	11 years ago
Michael Peter Christen	ae55d69ef6	include/exclude size NPE fix (recently added)	11 years ago
Michael Peter Christen	2c39b65409	fixes for searches containing stopwords. The fix was done using a reconstruction of the search word set access method to protect that words are deleted from the sets from the outside of the QueryGoal class.	11 years ago
orbiter	61409788eb	less word hash computations (removing some overhead because of MD5 calcs) using the clear word in a normalized form.	11 years ago
Michael Peter Christen	bf1bdd52a6	prevent requesting of 0-facets (which actually exist)	11 years ago
Michael Peter Christen	087df05e24	added option to Config_Network_p.html to enable remote search while DHT-Receive is switched off.	11 years ago
Michael Peter Christen	1a4a69c226	set more logger to 'final static'	11 years ago
Michael Peter Christen	69b8d61c47	fix for search requests in GSA interface which contain 'funny' characters (like ':' etc.)	11 years ago
reger	7b17cdf6dd	add content_type:image/* to image search - see numerous idx entries with content_type image without url_file_ext_s (for various reason) which should be included in result - try it yourself with following sample query /solr/select?q=content_type:image/* AND -url_file_ext_s:[* TO *]&defType=edismax&fl=sku,url_file_ext_s,content_type adresses also possible url without or deviating extension.	11 years ago
Michael Peter Christen	1b4fa2947d	- fixed a problem which ocurred when a document was not recognized with the right content domain (i.e. identifying that it is an image, text etc.) because it used the file extension and not an existing mime type assignment. - fixed the new setting that images shall be loaded for a better image search. - both fixes together makes it now possible to crawl commons.wikimedia.org which makes use of 'funny' document names (i.e. ending with .jpg while the document is html)	11 years ago
Michael Peter Christen	78e7aadb26	removed unused initialization method	11 years ago
Michael Peter Christen	4fbc4740df	removed warnings	11 years ago
orbiter	8ac2e8c8c9	added location navigator which causes that the image to the map search is visible whenever a location is available in the search result. To activate this, the search.navigation property in yacy.conf must be modified to the new default values.	11 years ago
Michael Peter Christen	5e31bad711	- the webgraph shall store all links which appear on a web page and not all unique links! This made it necessary, that a large portion of the parser and link processing classes must be adopted to carry a different type of link collection which carry a property attribute which are attached to web anchors. - introduction of a new URL class, AnchorURL - the other url classes, DigestURI and MultiProtocolURI had been renamed and refactored to fit into a new document package schema, document.id - cleanup of net.yacy.cora.document package and refactoring	11 years ago
Michael Peter Christen	85456f46b2	added two new fields, exact_signature_copycount_i and fuzzy_signature_copycount_i, which count the number of copies of non-unique documents and assigns this to each document. Thus, each document there is a number assigned which shows how many copies of this document exists. These fields are disabled by default.	11 years ago
Michael Peter Christen	a2511b5600	turned images_alt_txt back to images_alt_sxt because it is not necessary to index the alt text. Indexed image Text is in images_text_t	11 years ago
Michael Peter Christen	85b1922244	activated image type navigation for image search	11 years ago
Michael Peter Christen	9e12fdff23	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	ab1201fdfd	fixed wrong facet count	11 years ago
Michael Peter Christen	049c3b3f2e	added an option to exclude image search results from text search. This is on by default.	11 years ago
Michael Peter Christen	a8c5bfcf58	avoid to create unnecessary objects	11 years ago
Michael Peter Christen	dc179bd61f	fix for catchall query goal for image search	11 years ago
reger	392174de8c	remove all_words, all_strings lists from QueryGoal - only used for text highlighting in parser text (ViewFile.html) which can be done with include_strings only	11 years ago
Michael Peter Christen	169ef8963d	one more fix for image search	11 years ago
Michael Peter Christen	cb85b22725	redesign of the image search process (with much better results, unfortunately the index schema has changed and p2p image search will not be muchmuch better until many people update)	11 years ago
reger	29967102a2	optimized QueryGoal (reducing mem and computation by removing all_hashes) - all_hashes used for text highlighting and word distance computation which can be done with include_hashes only	11 years ago
orbiter	f106345eef	link strings should not be tokenized	11 years ago
reger	a5019bc470	make Vocabulary Navigator tags a hard result entry filter by checking vocabulary tags also for rwi results (currently a filter is applied to the solr query) TODO: as vocabularies are only locally valid, auto-switch to Searchdom.LOCAL could be considered.	11 years ago
reger	a67a4b7d86	improve tld: query modifier filter pattern (to prevent tld:net accepting www.abcinet.org)	11 years ago
Roland Haeder	841a28ae76	Added 'final' for all exception blocks as this helps the Java compiler to optimize memory usage Conflicts: source/net/yacy/search/Switchboard.java	11 years ago
Michael Peter Christen	5878c1d599	- refactoring of log to ConcurrentLog: jdk-based logger tend to block at java.util.logging.Logger.log(Logger.java:476) in concurrent environments. This makes logging a main performance issue. To overcome this problem, this is a add-on to jdk logging to put log entries on a concurrent message queue and log the messages one by one using a separate process. - FTPClient uses the concurrent logging instead of the log4j logger	12 years ago
Michael Peter Christen	a2c8116a8f	accept (but ignore) a '+' sign in front of search words	12 years ago
sixcooler	d5d8936f9d	For indexes that are changing rapidly in NRT situations, fcs (stands for Field Cache per Segment) may be a better choice than the default fc. (saves memory) see: http://wiki.apache.org/solr/SimpleFacetParameters#facet.method	12 years ago
Michael Peter Christen	32aa1d4569	removed unused option for queries	12 years ago
Michael Peter Christen	8caaf6203a	fixed false multiple-generation of remote facet search which caused high cpu usage on remote side.	12 years ago
reger	d367b1f4d9	add null pointer check to stopword fix	12 years ago
reger	7480e87386	- fix stopword handling for RWI see example http://bugs.yacy.net/view.php?id=247 - append language setting specific stopword list - remove unused OVERHANG stack type	12 years ago
Michael Peter Christen	409d6edf53	Store node/solr search threads to be able to send them an interrupt signal in case that a cleanup process wants to remove the search process. Added also a new cleanup process which can reduce the number of stored searches to a specific number which can be higher or lower according to the remaining RAM. The cleanup process is called every time a search ist started.	12 years ago

1 2 3 4 5 ...

285 Commits (b9d36e45e083c6ad235ca4b0f842aaab4407d6a6)