yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	321840fde3	Replaced all fixed thread pools with cached thread pools. The cached thread pools flush their cached (dead) threads after 60 seconds. As a result, YaCy now runs constantly with about 50 threads, about 100 at peak times. Previously, about 400 threads had been cached and kept in a hibernation state, so the numproc counter in /proc/user_beancounters (exists only in VM-hosted Linux) was as high as the number of cached threads. This caused VM supervisors to terminate whole VM sessions when a limit was reached. Many VM providers have limits of numproc=96, which made it virtually impossible to run YaCy on such machines. With this change, it will be possible to run many YaCy instances even on VM hosts.	10 years ago
Michael Peter Christen	181911376c	showing list of all threads in threaddump using the ThreadMXBean counter (this obviously shows more threads than before?)	10 years ago
Michael Peter Christen	7bfab5eb9d	set Busy- and Blocking-Threads to daemon mode (they will now not prevent YaCy from termination if still running)	10 years ago
Michael Peter Christen	64887f6b21	show number of threads on status page	10 years ago
Michael Peter Christen	e586e423aa	in case that loading from the cache fails, load from wkhtmltopdf without cache using the user agent string given in the crawl profile	10 years ago
Michael Peter Christen	d5bac64421	recognize more html file types for snapshots	10 years ago
Michael Peter Christen	6f0167fac1	get cloned crawl start parameter for snapshots	10 years ago
Michael Peter Christen	a1ee101079	recognize more html file extensions	10 years ago
Michael Peter Christen	8480641f2d	fix to xvfb-run usage (quotes did not parse in xvfb-run, default values are appropriate)	10 years ago
Michael Peter Christen	68b040e31e	added fail-over missing http proxy service (i.e. overload) and quiet mode	10 years ago
Michael Peter Christen	25a64c51b3	moved snapshot generation out of the html handler so that existing cache entries no longer prevent the handler from being executed	10 years ago
Michael Peter Christen	c35170a305	more logging	10 years ago
Michael Peter Christen	e8be07ec78	grr	10 years ago
Michael Peter Christen	6f81bb756c	wrap wkhtmltopdf with xvfb if necessary	10 years ago
Michael Peter Christen	0119f8665d	more logging when failing to create pdf snapshot	10 years ago
Michael Peter Christen	416fe886e3	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	10 years ago
Michael Peter Christen	60f27bdf49	added the property timeoutrequests to configuration to disable TimeoutRequests. The purpose is to test if YaCy runs better on VMs where there is a limitation of concurrent processes; see /proc/user_beancounters in row numproc; this value is limited and should be low. Try to set timeoutrequests to keep this low. (works only after restart)	10 years ago
Michael Peter Christen	97f6089a41	YaCy can now create web page snapshots as pdf documents which can later be transcoded into jpg for image previews. To create such pdfs you must do: Add wkhtmltopdf and imagemagick to your OS, which you can do: On a Mac download wkhtmltox-0.12.1_osx-cocoa-x86-64.pkg from http://wkhtmltopdf.org/downloads.html and download http://cactuslab.com/imagemagick/assets/ImageMagick-6.8.9-9.pkg.zip In Debian do "apt-get install wkhtmltopdf imagemagick" Then check in /Settings_p.html?page=ProxyAccess: "Transparent Proxy" and "Always Fresh" - this is used by wkhtmltopdf to fetch web pages using the YaCy proxy. Using "Always Fresh" it is possible to get all pages from the proxy cache. Finally, you will see a new option when starting an expert web crawl. You can set a maximum depth for crawling which should cause a pdf generation. The resulting pdfs are then available in DATA/HTCACHE/SNAPSHOTS/<host>.<port>/<depth>/<shard>/<urlhash>.<date>.pdf	10 years ago
Michael Peter Christen	41d00350e4	moved network configuration to Use Case submenu; this is necessary because the definition of portal peers within the YaCy freeworld network is otherwise split into two different main menus.	10 years ago
reger	ff80700aff	replace deprecated Solr DateField.formatExternal with recommended TrieDateField.formatExternal	10 years ago
Michael Peter Christen	9ea120dbe5	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	10 years ago
reger	aa7122f079	update to guava.18.0.jar and jsch.0.1.51.jar	10 years ago
reger	0c97cc2440	skip unused call parameter for hashSentence()	10 years ago
reger	221f86dd5e	position api icon (ViewFile.html)	10 years ago
reger	4c14a8b44d	update to poi-3.10.1.jar	10 years ago
reger	ea633a794c	including small junit test case for WordTokenizer	10 years ago
reger	5790c7242e	skip tokenizing punctuation as word in WordTokenizer; remove unused variables in condenser related to Tokenizer	10 years ago
reger	f07392ff17	add. use host port parameter in YaCyApp	10 years ago
Michael Peter Christen	09d2867050	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	10 years ago
Michael Peter Christen	ad0da5f246	added new web page snapshot infrastructure which will lead to the ability to have web page previews in the search results. (This is a stub, no function available with this yet...)	10 years ago
reger	aa0faeabc5	adjust translation text of error msg on empty query (ru: needs correction)	10 years ago
reger	c475be2937	fix (enable) error msg on empty query	10 years ago
reger	ef5c5b4489	update to Jetty 9.2.4	10 years ago
reger	f709132961	remove obsolete alternate link; fix api link	10 years ago
Michael Peter Christen	5f5c7d69d1	added image screenshot generator	10 years ago
Michael Peter Christen	3c71e1c872	show vocabularies in search result (in case of debugging)	10 years ago
Michael Peter Christen	1d45d9405a	security bugfix	10 years ago
Michael Peter Christen	ff728b4aa5	ignore url errors during search	10 years ago
Michael Peter Christen	c94c24638f	disabled postprocessing by default. If you read this: please disable postprocessing in your peer as well: open /IndexSchema_p.html, then deselect field process_sxt	10 years ago
Michael Peter Christen	2fce2e2697	larger boost fields for ranking	10 years ago
Michael Peter Christen	6c03ff8355	bold words in snippets should not be coloured black in the base style because there are styles with dark backgrounds which make the bold word invisible	10 years ago
Michael Peter Christen	8317914ce3	changed vocabulary navigator object type to TreeMap to get a specific order into the vocabularies. This is now lexicographic, which is less random than a hashed order	10 years ago
Michael Peter Christen	d5c1b07768	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	10 years ago
Michael Peter Christen	c0f9f6ac66	added option to change the navbar-default, i.e. usable for dark skins	10 years ago
Michael Peter Christen	10794e8efd	trying facet.method fc instead of fcs to handle large facets	10 years ago
Michael Peter Christen	041b605cfe	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	10 years ago
Michael Peter Christen	f1f74e8626	toString fix	10 years ago
Michael Peter Christen	30276a2b48	prevent a local Solr search and a local RWI search from running concurrently. When a RWI search result is flushed into the result set, it does Solr Queries (which replaced the old-style Metadata Queries) and they are possibly running concurrently to a previously started Solr search. Both methods may block each other with IO. To enhance the speed, they are now serialized. Because the Solr search may produce better results using the more advanced and configurable Ranking methods, this result is preferred over the RWI search result. However, remote RWI search results are still fed concurrently into the search result as well.	10 years ago
Michael Peter Christen	84763126e0	added option to make the YaCy proxy act as if the cache is never stale. If set to 'Always Fresh' the cache is always used if the entry in the cache exists. This is a good way to archive web content and access it without going online again in case the documents exist. To do so, open /Settings_p.html?page=ProxyAccess and check the "Always Fresh" checkbox. This is set to false, which behaves as before. If you set this to true, then you have your web archive in DATA/HTCACHE. Copy this to carry around your private copy of the internet!	10 years ago
reger	1e7ee72240	fix path lookup to ./defaults/yacy.badwords (fix of commit `ee277b9b3e`)	10 years ago
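
Commit 321840fde3 above replaces fixed thread pools with cached ones. The following is a minimal sketch of that difference, not code from the YaCy source: the class and method names are hypothetical, and the cached pool is built with a deliberately short keep-alive so the effect shows quickly. According to the JDK documentation, Executors.newCachedThreadPool() uses the same constructor arguments with a 60 second keep-alive.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; none of these names come from YaCy.
public class ThreadPoolSketch {

    /** Same configuration as Executors.newFixedThreadPool(size): core threads never time out. */
    static ThreadPoolExecutor fixedPool(int size) {
        return new ThreadPoolExecutor(size, size, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
    }

    /** Same shape as Executors.newCachedThreadPool(), but with a configurable keep-alive. */
    static ThreadPoolExecutor cachedPool(long keepAlive, TimeUnit unit) {
        return new ThreadPoolExecutor(0, Integer.MAX_VALUE, keepAlive, unit,
                new SynchronousQueue<Runnable>());
    }

    /** Runs some trivial tasks, lets the pool sit idle, then reports how many threads remain. */
    static int poolSizeAfterIdle(ThreadPoolExecutor pool, int tasks, long idleMillis)
            throws InterruptedException {
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            pool.execute(done::countDown);
        }
        done.await();              // all tasks have finished
        Thread.sleep(idleMillis);  // idle period, longer than the cached pool's keep-alive
        int size = pool.getPoolSize();
        pool.shutdown();
        return size;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("fixed pool threads after idle:  "
                + poolSizeAfterIdle(fixedPool(8), 8, 500));
        System.out.println("cached pool threads after idle: "
                + poolSizeAfterIdle(cachedPool(100, TimeUnit.MILLISECONDS), 8, 500));
    }
}
```

In this sketch the fixed pool keeps its eight idle threads alive, while the cached pool releases its threads once the keep-alive has expired. That matches the behaviour the commit message relies on to keep the numproc counter low.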

11622 Commits (53e4ae65d0bca0ff8fb6b2a766742de87d1691d6)