yacy_search_server

Commit Graph

Author	SHA1	Message	Date
reger	b29d262e70	implement Jetty8HttpServerImpl.generateSocketAddress (code 1:1 copied from serverCore)	11 years ago
reger	066a1ecf0a	add highlight queryparams to solrservlet if missing - modify query params in Solr parameter map (instead of querystring)	11 years ago
reger	4684330505	Merge origin/master into jetty Conflicts: source/net/yacy/cora/federate/solr/responsewriter/HTMLResponseWriter.java	11 years ago
reger	1437c45383	merge rc1/master	11 years ago
Michael Peter Christen	87a956e881	calculating and showing the number of files and the average size of a file in the HTCACHE in ConfigHTCache_p.html	11 years ago
Michael Peter Christen	acc1f8a749	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	81d9e23532	fixed another memory leak in the PDF parser: the class org.apache.pdfbox.pdmodel.font.PDFont occupies 8MB of space which cannot be cleaned if PDFont.clearResources is called. The attempt to clean the class cache therefore causes the class to be loaded and this cache to be initialized with some rubbish. I tried to prevent instantiating this class by using a hacked findLoadedClass call to the SystemClassLoader (which is protected ...). Now, without using the PDF parser at all, 8MB of RAM space is not occupied, however, when the first PDF arrives this space will be taken and never given back to GC. WAKE UP YOU LAZY PDFBOX HACKER AND FIX THIS SHIT!	11 years ago
Michael Peter Christen	c152d996e6	reduced footprint of BookmarksDB which can take quite a lot of memory if the number of bookmarks is high (i.e. > 2000 URLs)	11 years ago
Michael Peter Christen	81bb50118e	found and fixed a huge memory leak in solr caching (inside Solr). The not-flushed Solr cache is now handled in this way: - it is smaller by default - a Solr-internal process is started to flush the cache periodically (this does NOT clean the cache, just removes old objects) - a Solr-external process (the standard YaCy cleanup-process) now has direct access to the solr internal cache and flushes it completely. The time frame for such a flush is defined by the cleanup-process frequency, by default 10 minutes.	11 years ago
reger	7b17cdf6dd	add content_type:image/* to image search - see numerous idx entries with content_type image without url_file_ext_s (for various reasons) which should be included in result - try it yourself with following sample query /solr/select?q=content_type:image/* AND -url_file_ext_s:[* TO *]&defType=edismax&fl=sku,url_file_ext_s,content_type addresses also possible url without or deviating extension.	11 years ago
reger	082c9a98c1	move writeHeaders from Jetty8 servlet to YaCyDefaultServlet - after removing Jetty server dependency (of Response using HttpServletResponse only)	11 years ago
sixcooler	987f410011	URL-export:add query and fix for cast-class-exception	11 years ago
Michael Peter Christen	a8253ca49c	added missing unicode transformation in href link contents during parsing	11 years ago
Michael Peter Christen	0cf9e9580b	added clickdepth and CR computation debug code to verify that the process is complete	11 years ago
reger	b85f702f22	add AccessTracker logging to SolrServlet	11 years ago
reger	de1f02420b	implement HtmlResponseWriter to solrServlet (and rss / opensearch responsewriter) as in yacy select servlet. - set contenttype of HTML/GrepHTML-Responsewriter to "text/html" - set a contenttype to GSAsearchServlet	11 years ago
Michael Peter Christen	234a974955	load image only if their parser flag is activated	11 years ago
Michael Peter Christen	b2c329929f	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	60187a4ec2	fix in html parser	11 years ago
Michael Peter Christen	e1c1e57877	less overhead calling exist() with only one hash	11 years ago
reger	3d5d366f1c	fix html header in Solr HTMLResponseWriter - move 1st body content after </head> tag - add closing <span> tag	11 years ago
reger	bfdb404867	implement a Jetty reconnect to work with Configbasic_p.html port change - instead of shutting down the server it should be sufficient to manipulate the Jetty http connector	11 years ago
Michael Peter Christen	5a02d650ee	avoid cloning	11 years ago
reger	d6760df3e5	fix servlet class exist check to use default path only (in Jetty8YaCyDefaultServlet) - del redundant doget code in yacydefaultservlet - small declaration code opts - del obsolete libt/proxyservlet.java	11 years ago
reger	b38de92a16	Merge origin/master into jetty	11 years ago
Michael Peter Christen	cc39667399	Speed enhancements and less CPU usage during Solr searches when using the embedded Solr (the default). This was obtained by circumventing solrj search encapsulation and the implementation of direct index access methods to Solr. The effect will not only be seen during search, but this has also a strong effect on suggestions (much more) and less CPU power usage during index distribution (which needs many search requests)	11 years ago
Michael Peter Christen	434e13b46d	in host browser also show the properties of failed documents including referrer urls (this is a VERY USEFUL SEO and Web Admin feature!!)	11 years ago
reger	6944225037	- add GSA search /gsa/search servlet for Jetty to Server init - include SecurityHandler check for /gsa/ /solr/ - change one more YaCyDefaultServlet dependency from Jetty to std. javax.Servlet	11 years ago
reger	53cb30a221	reduce logging (by assigning logger to existing logger) - small additional cleanups	11 years ago
reger	332c6d4fe1	reactivate Domain handler for .yacy / .yacyh handling	11 years ago
reger	b1ce70434e	resolve merge conflict - add missing import statement	11 years ago
reger	7869a4c070	Merge origin/master into jetty - merge conflict resolve	11 years ago
reger	f017066197	Merge origin/master into jetty	11 years ago
reger	06da6f517c	add YaCyProxyServlet to handle /proxy.html?url=proxyurl - based on Jetty ProxyServlet - at this time use existing HTTPD ProxyHandler for url rewrite - add jetty-client jar (dependency in Jetty ProxyServlet) reuse ProxyHandler.convertHeaderFromJetty in YaCyDefaultServlet	11 years ago
reger	69599566f9	catch one more malformed url in proxy url rewrite	11 years ago
reger	605530fec5	catch proxy url rewrite exception malformed url (" http:\/\/" ) may cause error response testcase http://localhost:8090/proxy.html?url=http://dictionary.reference.com/browse/test	11 years ago
Michael Peter Christen	9bb7eab389	hacks to prevent storage of data longer than necessary during search and some speed enhancements. This should reduce the memory usage during heavy-load search a bit.	11 years ago
orbiter	3c3cb78555	- removed a lot of garbage and bloated code from GuiHandler. - transformed log lines to String before they are stored because the storage space is about 1:250 (45kb for one line before transformation, 180 bytes afterwards) - this saves up to 10MB RAM so we can increase the number of lines to 1000 again.	11 years ago
Michael Peter Christen	5afa6e3aee	Automatically flush the log cache if a short memory status is reached. For the default of 200 lines this can flush about 10MB.	11 years ago
Michael Peter Christen	030d0776ff	Enhanced crawl start for very, very large crawl lists (i.e. > 5000) which had a problem because of badly used concurrency. This fix also caused a redesign of the whole host deletion process. This should fix bug http://bugs.yacy.net/view.php?id=250	11 years ago
Michael Peter Christen	6aabc4e5c8	reduced logging line memory, 10000 lines had filled up 450MB! grrr. (thank you, a bomb from the past)	11 years ago
Michael Peter Christen	1a8783147b	enhanced computation of number of solr documents.	11 years ago
Michael Peter Christen	4948c39e48	added concurrency for mass crawl check	11 years ago
Michael Peter Christen	1b4fa2947d	- fixed a problem which occurred when a document was not recognized with the right content domain (i.e. identifying that it is an image, text etc.) because it used the file extension and not an existing mime type assignment. - fixed the new setting that images shall be loaded for a better image search. - both fixes together make it now possible to crawl commons.wikimedia.org which makes use of 'funny' document names (i.e. ending with .jpg while the document is html)	11 years ago
Michael Peter Christen	82621bead0	When doing bootstrapping, always accept one seedlist-File without checking the date of the file. This should help to start the peer in case that the user has a completely wrong date setting.	11 years ago
Michael Peter Christen	691d7e70fa	added hint to development/commit rss feed	11 years ago
orbiter	20bbde8665	fix for mustmatch regex computation: result had correct semantic, but may have contained multiple same expressions within the disjunction of domain-restrictions. This fix removes the redundant restrictions and makes the regex shorter.	11 years ago
reger	cb2dbcb843	add graceful Jetty shutdown option - as Jetty stop is not synced, yet - include jetty jars and servlet-3.0 api jar in Eclipse .classpath	11 years ago
reger	f46c723398	allow to choose used http server, YaCy-Anomic or Jetty - defaults to Jetty (in this branch) - add server version info & config option -> Admin Console -> Advanced Settings -> Http Networking	11 years ago
reger	da4ff5aefa	add YaCy HttpCommand "authenticate" check to DefaultServlet	11 years ago

6657 Commits (1a6158e33893bba561e8a373dc3ebab374f16fa0)