The not-flushed Solr cache is now handled in this way:
- it is smaller by default
- a Solr-internal process is started to flush the cache periodically
(this does NOT clean the cache, just removes old objects)
- a Solr-external process (the standard YaCy cleanup process) now has
direct access to the Solr-internal cache and flushes it completely.
The time frame for such a flush is defined by the cleanup-process
frequency, by default 10 minutes (see the scheduling sketch below).
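For illustration, a minimal sketch of the two flush schedules, assuming a hypothetical IndexCache interface with trimOldObjects() and clear() methods (these names are illustrative, not the actual YaCy or Solr API):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CacheFlushScheduler {
    // hypothetical view of the not-flushed Solr cache; not the real YaCy class
    public interface IndexCache {
        void trimOldObjects(); // remove old objects only, keep the rest
        void clear();          // flush the cache completely
    }

    public static void start(IndexCache cache) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        // Solr-internal job: periodically remove old objects (does not clean the whole cache)
        scheduler.scheduleAtFixedRate(cache::trimOldObjects, 1, 1, TimeUnit.MINUTES);
        // YaCy cleanup process: complete flush, by default every 10 minutes
        scheduler.scheduleAtFixedRate(cache::clear, 10, 10, TimeUnit.MINUTES);
    }
}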
- there are numerous index entries with content_type image but without url_file_ext_s (for various reasons) which should be included in the result
- try it yourself with the following sample query (a SolrJ equivalent is sketched below)
/solr/select?q=content_type:image/* AND -url_file_ext_s:[* TO *]&defType=edismax&fl=sku,url_file_ext_s,content_type
This also addresses possible URLs without an extension or with a deviating extension.
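A hedged SolrJ equivalent of the sample query above, assuming an already constructed SolrClient named client (the exact client class depends on the SolrJ version; older releases name it SolrServer):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public static void listImagesWithoutExtension(SolrClient client) throws Exception {
    // same query as the /solr/select URL above
    SolrQuery query = new SolrQuery("content_type:image/* AND -url_file_ext_s:[* TO *]");
    query.set("defType", "edismax");
    query.setFields("sku", "url_file_ext_s", "content_type");
    QueryResponse response = client.query(query);
    for (SolrDocument doc : response.getResults()) {
        System.out.println(doc.getFieldValue("sku") + " -> " + doc.getFieldValue("content_type"));
    }
}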
- use current YaCy version number
- make use of libbuild\GitRevMavenTask (maven-plugin-gitrevisionnumber)
- make yacyBuildProperties.java available for source filtering by the Maven plugin (copy to libbuild\java-templates)
- update assembly definition to include lib\yacycore.jar without version number (needed this way by the startup script)
- it injects properties which can be used in the pom via ${DSTAMP} and ${releaseNr} if added as a plugin via the snippet below (a sketch of such a property-injecting Mojo follows after it)
<plugin>
  <groupId>net.yacy</groupId>
  <artifactId>maven-plugin-gitrevisionnumber</artifactId>
  <version>1.0</version>
  <executions>
    <execution>
      <phase>initialize</phase>
      <goals><goal>create</goal></goals>
    </execution>
  </executions>
</plugin>
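For illustration, a minimal sketch of how such a Mojo can inject the two properties into the Maven project (class name and property values here are hypothetical; the real GitRevMavenTask derives them from the git history):

import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.maven.plugin.AbstractMojo;
import org.apache.maven.project.MavenProject;

/**
 * Hypothetical property-injecting Mojo, bound to the initialize phase.
 *
 * @goal create
 * @phase initialize
 */
public class GitRevisionMojo extends AbstractMojo {

    /** @parameter default-value="${project}" */
    private MavenProject project;

    public void execute() {
        // illustrative values; the real task reads release and revision data from git
        project.getProperties().setProperty("releaseNr", "1.0");
        project.getProperties().setProperty("DSTAMP",
                new SimpleDateFormat("yyyyMMdd").format(new Date()));
    }
}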
the embedded Solr (the default). This was achieved by circumventing the
solrj search encapsulation and implementing direct index access
methods to Solr.
The effect is not only visible during search; it also strongly speeds up
suggestions (even more so) and reduces CPU usage during
index distribution (which needs many search requests).
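A rough sketch of what such direct access looks like, assuming a handle to the embedded SolrCore (field and method names here are illustrative, not the actual YaCy connector code):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.solr.core.SolrCore;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.RefCounted;

public static void printHostDocs(SolrCore core, String hostName) throws java.io.IOException {
    // borrow the core's searcher directly instead of going through a solrj request
    RefCounted<SolrIndexSearcher> ref = core.getSearcher();
    try {
        SolrIndexSearcher searcher = ref.get();
        // "host_s" is an assumed field name for this example
        TopDocs hits = searcher.search(new TermQuery(new Term("host_s", hostName)), 10);
        for (ScoreDoc sd : hits.scoreDocs) {
            System.out.println(searcher.doc(sd.doc).get("sku"));
        }
    } finally {
        ref.decref(); // always release the reference-counted searcher
    }
}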
- based on Jetty ProxyServlet (a minimal registration sketch follows below)
- at this time the existing HTTPD ProxyHandler is used for URL rewriting
- add jetty-client jar (a dependency of Jetty ProxyServlet)
- reuse ProxyHandler.convertHeaderFromJetty in YaCyDefaultServlet
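A minimal sketch of wiring a transparent Jetty proxy servlet into a servlet context (paths, port and target are assumptions; this is not the actual YaCy wiring):

import org.eclipse.jetty.proxy.ProxyServlet;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.servlet.ServletContextHandler;
import org.eclipse.jetty.servlet.ServletHolder;

public class ProxySetup {
    public static void main(String[] args) throws Exception {
        Server server = new Server(8080);
        ServletContextHandler context = new ServletContextHandler();

        // ProxyServlet.Transparent forwards /proxy/* to the configured target;
        // it needs jetty-client on the classpath (hence the added jar)
        ServletHolder proxy = new ServletHolder(ProxyServlet.Transparent.class);
        proxy.setInitParameter("proxyTo", "http://localhost:8090");
        proxy.setInitParameter("prefix", "/proxy");
        context.addServlet(proxy, "/proxy/*");

        server.setHandler(context);
        server.start();
        server.join();
    }
}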
- transformed log lines to String before they are stored, because this
shrinks the storage size by a factor of about 250 (45 kB for one line
before transformation, 180 bytes afterwards)
- this saves up to 10 MB of RAM, so we can increase the number of lines to
1000 again (a minimal handler sketch follows below).
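A minimal sketch of such a handler that keeps only the formatted lines (an illustrative class using plain java.util.logging, not the actual YaCy log handler):

import java.util.ArrayDeque;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.SimpleFormatter;

public class LastLinesHandler extends Handler {
    private static final int MAX_LINES = 1000;
    private final ArrayDeque<String> lines = new ArrayDeque<String>(MAX_LINES);

    public LastLinesHandler() {
        setFormatter(new SimpleFormatter());
    }

    @Override
    public synchronized void publish(LogRecord record) {
        if (!isLoggable(record)) return;
        // store the formatted String, not the LogRecord: a record can keep ~45 kB of
        // referenced objects alive while the line itself is only a few hundred bytes
        if (lines.size() >= MAX_LINES) lines.removeFirst();
        lines.addLast(getFormatter().format(record));
    }

    @Override public void flush() {}
    @Override public void close() {}
}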
which had a problem because of badly used concurrency.
This fix also caused a redesign of the whole host deletion process.
This should fix bug http://bugs.yacy.net/view.php?id=250
the right content domain (i.e. identifying that it is an image, text
etc.) because it used the file extension and not an existing mime type
assignment.
- fixed the new setting that images shall be loaded for a better image
search.
- both fixes together now make it possible to crawl
commons.wikimedia.org, which makes use of 'funny' document names (e.g.
ending with .jpg while the document is html); a sketch of the mime-type-first decision follows below
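A hedged sketch of that decision (a hypothetical helper, not the actual YaCy classification code): the mime type delivered by the server wins; the file extension is only a fallback.

public class ContentDomainGuess {
    public static String contentDomain(String mimeType, String fileExtension) {
        // prefer an existing mime type assignment over the file extension
        if (mimeType != null) {
            if (mimeType.startsWith("image/")) return "image";
            if (mimeType.startsWith("audio/")) return "audio";
            if (mimeType.startsWith("video/")) return "video";
            if (mimeType.startsWith("text/"))  return "text";
        }
        // only if no mime type is known, fall back to the extension; this keeps
        // commons.wikimedia.org pages named *.jpg but served as text/html in the text domain
        if ("jpg".equals(fileExtension) || "png".equals(fileExtension)
                || "gif".equals(fileExtension)) return "image";
        return "text";
    }
}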