yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	caa20d63d9	fixed seedlist (hash was missing)	11 years ago
Michael Peter Christen	ccf2f4e43b	refactoring of seed attributes (introduced more constants)	11 years ago
Michael Peter Christen	c927b428d3	fixed json	11 years ago
Michael Peter Christen	64048ff217	fix for XSS	11 years ago
orbiter	b7f1e5af51	added new servlet which generates the same file as the principal peers upload to a bootstrap position. You can call it either with http://localhost:8090/yacy/seedlist.html or, to generate json (or jsonp), with http://localhost:8090/yacy/seedlist.json or http://localhost:8090/yacy/seedlist.json?callback=seedlist	11 years ago
orbiter	3e552550d1	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	c2d720cdaf	purge a lucene cache - possible memory leak fix	11 years ago
reger	f111f30ace	Merge origin/master into jetty	11 years ago
Michael Peter Christen	f4172cbb3d	fix for another XSS bug	11 years ago
orbiter	ff86cb683f	fixed some XSS bugs reported by Marius from http://ctf365.com/	11 years ago
orbiter	19a051bec8	more monitoring for postprocessing and enhanced layout in Crawler monitor page	11 years ago
Michael Peter Christen	fceac8cffd	more monitoring for postprocessing	11 years ago
Michael Peter Christen	9d5895f643	enhanced and fixed postprocessing	11 years ago
Michael Peter Christen	087df05e24	added option to Config_Network_p.html to enable remote search while DHT-Receive is switched off.	11 years ago
Michael Peter Christen	1a4a69c226	set more logger to 'final static'	11 years ago
Michael Peter Christen	69b8d61c47	fix for search requests in GSA interface which contain 'funny' characters (like ':' etc.)	11 years ago
orbiter	4234b0ed6c	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	74c86a72a0	better default value for crawler user agent	11 years ago
reger	1437c45383	merge rc1/master	11 years ago
Michael Peter Christen	87a956e881	calculating and showing the number of files and the average size of a file in the HTCACHE in ConfigHTCache_p.html	11 years ago
Michael Peter Christen	acc1f8a749	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	81bb50118e	found and fixed a huge memory leak in solr caching (inside Solr). The not-flushed Solr cache is now handled in this way: - it is smaller by default - a Solr-internal process is started to flush the cache periodically (this does NOT clean the cache, just removes old objects) - a Solr-external process (the standard YaCy cleanup-process) now has direct access to the solr internal cache and flushes it completely. The time frame for such a flush is defined by the cleanup-process frequency, by default 10 minutes.	11 years ago
sixcooler	987f410011	URL-export:add query and fix for cast-class-exception	11 years ago
Michael Peter Christen	ffe8276063	replaced referrer link masking to 'pure' links to the referring page (that was more useful during testing)	11 years ago
reger	b38de92a16	Merge origin/master into jetty	11 years ago
Michael Peter Christen	434e13b46d	in host browser also show the properties of failed documents including referrer urls (this is a VERY USEFUL SEO and Web Admin feature!!)	11 years ago
orbiter	1ac504ae51	use html encoding for urls in metadata	11 years ago
reger	f017066197	Merge origin/master into jetty	11 years ago
Michael Peter Christen	25951cee14	- fixed opensearchdescription, this delivered an url with missing 'global' option - added display=2 to compare_yacy to remove the superfluous border	11 years ago
Michael Peter Christen	f1bfe64361	integrated startpage to compare_yacy	11 years ago
Michael Peter Christen	2f57327f20	added boolean load property to CacheResource_p servlet which causes that the servlet loads the page from the web.	11 years ago
Michael Peter Christen	9bb7eab389	hacks to prevent storage of data longer than necessary during search and some speed enhancements. This should reduce the memory usage during heavy-load search a bit.	11 years ago
Michael Peter Christen	5afa6e3aee	Automatically flush the log cache if a short memory status is reached. For the default of 200 lines this can flush about 10MB.	11 years ago
Michael Peter Christen	030d0776ff	Enhanced crawl start for very, very large crawl lists (i.e. > 5000) which had a problem because of badly used concurrency. This fix also caused a redesign of the whole host deletion process. This should fix bug http://bugs.yacy.net/view.php?id=250	11 years ago
Michael Peter Christen	4948c39e48	added concurrency for mass crawl check	11 years ago
Michael Peter Christen	1b4fa2947d	- fixed a problem which occurred when a document was not recognized with the right content domain (i.e. identifying that it is an image, text etc.) because it used the file extension and not an existing mime type assignment. - fixed the new setting that images shall be loaded for a better image search. - both fixes together make it now possible to crawl commons.wikimedia.org which makes use of 'funny' document names (i.e. ending with .jpg while the document is html)	11 years ago
Michael Peter Christen	16e3b357b3	replaced old tag cloud and adapted design a bit	11 years ago
Michael Peter Christen	dc38d35986	added matching in url field in Table_API_p search	11 years ago
Michael Peter Christen	691d7e70fa	added hint to development/commit rss feed	11 years ago
Michael Peter Christen	b81859c751	Show a RSS icon in the right top corner of search results. This replaces the 'API' icon which was the link for the opensearch result which is an extension of RSS. Since it is more appropriate to visualize a RSS link with an RSS icon, this API icon was changed here.	11 years ago
Michael Peter Christen	1a09771be8	fixed sitemap crawl start	11 years ago
orbiter	b743e6d79f	- prevent that crawl filter have empty (never-match) content - rewrite the description of the options "Restrict to start domain(s)" and "Restrict to sub-path(s)" to an explanation, that the restriction applies to all links in the link list of the option "From Link-List of URL" if this option is selected - allow "Restrict to sub-path(s)" if the "From Link-List of URL" is selected. This is supported in the crawl start.	11 years ago
orbiter	f597fdb602	make it easier to filter properties (case insensitive)	11 years ago
reger	f46c723398	allow to choose used http server, YaCy-Anomic or Jetty - defaults to Jetty (in this branch) - add server version info & config option -> Admin Console -> Advanced Settings -> Http Networking	11 years ago
reger	1adb4b8741	merge rc1/master	11 years ago
reger	37d24f3318	make use of declared static string ACTION_LOCATION	11 years ago
reger	eea504c117	update Info.plist small DefaultServlet refactoring	11 years ago
reger	a44eede8b8	merge rc1/master	11 years ago
reger	54a0272338	searchpage javascript (latestinfo) causes reset of search statistic after moving to next page - disabled call via setTimeout in yacysearch.html	11 years ago
Michael Peter Christen	91fa99e9bb	added new icon/image for latest commit	11 years ago
Michael Peter Christen	9fac9249bc	- replaced 'edit' link with a clone symbol in Table_API_p since that is what it does: it clones the crawl, it does not change the crawl. - moved the appearance of this clone link to the type column since this makes it visible also if the URL column is not visible.	11 years ago
Michael Peter Christen	0f6db6ad5b	Merge remote-tracking branch 'jensbees/crawlexpert-post'	11 years ago
Jens Bertram	3252c1ec39	Merge upstream/master into crawlexpert-post	11 years ago
Michael Peter Christen	90c8577840	enhanced ranking; patches to replace old ranking	11 years ago
bhoerdzn	a3824dfbaa	check URL on initial load, if set	11 years ago
bhoerdzn	52f49d475b	add a hidden field for "crawlingstart" since jQuery omits the submit button value	11 years ago
bhoerdzn	b0c0ec2dec	link recorded crawl starts back to "CrawlStartExpert_p" in "Process Scheduler"	11 years ago
bhoerdzn	d64d45361c	use integer types for boolean values	11 years ago
bhoerdzn	eda123d6fd	remove debugging code intercepting post requests	11 years ago
bhoerdzn	5057f27bbd	fix typo in parsing "cachePolicy" parameter	11 years ago
bhoerdzn	98f5c9018d	Fixed template vars for "deleteold". Fixed parsing "deleteold" parameter. Stop "setState" overwriting "deleteold" state on load.	11 years ago
bhoerdzn	a6a62986d4	correct state handling for country code restriction	11 years ago
bhoerdzn	4066b85155	correctly set initial state for load filters	11 years ago
bhoerdzn	8c91c3e7cd	set form boolean values to 0 & 1 instead of false & true	11 years ago
bhoerdzn	c27fabc88e	fixed wrong parameter check	11 years ago
bhoerdzn	2214bf5396	Remove some post parameters, if they are set to default values, as their values are already set by YaCy. Added some documentation.	11 years ago
reger	71d2655c02	downgrade to Jetty 8 to assure support of JRE 1.6 - introduce a YaCyHttp interface to modularize/separate http server - adjust the Jetty version specific implementation part (in package net.yacy.http) - putting the version specific code in classes starting with Jetty8xxxx - moved existing Jetty9xxx implementation into a test class (to keep the code) - adjust build to the changed jars - make use of the introduced YaCyHttpServer interface in related htroot servlets - adjust other test cases/classes	11 years ago
orbiter	705b3338ee	list more fields available for search and for ranking boosts	11 years ago
bhoerdzn	405878182f	Use list template for all other option lists. Fixed some template expressions.	11 years ago
bhoerdzn	8e74098cd4	Use list template for "reloadIfOlderNumber".	11 years ago
bhoerdzn	52bad7b908	Dynamic toggling of form fields, based on passed in and selected values. This will also cut down the post string by disabling not needed fields.	11 years ago
Michael Peter Christen	e56aa4fe93	fixed search navigation	11 years ago
Michael Peter Christen	4fbc4740df	removed warnings	11 years ago
bhoerdzn	45cf553bc3	try to guess default crawling mode, if none set	11 years ago
bhoerdzn	b4f0c822f2	assign strings before checking contents	11 years ago
bhoerdzn	499abe8f91	set default values for string parameters	11 years ago
bhoerdzn	42ea56eaad	made CrawlStartExpert_p aware of post variables; extended template where needed	11 years ago
reger	c7c706fd9f	merge with rc1/master	11 years ago
Michael Peter Christen	82bfd9e00a	- crawl profiles shall be deleted from active and passive stacks if they are deleted to terminate the crawl because otherwise the crawl will go on after the load-from-passive stack policy. - better check if a crawl is terminated using the loader queue.	11 years ago
orbiter	8ac2e8c8c9	added location navigator which causes that the image to the map search is visible whenever a location is available in the search result. To activate this, the search.navigation property in yacy.conf must be modified to the new default values.	11 years ago
orbiter	d86d2be5c3	automatically removed Places autotagging if no location library is wanted	11 years ago
reger	5c4ba9b5db	merge rc1 master	11 years ago
reger	70c51775ae	Merge remote-tracking branch 'origin/master' into jetty	11 years ago
orbiter	d2effd21db	fix for npe during location search	11 years ago
Michael Peter Christen	e40671ddb7	better and consistent deletions for error urls	11 years ago
Michael Peter Christen	2602be8d1e	- removed ZURL data structure; removed also the ZURL data file - replaced load failure logging by information which is stored in Solr - fixed a bug with crawling of feeds: added must-match pattern application to feed urls to filter out such urls which shall not be in a wanted domain - delegatedURLs, which also used ZURLs are now temporary objects in memory	11 years ago
Michael Peter Christen	61c5e40687	- replaced the properties object in AnchorURL with distinct variables for anchor attributes. - this meant that large portions of the parser code had to be adapted as well - added a counter target_order_i for anchor links in webgraph computation	11 years ago
Michael Peter Christen	5e31bad711	- the webgraph shall store all links which appear on a web page and not all unique links! This made it necessary to adapt a large portion of the parser and link processing classes to carry a different type of link collection, which carries the property attributes attached to web anchors. - introduction of a new URL class, AnchorURL - the other url classes, DigestURI and MultiProtocolURI, have been renamed and refactored to fit into a new document package schema, document.id - cleanup of net.yacy.cora.document package and refactoring	11 years ago
reger	13fc86c960	Merge remote-tracking branch 'origin/master' into jetty	11 years ago
reger	127adbf5cf	remove references to 10_http thread (legacy http server) and add needed get/set function to jetty http server wrapper	11 years ago
Michael Peter Christen	3e22d05290	added option for daterange properties in GSA interface to use a left- or right-open date range; i.e. using daterange=..2013-09-09 or daterange=2013-09-02.. in addition to daterange=2013-09-02..2013-09-09	11 years ago
reger	36b7159282	- remove double initialization of jetty - refactor some var assignments	11 years ago
reger	63ed04260a	Merge remote-tracking branch 'origin/master' into jetty	11 years ago
Michael Peter Christen	35ab2cef7b	added parsing of 'date', 'dc:date', 'dc.date' and 'last-modified' in html meta fields to get a correct (or: better) date timestamp. The http:last-modified mostly does not work because it is set to the current date from most CMS.	11 years ago
reger	aafef72a8a	merged current rc1/master into jetty branch to allow further development with latest version ServerSideIncludes and servlet return values need further work (for working jetty integration) - TODO: added nasty quickfix to allow SSI - needs further work - TODO: YaCy servlet return values/parameters are not handled	11 years ago
Michael Peter Christen	dbef8ccfcb	forced deletion of ZURL entries for a specific host for each host that appears in the crawl url list	11 years ago
Michael Peter Christen	e137ff4171	refactoring (in preparation for new removeHost method)	11 years ago
Michael Peter Christen	9e12fdff23	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	049c3b3f2e	added an option to exclude image search results from text search. This is on by default.	11 years ago
Michael Peter Christen	5d71a4c8bc	fix for dc:description field	11 years ago

4598 Commits (cabe0943cd813066a1df92f7d1581724b117a357)