yacy_search_server

Commit Graph

Author	SHA1	Message	Date
reger	52dd491c04	fix not necessary use of DigestURL	11 years ago
reger	5111841e5b	- reduce Jetty debug logging - fix Context path initialization	11 years ago
reger	bc6ebb3c06	adjust to DigestURI changes from master to DigestURL	11 years ago
reger	561cbc7ee2	use more YaCy HeaderFramework constants (instead of Jetty's)	11 years ago
reger	5c4ba9b5db	merge rc1 master	11 years ago
reger	70c51775ae	Merge remote-tracking branch 'origin/master' into jetty	11 years ago
reger	4b77733e59	implement a YaCyDefaultServlet to handle YaCy-servlets within Jetty server - the implementation is inspired by Jetty's DefaultServlet - handles static html content and YaCy servlets - translates between standard servlet request/response and YaCy request/response specification With the implementation of YaCy-servlets as servlets instead of via a jetty handler it's closer to the servlet standard and carries fewer jetty-specific dependencies.	11 years ago
orbiter	828603e4f1	fix for 100%CPU problem in error cache cleaning process	11 years ago
orbiter	c64b51134e	hack to add all tokens from the url to text_t. This was working for the RWI index (and still is working) but not for solr-only search indexes. Maybe we should find a solution using a separate search field instead.	11 years ago
orbiter	6e8377b8ad	do not check all words with synonym library if the library is empty	11 years ago
orbiter	70ba74b23a	disabled ipv4 preference to enable ipv6-only networks like freifunk	11 years ago
orbiter	f3be1930cb	CPU problem when pushing to the error cache; wrong class, ConcurrentHashMap needed for concurrency	11 years ago
Michael Peter Christen	e40671ddb7	better and consistent deletions for error urls	11 years ago
Michael Peter Christen	2602be8d1e	- removed ZURL data structure; removed also the ZURL data file - replaced load failure logging by information which is stored in Solr - fixed a bug with crawling of feeds: added must-match pattern application to feed urls to filter out such urls which shall not be in a wanted domain - delegatedURLs, which also used ZURLs are now temporary objects in memory	11 years ago
Michael Peter Christen	31920385f7	set anchor rel attribute of all links to "nofollow" if the html meta contains a robots:nofollow or if the http header contains a "X-Robots-Tag: nofollow"	11 years ago
reger	9619b8743c	add Solr Servlet	11 years ago
Michael Peter Christen	57e00baf26	fix for parsing of image links inside of anchor links (image-links)	11 years ago
Michael Peter Christen	61c5e40687	- replaced the properties object in AnchorURL with distinct variables for anchor attributes. - as a result, large portions of the parser code had to be adapted as well - added a counter target_order_i for anchor links in webgraph computation	11 years ago
Michael Peter Christen	3ea9bb4427	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	5e31bad711	- the webgraph shall store all links which appear on a web page and not all unique links! This made it necessary to adapt a large portion of the parser and link processing classes to carry a different type of link collection, which carries property attributes attached to web anchors. - introduction of a new URL class, AnchorURL - the other url classes, DigestURI and MultiProtocolURI, have been renamed and refactored to fit into a new document package schema, document.id - cleanup of net.yacy.cora.document package and refactoring	11 years ago
reger	13fc86c960	Merge remote-tracking branch 'origin/master' into jetty	11 years ago
reger	850609937f	update Info.plist for Jetty 9 jars	11 years ago
reger	f7f86d8a5d	update to Jetty 9 jars - include javax.servlet 3.0	11 years ago
reger	603368fc3e	remove redundant declaration of USER_AGENT	11 years ago
reger	bd71b14d25	add mandatory p2p parameter to templatePattern	11 years ago
reger	b8da176c5d	adjust setHandled to request of call parameter	11 years ago
reger	127adbf5cf	remove references to 10_http thread (legacy http server) and add needed get/set function to jetty http server wrapper	11 years ago
Michael Peter Christen	1a8c64117f	decreased the responseHeaderDB database which is now flushed more frequently. This will preserve more documents in the cache in case of a crash.	11 years ago
Michael Peter Christen	3e22d05290	added option for daterange properties in GSA interface to use a left- or right-open date range; i.e. using daterange=..2013-09-09 or daterange=2013-09-02.. in addition to daterange=2013-09-02..2013-09-09	11 years ago
reger	36b7159282	- remove double initialization of jetty - refactor some var assignments	11 years ago
reger	8e52271491	- delete not needed old jetty jars from libt - add jetty to Info.plist	11 years ago
reger	63ed04260a	Merge remote-tracking branch 'origin/master' into jetty	11 years ago
reger	fe87fb638a	adjust test/ParserTest to dc_description data type	11 years ago
Michael Peter Christen	35ab2cef7b	added parsing of 'date', 'dc:date', 'dc.date' and 'last-modified' in html meta fields to get a correct (or: better) date timestamp. The http:last-modified mostly does not work because it is set to the current date by most CMS.	11 years ago
reger	2ee68f76f6	added read parameter from multi-part form fields (to nasty quick-fix)	11 years ago
Michael Peter Christen	9cc8468b30	added tools to visualize image generation (i.e. during testing)	11 years ago
reger	105cf8f593	changes to adjust jetty to recent code changes	11 years ago
reger	aafef72a8a	merged current rc1/master into jetty branch to allow further development with latest version ServerSideIncludes and servlet return values need further work (for working jetty integration) - TODO: added nasty quickfix to allow SSI - needs further work - TODO: YaCy servlet return values/parameters are not handled	11 years ago
Michael Peter Christen	dbef8ccfcb	forced deletion of ZURL entries for a specific host for each host that appears in the crawl url list	11 years ago
Michael Peter Christen	e137ff4171	refactoring (in preparation for new removeHost method)	11 years ago
Michael Peter Christen	7a5574cd51	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	85456f46b2	added two new fields, exact_signature_copycount_i and fuzzy_signature_copycount_i, which count the number of copies of non-unique documents and assign this count to each document. Thus, each document is assigned a number which shows how many copies of it exist. These fields are disabled by default.	11 years ago
orbiter	26366596d9	fix for a problem which occurs when a site is crawled where the start url is redirected.	11 years ago
Michael Peter Christen	a2511b5600	turned images_alt_txt back to images_alt_sxt because it is not necessary to index the alt text. Indexed image Text is in images_text_t	11 years ago
Michael Peter Christen	85b1922244	activated image type navigation for image search	11 years ago
Michael Peter Christen	9e12fdff23	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	ab1201fdfd	fixed wrong facet count	11 years ago
Michael Peter Christen	049c3b3f2e	added an option to exclude image search results from text search. This is on by default.	11 years ago
Michael Peter Christen	69f85265e1	added an option to put image links to the crawl queue and handle these like normal documents. Using this option (by default on at this moment; this might change soon) it is possible to get the exif data into the search index to be used in image search.	11 years ago
Michael Peter Christen	e8e558a9b7	fix for content domain classification in URIMetadataNode	11 years ago

9889 Commits (52dd491c046c5418af198a7815d051c0b8e34fa1)