yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	504327b15c	fix for condition for writing the webgraph	11 years ago
Michael Peter Christen	542c20a597	changed handling of crawl profile field crawlingIfOlder: this should be filled with the date before which a url is recognized as outdated. That field was partly misinterpreted and the time interval was filled in instead. In case that all the urls which are in the index shall be treated as outdated, the field is now filled with Long.MAX_VALUE because then all crawl dates are before that date and therefore outdated.	11 years ago
Michael Peter Christen	4eec1a7452	refactoring (change Metadata name of load time data structure to avoid confusion with Node data which is also called metadata)	11 years ago
reger	c95ba52cf0	improve logexception info - log a message or class name instead of msgtxt "null"	11 years ago
reger	7f0e757bb5	fix bookmark.rss - channel end tag position - link with html entity	11 years ago
orbiter	e441831a24	reverted toString() change in AnchorURL to prevent mistakenly used toString(). This fixes also the update link bug.	11 years ago
reger	697b9743e7	Add link to RemoteCrawl_p suggestion http://mantis.tokeek.de/view.php?id=277	11 years ago
reger	47f201a6b8	Add Solr default query fields (&qf) to select servlet according to the ranking profiles boost fields defined by the peer (if df/qf is not specified in query). This allows for pretty simple queries (q=word) without the need to know about the specific index configuration. This makes sure all relevant fields (as determined by the index owner) are searched, while still maintaining the option to query specific fields, and does not rely on the duplication of text to text_t. - add author to reset-default boost fields (support results for author nav)	11 years ago
reger	f96cfdc84d	prevent array out of bound exception on getRankingProfile(x) on faulty &profileNr= query parameter	11 years ago
Michael Peter Christen	970368359b	Merge branch 'master' of ssh://gitorious.org/yacy/rc1	11 years ago
Michael Peter Christen	c4608469bf	Merge branch 'master' of gitorious.org:yacy/icewindxs-rc1	11 years ago
reger	8004cfc961	fix input boostfield factor of 0.0 in RankingSolr - input was accepted and stored but not editable (added check factor >0.0 during edit) - make use of some more predefined solr constants	11 years ago
reger	5f5fb4ecdc	remove unused static (RSS)search from protocol	11 years ago
reger	7c1706d83a	use CRLF in generated bat command scripts for windows - for easier viewing with standard viewers	11 years ago
reger	a2cb366b25	Combine /heuristic search modifier with opensearch configured targets - with search modifier /heuristic a request is sent to all configured opensearch target systems (old /heuristic/blekko modifier no longer valid) - this allows to use opensearch heuristic on individual search requests (in contrast to configuration HEURISTIC_OPENSEARCH=true which sends an osd request on all global searches) - the index.html searchoption text adjusted to be displayed only if option configured - add Archive-It to predefined systems	11 years ago
Michael Peter Christen	2de159719b	added an option to set 'obey nofollow' for links with rel="nofollow" attribute in the <a> tag for each crawl. This introduces a lot of changes because it extends the usage of the AnchorURL Object type which now also has a different toString method than the underlying DigestURL.toString. It is therefore not advised to use .toString at all for urls, just use toNormalform(false) instead.	11 years ago
Michael Peter Christen	bf1b6b93e7	do not write CR values to webgraph if no CR values are computed	11 years ago
Michael Peter Christen	e039e78210	small bugfixes	11 years ago
Michael Peter Christen	87f8118108	added option to delete documents from the webgraph	11 years ago
Michael Peter Christen	32a2ff925c	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	d07cdd8c3b	added SolrCloud access mode and configuration	11 years ago
Michael Peter Christen	8514bffc22	enhanced postprocessing status report	11 years ago
malykhin.dmitry	53ecd54b45	Update russian translation	11 years ago
reger	f99f3d5cf2	fix button (clear list) text color in CrawlResults	11 years ago
reger	b24572f304	fix GSA filter query assignment - use more parameter constants	11 years ago
Michael Peter Christen	b5fc2b63ea	removed exist() retrieval functions from error cache and replaced it with metadata retrieval from connectors directly. This should cause better usage of the cache. Automatically increase the metadata cache if more memory is available.	11 years ago
Michael Peter Christen	62c72360ee	cleanup of checkAcceptanceInitially in CrawlStacker, should avoid double-calling of solr	11 years ago
Michael Peter Christen	dd5cdfe212	reverted filter query hack, it did not work	11 years ago
Michael Peter Christen	b5d78ba156	reduced number of solr queries during crawling	11 years ago
Michael Peter Christen	5326970d6c	enhanced solr queries for single document extraction	11 years ago
Michael Peter Christen	525575bd97	added debugging of filter queries in thread dump thread names	11 years ago
Michael Peter Christen	f319ef268f	testing filter queries instead of queries to retrieve documents by id	11 years ago
Michael Peter Christen	fd87fa1613	removed more unnecessary exist-checks in ErrorCache	11 years ago
Michael Peter Christen	f2b476e08b	don't do a double check to solr for failed documents if they are not written to solr	11 years ago
Michael Peter Christen	06ab72d1af	enhanced crawler host round-robin strategy	11 years ago
orbiter	dab9a0786a	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
orbiter	51bf5c85b0	Renamed the transmission cloud to buffer in dispatcher since the name 'cloud' was a bad idea. Changed also the accumulation process for peer targets so that every dht chunk is not assigned the set of redundant targets but they are assigned to redundant targets individually. This enhances the granularity of the target accumulation and should enhance the efficiency of the process. Finally the dht protocol client was enriched with the ability to remove the 'accept remote index' flag from peers or remove peers completely if they do not answer at all.	11 years ago
reger	7057e0b3e2	catch input file not found in Mediawiki import	11 years ago
Michael Peter Christen	a694b6a8fc	another fix for unique field computation	11 years ago
Michael Peter Christen	fb3dd56b02	fix for processing of noindex flag in http header	11 years ago
Michael Peter Christen	b0d941626f	fixed bugs in canonical, robots and title/description unique calculation	11 years ago
reger	d9472d043a	cleanup older unused classes	11 years ago
reger	665e12f88e	move startup time from old serverCore to switchboard (most used here) to make servercore eventually obsolete.	11 years ago
reger	336425912a	remove unused localSearchThread from SearchEvent	11 years ago
reger	32bd2a61c1	add local ip to AbstractRemoteHandler local hostname cache	11 years ago
Michael Peter Christen	f3a6b6e21e	fix for bad URL decoding	11 years ago
Michael Peter Christen	1092e798a5	fixed double content postprocessing	11 years ago
Michael Peter Christen	aee5b108e5	added linkScraperParser, a parser which ignores the text like the generic parser but extracts links like the htmlParser. This should be used for ASCII documents without known text format annotation like source code files or json documents. Probably also good for xml files without known schema.	11 years ago
Michael Peter Christen	f384fd624b	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
reger	2b8cc5832c	fix seek error for 0 file size records file by adding an extra check for file size = 0 in cleanlast() - (http://mantis.tokeek.de/view.php?id=411)	11 years ago

11000 Commits (504327b15c142a88b12016fb8ee75144b822a1f3)