yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	6491270b3a	large IPv6 redesign of peer ping methods! removed preferred IPv4 in start options and added a new field IP6 in peer seeds which will contain one or more IPv6 addresses. Now every peer has one or more IP addresses assigned; even several IPv6 addresses are possible. The peer-ping process must check all given and possible IP addresses for a backping and return the one IP which was successful when pinging the peer. The pinging peer must be able to recognize which of the given IPs are available for outside access of the peer and store this accordingly. If only one IPv6 address is available and no IPv4, then the IPv6 is stored in the old IP field of the seed DNA. Many methods in Seed.java are now marked as @deprecated because they had been used for a single IP only. There is still a large construction site left in YaCy now where all these deprecated methods must be replaced with new method calls. The 'extra' IPs used by cluster assignment have been removed since that can be replaced with IPv6 usage in p2p clusters. All clusters must now use IPv6 if they want intranet routing.	10 years ago
orbiter	a922b122a3	added a hack to forward solr search results from an external attached solr to the YaCy built-in solr search servlet. It's not complete and not fully correct (there is still a utf8 encoding problem) but it is a way to easily get requests forwarded through YaCy to an external Solr.	10 years ago
Michael Peter Christen	437ce3b8a0	added internal api for partial updates to Solr	11 years ago
Michael Peter Christen	6d3d4c4ea6	changed the concurrent enumeration of query results in such a way that it is now possible to get the results in two steps: first retrieve all IDs as given for a query, then retrieve each document individually. This was necessary for very large result sets where a query may run for hours and is possibly terminated by a solr-internal timeout. This occurs regularly during postprocessing and therefore this commit may fix unwanted postprocessing terminations.	11 years ago
Michael Peter Christen	ad35d9294f	added a 'stats' table which records some peer statistics twice every hour. The table can be shown with http://localhost:8090/Tables_p.html?table=stats The entries have the following meaning: aM: activeLastMonth; aW: activeLastWeek; aD: activeLastDay; aH: activeLastHour; cC: countConnected (Active Senior); cD: countDisconnected (Passive Senior); cP: countPotential (Junior); cR: count of the RWI entries; cI: size of the index (number of documents). The entry keys are abbreviated to reduce the space in the table as the name is written again for every row. This is the beginning of a 'yacystats' micro-alternative as a built-in function in YaCy. Graphics may follow after some time if enough test data is available.	11 years ago
reger	8284ea751a	catch TimeoutException during ping and do not delete yacy.conf during prereadconfigfile. Found a situation after crash (reboot) with existing running semaphore but YaCy not running. Ping generated an exception which finally deleted the conf file (during pre-read procedure). - change to ping (catching the exception solved it) - additionally removed deletion of the yacy.conf file (if needed we need to make a backup)	11 years ago
Michael Peter Christen	f1032fb8fe	more enhancements to image search in case that a restriction to a single domain is done	11 years ago
Michael Peter Christen	6a8fb8190b	changed default value for maximum number of connections to 50	11 years ago
Michael Peter Christen	39615de3f9	adding the buffer size is not wrong but may cause confusing information when the buffer is cleaned after a buffer flush whose content is then not yet available in Solr since that is waiting for a commit. In such cases the counter would run backwards, which is prevented by ignoring the buffer size.	11 years ago
Michael Peter Christen	395edec6f1	changed strategy to count the number of documents: get the max of solr+buffer and the hit cache. This shall help during first crawls to see a running document counter even if there was no commit meanwhile to solr. To support that strategy, the hit cache must be written earlier.	11 years ago
Michael Peter Christen	cfb20bc0ce	removing the [] for ipv6 addresses may be a bad idea..	11 years ago
reger	5a4995ded3	fill solr rss writer dc:subject tag with keyword content	11 years ago
orbiter	c9e593cf78	removed warnings	11 years ago
reger	7584352e7b	use more predefined Solr query parameter constants - use CommonParams and DisMaxParams constants - fix typo in get sort parameter - getDocumentCountByParams redundant implementation and risk of not optimized call (row parameter unspecified) -> as only used from getCountByQuery removed from interface	11 years ago
reger	e9eae45b55	simplify rssreader and improve atom feed link extraction - type detection (rss/atom) - init type parameter overwritten during parse, parameter obsolete - detection by endtag changed to simpler first-tag evaluation - channel image not used, removed related extra parser handling - remove unused code (set/getImage) in rssfeed - atom link extraction to account for possible multiple link tags - spec limits link to one with rel="alternate" or one without rel attribute, not accounting for the following type & hreflang exception yet: o atom:entry elements MUST NOT contain more than one atom:link element with a rel attribute value of "alternate" that has the same combination of type and hreflang attribute values.	11 years ago
reger	a8508417d1	catch NPE during crawl (OAI import) - condenseDocument mime=null (allowed) - collectionconfiguration responseheader = null (allowed)	11 years ago
Michael Peter Christen	3860711aef	fix for possible interruption of concurrent queries	11 years ago
Michael Peter Christen	6344718f8b	reducing the concurrent query stack size and reduced concurrency of postprocessing to avoid OOM situations	11 years ago
Michael Peter Christen	a0c53174c5	better solr query logging to detect unnecessary sort requests for more performance profiling	11 years ago
Michael Peter Christen	b983e68254	more retries, less sleep	11 years ago
Michael Peter Christen	0ceeceb35e	more logic on Solr queries; usage of the query terms in postprocessing, saving one query for double document detection now per document	11 years ago
orbiter	38864ae004	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	11 years ago
orbiter	4099296b45	added new classes which shall reduce call overhead to Solr (stub)	11 years ago
reger	d0c02e1de7	adjust rss lat/lon to double (common format across other classes)	11 years ago
Michael Peter Christen	05d58e4df0	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	98f45c9032	fix for image alt attachment to AnchorURLs in html parser.	11 years ago
orbiter	22ce4fb4dd	better error handling for remote solr queries and exists-checks	11 years ago
reger	6c10b59f3e	move bootstrap peers test systems to its test class; var assignment not needed elsewhere.	11 years ago
orbiter	1027f3d04a	fix for the usage of ready-prepared solr queries, some queries are formulated as edismax query but this was not set as query attribute. The defType=edismax property needs a qf-field, so this was added as well. Do not remove that field again! This fixes also a problem with title-unique computation.	11 years ago
orbiter	4b06adb751	fix for file urls	11 years ago
reger	e5854a5cdb	fix localhost link to opensearchdescription.xml	11 years ago
Michael Peter Christen	b44626e55b	fixed target_alt_t in webgraph	11 years ago
Michael Peter Christen	4eec1a7452	refactoring (change Metadata name of load time data structure to avoid confusion with Node data which is also called metadata)	11 years ago
reger	c95ba52cf0	improve logexception info - log a message or class name instead of msgtxt "null"	11 years ago
orbiter	e441831a24	reverted toString() change in AnchorURL to prevent mistakenly used toString(). This fixes also the update link bug.	11 years ago
reger	a2cb366b25	Combine /heuristic search modifier with opensearch configured targets - with search modifier /heuristic a request is sent to all configured opensearch target systems (old /heuristic/blekko modifier no longer valid) - this allows to use opensearch heuristic on individual search request (in contrast to configuration HEURISTIC_OPENSEARCH=true which sends an osd request on all global searches) - the index.html searchoption text adjusted to be displayed only if option configured - add Archive-It to predefined systems	11 years ago
Michael Peter Christen	2de159719b	added an option to set 'obey nofollow' for links with rel="nofollow" attribute in the <a> tag for each crawl. This introduces a lot of changes because it extends the usage of the AnchorURL Object type which now also has a different toString method than the underlying DigestURL.toString. It is therefore not advised to use .toString at all for urls, just use toNormalform(false) instead.	11 years ago
Michael Peter Christen	e039e78210	small bugfixes	11 years ago
Michael Peter Christen	32a2ff925c	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	d07cdd8c3b	added SolrCloud access mode and configuration	11 years ago
Michael Peter Christen	8514bffc22	enhanced postprocessing status report	11 years ago
reger	b24572f304	fix GSA filter query assignment - use more parameter constants	11 years ago
Michael Peter Christen	b5fc2b63ea	removed exist() retrieval functions from error cache and replaced them with metadata retrieval from connectors directly. This should cause better usage of the cache. Automatically increase the metadata cache if more memory is available.	11 years ago
Michael Peter Christen	dd5cdfe212	reverted filter query hack, it did not work	11 years ago
Michael Peter Christen	5326970d6c	enhanced solr queries for single document extraction	11 years ago
Michael Peter Christen	525575bd97	added debugging of filter queries in thread dump thread names	11 years ago
Michael Peter Christen	f319ef268f	testing filter queries instead of queries to retrieve documents by id	11 years ago
Michael Peter Christen	fd87fa1613	removed more unnecessary exist-checks in ErrorCache	11 years ago
Michael Peter Christen	a694b6a8fc	another fix for unique field computation	11 years ago
Michael Peter Christen	fb3dd56b02	fix for processing of noindex flag in http header	11 years ago


929 Commits (6491270b3a17a26834956d9aaf396b211e0d6b2b)