yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	77f8e9fb9b	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Michael Peter Christen	ba6aaabc51	refactoring + parser bugfixes	13 years ago
Michael Peter Christen	2a0434efa4	Merge commit 'c1f6b4fb5226d3d2f8b2bec9e361f6b3476e03ff'	13 years ago
Michael Peter Christen	942896fe46	removed methods not supported by new solrj connector for httpclient 4. Error was: java.lang.UnsupportedOperationException: Client was created outside of HttpSolrServer at org.apache.solr.client.solrj.impl.HttpSolrServer.setDefaultMaxConnectionsPerHost(HttpSolrServer.java:614) at net.yacy.cora.services.federated.solr.SolrSingleConnector.<init>(SolrSingleConnector.java:128) at net.yacy.cora.services.federated.solr.SolrShardingConnector.<init>(SolrShardingConnector.java:55) at net.yacy.search.Switchboard.<init>(Switchboard.java:657) at net.yacy.yacy.startup(yacy.java:222) at net.yacy.yacy.main(yacy.java:1018)	13 years ago
Michael Peter Christen	22e1f68c0b	solrj user authentication patch	13 years ago
Michael Peter Christen	09484955dc	added new entry class for embed tags	13 years ago
Michael Peter Christen	62f2554a01	- fixed build problems (deprecated methods using httpclient 3.1) - removed httpclient 3.1 lib which was used by solrj (solrj now uses httpclient 4)	13 years ago
Michael Peter Christen	a6d60fc21f	concurrency enhancement in ConfigurationSet	13 years ago
Michael Peter Christen	453010bd68	- solved problems with backpath normalization - redesigned in/outbound link handover - removed iframe links from inbound/outbound in solr scheme	13 years ago
Michael Peter Christen	5f5ed33ed8	patch for media search (audio, video apps)	13 years ago
Michael Peter Christen	7860c1df80	fix needed for new solrj library	13 years ago
Michael Peter Christen	0e13022147	- enhanced solr field documentation - added xml api button to IndexFederated_p - the solr schema.xml file can be generated by YaCy	13 years ago
Michael Peter Christen	19efbf1b0f	- apply directDocByURL to NOLOAD Queue - choose pushing to NOLOAD as default for site crawl	13 years ago
Michael Peter Christen	659178942f	- Redesigned crawler and parser to accept embedded links from the NOLOAD queue and not from virtual documents generated by the parser. - The parser now generates nice description texts for NOLOAD entries which shall make it possible to find media content using the search index and not using the media prefetch algorithm during search (which was costly) - Removed the media-search prefetch process from image search	13 years ago
Michael Peter Christen	a3badd3205	changed search process for images: no more media snippet load process, show only links from index which had been on the text search page before. This creates a superfast search process for images!	13 years ago
reger	c1f6b4fb52	lookupByIP: prevent comparing of port parameter if called with port -1 (=unknown)	13 years ago
Michael Peter Christen	f8cd57c92f	new indexing strategy: ALL links that appear anywhere are indexed, not only links where the content can be parsed. All non-parseable links are placed into the noload queue. The search process must therefore be able to filter out non-text search results. - This fixes the problem that image search results appeared in the text search. - The interactive search can retrieve now ALL types of links - The p2p interface is now extended to retrieve only certain types of links (text, image, video, apps) - The search process has an extension to filter the right document type according to the search query	13 years ago
Michael Peter Christen	14f67f217c	refactoring of ContentDomain: now subclass of Classification	13 years ago
Michael Peter Christen	8a08c96a82	removed dependency from logging	13 years ago
Michael Peter Christen	a1a5b015d8	refactoring: moved document Classification to cora package	13 years ago
Michael Peter Christen	33d1062c79	refactoring: the cache belongs to the crawler	13 years ago
Michael Peter Christen	4d5da75814	fix for parser problem if a <a>-tag is 'within' html tags with unclosed tags. That prevented the <a> tags from being recognized. This is a fix for http://forum.yacy-websuche.de/viewtopic.php?p=25516#p25516	13 years ago
Michael Peter Christen	91a86f0b06	fixed to network graph testing	13 years ago
Michael Peter Christen	7b5b9baee0	added citation rank to ranking profile	13 years ago
Michael Peter Christen	046f3a7e8d	check if httpc has decompressed the release file and rename the file from .tar.gz to .tar if that happened	13 years ago
Michael Christen	02e4dedff2	fix to url citation collection	13 years ago
Michael Christen	e32055aa15	added stub classes for - a new database for url reference data ('seen links') - a new database extending the references to the full url metadata attributes set which shall replace the old metadata database if it is finished - migration help classes stub to use old and new metadata databases simultaneously	13 years ago
Michael Christen	ac5d124ee0	experimental implementation of a citation ranking as post-ranking method. (ranking coefficient fixed, need to be made configurable)	13 years ago
Michael Christen	8fc86fe397	added storage of full anchor link structure: the links between all pages are now stored. The same index structure as used for the word index is used to make a reverse link index. The new file(s) in SEGMENT/default/citation.index.*.blob store the citation index. This will be used to create much more detailed link structures for the YaCy apis and to create a better ranking. A ranking using the citation.index should provide better results especially for portal indexes and intranets.	13 years ago
Lotus	0b3f39136e	allow custom ppm lower than minimum button on /Crawler_p.html fixes http://bugs.yacy.net/view.php?id=166	13 years ago
Michael Peter Christen	532c7cf827	added physics experiment to the graph plotter. not active by default	13 years ago
Michael Peter Christen	aba9b1bfa0	better names for elements of a linked graph	13 years ago
Michael Peter Christen	2fc8ecee36	ConcurrentLinkedQueue has a VERY long return time on the .size() method. See http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html and the following test program: public class QueueLengthTimeTest { public static long countTest(Queue<Integer> q, int c) { long t = System.currentTimeMillis(); for (int i = 0; i < c; i++) { q.add(q.size()); } return System.currentTimeMillis() - t; } public static void main(String[] args) { int c = 1; for (int i = 0; i < 100; i++) { Runtime.getRuntime().gc(); long t1 = countTest(new ArrayBlockingQueue<Integer>(c), c); Runtime.getRuntime().gc(); long t2 = countTest(new LinkedBlockingQueue<Integer>(), c); Runtime.getRuntime().gc(); long t3 = countTest(new ConcurrentLinkedQueue<Integer>(), c); System.out.println("count = " + c + ": ArrayBlockingQueue = " + t1 + ", LinkedBlockingQueue = " + t2 + ", ConcurrentLinkedQueue = " + t3); c = c * 2; } } }	13 years ago
Michael Peter Christen	8aba045ba1	if a new pop-up page is set in config portal, then this page applies also to the default page configuration for the httpd if no path is given.	13 years ago
Michael Peter Christen	8c06925984	animation of the web structure picture	13 years ago
Michael Peter Christen	898fa7c3f3	use tld heuristic to check if a domain is local or global	13 years ago
Michael Peter Christen	213c8d97f2	use fewer processes in process pool	13 years ago
Michael Peter Christen	c639248c23	protection against strange answers from remote peers during search	13 years ago
Michael Peter Christen	36e4d82b27	changed ranking	13 years ago
Michael Peter Christen	096c17e7cd	added test code	13 years ago
Michael Peter Christen	665626a51b	catch OOM errors during scanning	13 years ago
Michael Peter Christen	1cd711d005	added classes for citation references (for new citation ranking)	13 years ago
Michael Peter Christen	33a405dab8	ipv6 bugfix	13 years ago
Michael Peter Christen	c6c61be3f0	fix for http://bugs.yacy.net/view.php?id=148	13 years ago
Michael Peter Christen	e0f1e7d904	added new citation reference data structure that shall be used for a citation ranking	13 years ago
Michael Peter Christen	e18a4f6b74	more tolerant merge iterator	13 years ago
Michael Peter Christen	e101c2e0e2	added changes from copperdust (submitted by email): 1. Improved and fixed language detection: 1.1 Identificator.java - recognition fix (improved) 1.2 DCEntry.java - fix (changed detection order due to detection from tld in many cases is incorrect) 1.3 MultiProtocolURI.java - fixed and enhanced language from tld detection (all currently used top-level domains; ccTLD added but not tested). 2. Ukrainian language update. 3. Main Slavic languages langstats (tested and works fine).	13 years ago
Michael Peter Christen	8d63a5887c	bugfixes	13 years ago
Michael Peter Christen	9ad1d8dde2	complete redesign of crawl queue monitoring: do not look at a ready-prepared crawl list but at the stacks of the domains that are stored for balanced crawling. This also affects the balancer since that does not need to prepare the pre-selected crawl list for monitoring. As an effect: - it is no longer possible to see the correct order of next to-be-crawled links, since that depends on the actual state of the balancer stack the next time another url is requested for loading - the balancer works better since the next url can be selected according to the current situation and not according to a pre-selected order.	13 years ago
Michael Peter Christen	7e4e3fe5b6	free some memory after parsing html	13 years ago
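
The test program quoted in commit 2fc8ecee36 is flattened onto a single line in the table and has no imports. Below is a runnable sketch of the same measurement, with the java.util imports added. One deliberate change, which is an assumption and not part of the commit: the outer loop runs 15 doublings instead of 100, because doubling an int 100 times overflows and the queues would exhaust memory long before that. Timings depend on the JVM and machine, so no expected output is given.

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueLengthTimeTest {

    // Adds c elements to q, calling q.size() before every add,
    // and returns the elapsed wall-clock time in milliseconds.
    public static long countTest(Queue<Integer> q, int c) {
        long t = System.currentTimeMillis();
        for (int i = 0; i < c; i++) {
            q.add(q.size());
        }
        return System.currentTimeMillis() - t;
    }

    public static void main(String[] args) {
        int c = 1;
        // Assumption: 15 rounds (up to 16384 elements) instead of the
        // 100 rounds in the commit message, to keep the run bounded.
        for (int i = 0; i < 15; i++) {
            Runtime.getRuntime().gc();
            long t1 = countTest(new ArrayBlockingQueue<Integer>(c), c);
            Runtime.getRuntime().gc();
            long t2 = countTest(new LinkedBlockingQueue<Integer>(), c);
            Runtime.getRuntime().gc();
            long t3 = countTest(new ConcurrentLinkedQueue<Integer>(), c);
            System.out.println("count = " + c
                    + ": ArrayBlockingQueue = " + t1
                    + ", LinkedBlockingQueue = " + t2
                    + ", ConcurrentLinkedQueue = " + t3);
            c = c * 2;
        }
    }
}
```

ConcurrentLinkedQueue.size() traverses the whole queue on each call, while the two blocking queues keep a counter, so the third column is the one expected to grow fastest as the count doubles.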

1004 Commits (dcccbe0be875e392202a20da090726b7198f8792)