yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	49cab2b85f	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Michael Peter Christen	0d58fea210	made multiple connector default	13 years ago
Michael Peter Christen	7740c02c56	- enhanced the solr connector - added new multiple connector (to replace singleConnector)	13 years ago
Michael Peter Christen	0cf3d36eae	more tolerance in case of corrupted file	13 years ago
Michael Peter Christen	acc6db28ff	added missing classes for solr interface	13 years ago
Michael Peter Christen	adeb33bb36	better abstraction for solr objects	13 years ago
Michael Peter Christen	8864141872	more abstraction in solr connection classes	13 years ago
Michael Peter Christen	c00efc2717	made the solr connection more generic	13 years ago
Michael Peter Christen	ea2bd43b28	patch for broken configurations	13 years ago
Michael Peter Christen	e5ca7f22b1	enhancement in circle drawing	13 years ago
Michael Peter Christen	34f4225d7e	less 'wellformed' calls without asserts	13 years ago
Marc Nause	a691023d04	- better formatting for network QPM - refactoring	13 years ago
Michael Peter Christen	77f8e9fb9b	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Michael Peter Christen	ba6aaabc51	refactoring + parser bugfixes	13 years ago
Michael Peter Christen	2a0434efa4	Merge commit 'c1f6b4fb5226d3d2f8b2bec9e361f6b3476e03ff'	13 years ago
Michael Peter Christen	942896fe46	removed methods not supported by new solrj connector for httpclient 4 Error was: java.lang.UnsupportedOperationException: Client was created outside of HttpSolrServer at org.apache.solr.client.solrj.impl.HttpSolrServer.setDefaultMaxConnectionsPerHost(HttpSolrServer.java:614) at net.yacy.cora.services.federated.solr.SolrSingleConnector.<init>(SolrSingleConnector.java:128) at net.yacy.cora.services.federated.solr.SolrShardingConnector.<init>(SolrShardingConnector.java:55) at net.yacy.search.Switchboard.<init>(Switchboard.java:657) at net.yacy.yacy.startup(yacy.java:222) at net.yacy.yacy.main(yacy.java:1018)	13 years ago
Michael Peter Christen	22e1f68c0b	solrj user authentication patch	13 years ago
Michael Peter Christen	09484955dc	added new entry class for embed tags	13 years ago
Michael Peter Christen	62f2554a01	- fixed build problems (deprecated methods using httpclient 3.1) - removed httpclient 3.1 lib which was used by solrj (solrj now uses httpclient 4)	13 years ago
Michael Peter Christen	a6d60fc21f	concurrency enhancement in ConfigurationSet	13 years ago
Michael Peter Christen	453010bd68	- solved problems with backpath normalization - redesigned in/outbound link handover - removed iframe links from inbound/outbound in solr scheme	13 years ago
Michael Peter Christen	5f5ed33ed8	patch for media search (audio, video apps)	13 years ago
Michael Peter Christen	7860c1df80	fix needed for new solrj library	13 years ago
Michael Peter Christen	0e13022147	- enhanced solr field documentation - added xml api button to IndexFederated_p - the solr schema.xml file can be generated by YaCy	13 years ago
Michael Peter Christen	19efbf1b0f	- apply directDocByURL to NOLOAD Queue - choose pushing to NOLOAD as default for site crawl	13 years ago
Michael Peter Christen	659178942f	- Redesigned crawler and parser to accept embedded links from the NOLOAD queue and not from virtual documents generated by the parser. - The parser now generates nice description texts for NOLOAD entries which shall make it possible to find media content using the search index and not using the media prefetch algorithm during search (which was costly) - Removed the media-search prefetch process from image search	13 years ago
Michael Peter Christen	a3badd3205	changed search process for images: no more media snippet load process, show only links from index which had been on the text search page before. This creates a superfast search process for images!	13 years ago
Michael Peter Christen	f5efdb21fd	refactoring	13 years ago
reger	c1f6b4fb52	lookupByIP: prevent comparing of port parameter if called with port -1 (=unknown)	13 years ago
Michael Peter Christen	f8cd57c92f	new indexing strategy: ALL links that appear anywhere are indexed, not only links where the content can be parsed. All non-parseable links are placed into the noload queue. The search process must therefore be able to filter out non-text search results. - This fixes the problem that image search results appeared in the text search. - The interactive search can now retrieve ALL types of links - The p2p interface is now extended to retrieve only certain types of links (text, image, video, apps) - The search process has an extension to filter the right document type according to the search query	13 years ago
Michael Peter Christen	14f67f217c	refactoring of ContentDomain: now subclass of Classification	13 years ago
Michael Peter Christen	8a08c96a82	removed dependency from logging	13 years ago
Michael Peter Christen	a1a5b015d8	refactoring: moved document Classification to cora package	13 years ago
Michael Peter Christen	a5d7da68a0	refactoring: removed dependency from switchboard in Balancer/CrawlQueues	13 years ago
Michael Peter Christen	33d1062c79	refactoring: the cache belongs to the crawler	13 years ago
Michael Peter Christen	4d5da75814	fix for parser problem if an <a> tag is 'within' html tags with unclosed tags. That prevented the <a> tags from being recognized. This is a fix for http://forum.yacy-websuche.de/viewtopic.php?p=25516#p25516	13 years ago
Michael Peter Christen	91a86f0b06	fixes to network graph testing	13 years ago
Michael Peter Christen	7b5b9baee0	added citation rank to ranking profile	13 years ago
Michael Peter Christen	046f3a7e8d	check if httpc has decompressed the release file and rename the file from .tar.gz to .tar if that happened	13 years ago
Michael Christen	02e4dedff2	fix to url citation collection	13 years ago
Michael Christen	e32055aa15	added stub classes for - a new database for url reference data ('seen links') - a new database extending the references to the full url metadata attributes set which shall replace the old metadata database if it is finished - migration help classes stub to use old and new metadata databases simultaneously	13 years ago
Michael Christen	ac5d124ee0	experimental implementation of a citation ranking as post-ranking method. (ranking coefficient fixed, needs to be made configurable)	13 years ago
Michael Christen	8fc86fe397	added storage of full anchor link structure: the links between all pages are now stored. The same index structure as used for the word index is used to make a reverse link index. The new file(s) in SEGMENT/default/citation.index.*.blob store the citation index. This will be used to create much more detailed link structures for the YaCy apis and to create a better ranking. A ranking using the citation.index should provide better results especially for portal indexes and intranets.	13 years ago
Michael Christen	22f05c83ff	fixed default must-match filter for full domain crawls - the old filter was too restrictive and did not allow intranet crawls	13 years ago
Lotus	0b3f39136e	allow custom ppm lower than minimum button on /Crawler_p.html fixes http://bugs.yacy.net/view.php?id=166	13 years ago
Michael Peter Christen	532c7cf827	added physics experiment to the graph plotter. not active by default	13 years ago
Michael Peter Christen	aba9b1bfa0	better names for elements of a linked graph	13 years ago
Michael Peter Christen	0cc0290978	bugfix for a must-not-match pattern check. This bug did not make the check semantically wrong, but a trick that prevented an IP lookup in case that the filter was not used did not work. With this bugfix, crawling gets a huge speed boost for noload urls!	13 years ago
Michael Peter Christen	2fc8ecee36	ConcurrentLinkedQueue has a VERY long return time on the .size() method. See http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html and the following test program: import java.util.Queue; import java.util.concurrent.ArrayBlockingQueue; import java.util.concurrent.ConcurrentLinkedQueue; import java.util.concurrent.LinkedBlockingQueue; public class QueueLengthTimeTest { public static long countTest(Queue<Integer> q, int c) { long t = System.currentTimeMillis(); for (int i = 0; i < c; i++) { q.add(q.size()); } return System.currentTimeMillis() - t; } public static void main(String[] args) { int c = 1; for (int i = 0; i < 100; i++) { Runtime.getRuntime().gc(); long t1 = countTest(new ArrayBlockingQueue<Integer>(c), c); Runtime.getRuntime().gc(); long t2 = countTest(new LinkedBlockingQueue<Integer>(), c); Runtime.getRuntime().gc(); long t3 = countTest(new ConcurrentLinkedQueue<Integer>(), c); System.out.println("count = " + c + ": ArrayBlockingQueue = " + t1 + ", LinkedBlockingQueue = " + t2 + ", ConcurrentLinkedQueue = " + t3); c = c * 2; } } }	13 years ago
Michael Peter Christen	8aba045ba1	if a new pop-up page is set in config portal, then this page applies also to the default page configuration for the httpd if no path is given.	13 years ago

5429 Commits (49cab2b85f58542aeac6b8b4f19b3df595a87bbf)