yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	10da7335ea	performance hack: use a hash cache for all hashes that are computed by a byte array. If this hash is used in a HashMap (which is very often the case) then this hack eliminates a lot of re-computations of the same hash.	13 years ago
Michael Peter Christen	f8a0cf6d7c	RSSMessages do not need a concurrent hash map -> removed overhead	13 years ago
Michael Peter Christen	07ca7e4dd1	enhanced RSS parsing by ensuring that it is parsed with a buffered input stream	13 years ago
Michael Peter Christen	7c1feefb28	introduced a default 10 second time-out in rwi normalization time during search process to prevent endless deadlocks after a very long running search	13 years ago
Michael Peter Christen	8d997d55b6	better logging	13 years ago
Michael Peter Christen	65d37e6a20	only ASCII needed in seed bitflags	13 years ago
Michael Peter Christen	0f82fb3628	using double instead of float for a better release ordering	13 years ago
Michael Peter Christen	43c2c6e588	better logging	13 years ago
sixcooler	56087c1f23	bump to httpclient- httpcore-, httpmime- 4.2	13 years ago
Michael Peter Christen	71c3163f3d	- fixes to node identification - added link to node in network list - added marking of portal search node peers	13 years ago
Michael Peter Christen	4d3cc02168	replaced old bzip2 library with the better documented commons-compress package from http://commons.apache.org/compress/	13 years ago
Michael Peter Christen	ad222be7f8	added node state icon in network list	13 years ago
Michael Peter Christen	3c2bec681f	added a root node flag: identifies peers with short ping time	13 years ago
Michael Peter Christen	c846e9ca14	redesign of the crawler monitor page: show crawled pages instead of queue of urls that shall be crawled	13 years ago
Michael Peter Christen	c15fcde1c8	add-on to latest commit	13 years ago
Michael Peter Christen	cf47d94888	performance hack to parse numbers inside of substrings without actually generating a substring. This avoids the allocation of a String object each time a substring is parsed. Should affect CPU load during RWI transmission.	13 years ago
Michael Peter Christen	7e0ddbd275	added a "fromCache" flag in Response object to omit one cache.has() check during snippet generation. This should cause less blockings	13 years ago
Michael Peter Christen	81737dcb18	removed stack trace from swf parser since we can't do anything there	13 years ago
Michael Peter Christen	7bf421b9dd	- fixed image search page navigation - removed some deadlocks and ConcurrentModificationExceptions during DidYouMean collection	13 years ago
Michael Peter Christen	c6a09eab0b	synchronization needed	13 years ago
Michael Peter Christen	fb94b47b1a	changed queue sizes to have less memory occupied during indexing	13 years ago
Michael Peter Christen	76157dc2c3	bugfix for http://bugs.yacy.net/view.php?id=173	13 years ago
reger	6696cb1313	bugfix: lookup of peernames no result for active peer in page IndexControlRWIs_p.html -> Transfer RWI to other Peer SeedDB.lookupByName searches for lowercase peerNames, while MapColumnIndex.getIndex uses peername as is in the keyset. Changed the index init to insert lowercase peer names as key	13 years ago
Michael Peter Christen	c6558cba08	more classification bugs	13 years ago
Michael Peter Christen	082831b9d6	search contentdom was checked in wrong way - fixed	13 years ago
reger	ee553d971e	correct typo in scripts_txt comment	13 years ago
Michael Peter Christen	f294f2e295	bugfix to http://bugs.yacy.net/view.php?id=181 tried to make a bit less 'noise' to dns server also included: less processes in snippet fetch to reduce load during search on small computers	13 years ago
Michael Peter Christen	acf8d521a2	fix for http://bugs.yacy.net/view.php?id=126	13 years ago
Michael Peter Christen	bb88878b4d	the last commit was incomplete..	13 years ago
Michael Peter Christen	d320a31ae1	bugfix for http://bugs.yacy.net/view.php?id=186	13 years ago
Michael Peter Christen	fa735f4f04	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Michael Peter Christen	3e1bc9477f	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Michael Peter Christen	6f8a2fef1f	small speed enhancement using a column factory	13 years ago
Roland 'Quix0r' Haeder	d10627d591	More sync in close() methods Conflicts: source/net/yacy/kelondro/logging/GuiHandler.java source/net/yacy/kelondro/workflow/InstantBusyThread.java	13 years ago
Roland 'Quix0r' Haeder	b3ae2aa41f	With or without 'final'? At least please try it in other methods Conflicts: source/de/anomic/tools/tarTools.java	13 years ago
Roland 'Quix0r' Haeder	fbb946f913	Made a method static (Eclipse suggested it), removed unused import, pk=null check does now output a warning in logfile	13 years ago
Michael Peter Christen	52d307c735	prevent that the snippet fetch process removes catchall entries	13 years ago
Michael Peter Christen	7eece0256f	moved yacy.logging to defaults according to request in http://bugs.yacy.net/view.php?id=55	13 years ago
Michael Peter Christen	89142d1e8d	removed (not all) warnings	13 years ago
Michael Peter Christen	5deebd02ea	added serialization	13 years ago
reger	b2175ea4ef	Add possibility to set custom Solr field names for the YaCy default Solr attributes. - Changing the format of YaCy's solr.key.list while maintaining backward compatibility Federated index config screens adjusted accordingly - modified the Solr update request to use a 3 min Solr autocommit interval	13 years ago
Michael Peter Christen	15db703808	added missing serialization to remove all warnings	13 years ago
Michael Peter Christen	1795a7325b	made HandleSet serializable	13 years ago
Michael Peter Christen	e7e381d110	added configuration to switch off redirection following in crawler	13 years ago
Michael Peter Christen	2717c1b749	fixed bug in solr interface	13 years ago
Michael Peter Christen	f150bc218b	fixed bug in solr error document	13 years ago
Michael Peter Christen	cb54c1737b	solrj connector bugfix	13 years ago
Roland 'Quix0r' Haeder	a093ccf5eb	Now used synchronization in all close() methods to make sure all objects are 'closed' in an ordered way Conflicts: source/de/anomic/http/server/ChunkedInputStream.java source/de/anomic/http/server/ChunkedOutputStream.java source/de/anomic/http/server/ContentLengthInputStream.java source/net/yacy/cora/protocol/Domains.java source/net/yacy/cora/services/federated/solr/SolrShardingConnector.java source/net/yacy/cora/services/federated/solr/SolrSingleConnector.java source/net/yacy/document/content/dao/PhpBB3Dao.java source/net/yacy/document/parser/html/AbstractTransformer.java source/net/yacy/kelondro/blob/BEncodedHeap.java source/net/yacy/kelondro/blob/HeapReader.java source/net/yacy/kelondro/index/RAMIndexCluster.java source/net/yacy/kelondro/io/ByteCountInputStream.java source/net/yacy/kelondro/logging/ConsoleOutErrHandler.java source/net/yacy/kelondro/table/SQLTable.java	13 years ago
Michael Peter Christen	49cab2b85f	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Michael Peter Christen	0d58fea210	made multiple connector default	13 years ago
Michael Peter Christen	7740c02c56	- enhanced the solr connector - added new multiple connector (to replace singleConnector)	13 years ago
Michael Peter Christen	0cf3d36eae	more tolerance in case of corrupted file	13 years ago
Michael Peter Christen	acc6db28ff	added missing classes for solr interface	13 years ago
Michael Peter Christen	adeb33bb36	better abstraction for solr objects	13 years ago
Michael Peter Christen	8864141872	more abstraction in solr connection classes	13 years ago
Michael Peter Christen	c00efc2717	made the solr connection more generic	13 years ago
Michael Peter Christen	ea2bd43b28	patch for broken configurations	13 years ago
Michael Peter Christen	e5ca7f22b1	enhancement in circle drawing	13 years ago
Michael Peter Christen	34f4225d7e	less 'wellformed' calls without asserts	13 years ago
Marc Nause	a691023d04	*) better formatting for network QPM *) refactoring	13 years ago
Michael Peter Christen	77f8e9fb9b	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Michael Peter Christen	ba6aaabc51	refactoring + parser bugfixes	13 years ago
Michael Peter Christen	2a0434efa4	Merge commit 'c1f6b4fb5226d3d2f8b2bec9e361f6b3476e03ff'	13 years ago
Michael Peter Christen	942896fe46	removed methods not supported by new solrj connector for httpclient 4 Error was: java.lang.UnsupportedOperationException: Client was created outside of HttpSolrServer at org.apache.solr.client.solrj.impl.HttpSolrServer.setDefaultMaxConnectionsPerHost(HttpSolrServer.java:614) at net.yacy.cora.services.federated.solr.SolrSingleConnector.<init>(SolrSingleConnector.java:128) at net.yacy.cora.services.federated.solr.SolrShardingConnector.<init>(SolrShardingConnector.java:55) at net.yacy.search.Switchboard.<init>(Switchboard.java:657) at net.yacy.yacy.startup(yacy.java:222) at net.yacy.yacy.main(yacy.java:1018)	13 years ago
Michael Peter Christen	22e1f68c0b	solrj user authentication patch	13 years ago
Michael Peter Christen	09484955dc	added new entry class for embed tags	13 years ago
Michael Peter Christen	62f2554a01	- fixed build problems (deprecated methods using httpclient 3.1) - removed httpclient 3.1 lib which was used by solrj (solrj now uses httpclient 4)	13 years ago
Michael Peter Christen	a6d60fc21f	concurrency enhancement in ConfigurationSet	13 years ago
Michael Peter Christen	453010bd68	- solved problems with backpath normalization - redesigned in/outbound link handover - removed iframe links from inbound/outbound in solr scheme	13 years ago
Michael Peter Christen	5f5ed33ed8	patch for media search (audio, video apps)	13 years ago
Michael Peter Christen	7860c1df80	fix needed for new solrj library	13 years ago
Michael Peter Christen	0e13022147	- enhanced solr field documentation - added xml api button to IndexFederated_p - the solr schema.xml file can be generated by YaCy	13 years ago
Michael Peter Christen	19efbf1b0f	- apply directDocByURL to NOLOAD Queue - choose pushing to NOLOAD as default for site crawl	13 years ago
Michael Peter Christen	659178942f	- Redesigned crawler and parser to accept embedded links from the NOLOAD queue and not from virtual documents generated by the parser. - The parser now generates nice description texts for NOLOAD entries which shall make it possible to find media content using the search index and not using the media prefetch algorithm during search (which was costly) - Removed the media-search prefetch process from image search	13 years ago
Michael Peter Christen	a3badd3205	changed search process for images: no more media snippet load process, show only links from index which had been on the text search page before. This creates a superfast search process for images!	13 years ago
reger	c1f6b4fb52	lookupByIP: prevent comparing of port parameter if called with port -1 (=unknown)	13 years ago
Michael Peter Christen	f8cd57c92f	new indexing strategy: ALL links that appear anywhere are indexed, not only links where the content can be parsed. All non-parseable links are placed into the noload queue. The search process must therefore be able to filter out non-text search results. - This fixes the problem that image search results appeared in the text search. - The interactive search can retrieve now ALL types of links - The p2p interface is now extended to retrieve only certain types of links (text, image, video, apps) - The search process has an extension to filter the right document type according to the search query	13 years ago
Michael Peter Christen	14f67f217c	refactoring of ContentDomain: now subclass of Classification	13 years ago
Michael Peter Christen	8a08c96a82	removed dependency from logging	13 years ago
Michael Peter Christen	a1a5b015d8	refactoring: moved document Classification to cora package	13 years ago
Michael Peter Christen	33d1062c79	refactoring: the cache belongs to the crawler	13 years ago
Michael Peter Christen	4d5da75814	fix for parser problem if a <a>-tag is 'within' html tags with unclosed tags. That prevented the <a> tags from being recognized. This is a fix for http://forum.yacy-websuche.de/viewtopic.php?p=25516#p25516	13 years ago
Michael Peter Christen	91a86f0b06	fixed to network graph testing	13 years ago
Michael Peter Christen	7b5b9baee0	added citation rank to ranking profile	13 years ago
Michael Peter Christen	046f3a7e8d	check if httpc has decompressed the release file and rename the file from .tar.gz to .tar if that happened	13 years ago
Michael Christen	02e4dedff2	fix to url citation collection	13 years ago
Michael Christen	e32055aa15	added stub classes for - a new database for url reference data ('seen links') - a new database extending the references to the full url metadata attributes set which shall replace the old metadata database if it is finished - migration help classes stub to use old and new metadata databases simultaneously	13 years ago
Michael Christen	ac5d124ee0	experimental implementation of a citation ranking as post-ranking method. (ranking coefficient fixed, need to be made configurable)	13 years ago
Michael Christen	8fc86fe397	added storage of full anchor link structure: the links between all pages are now stored. The same index structure as used for the word index is used to make a reverse link index. The new file(s) in SEGMENT/default/citation.index.*.blob store the citation index. This will be used to create much more detailed link structures for the YaCy apis and to create a better ranking. A ranking using the citation.index should provide better results especially for portal indexes and intranets.	13 years ago
Lotus	0b3f39136e	allow custom ppm lower than minimum button on /Crawler_p.html fixes http://bugs.yacy.net/view.php?id=166	13 years ago
Michael Peter Christen	532c7cf827	added physics experiment to the graph plotter. not active by default	13 years ago
Michael Peter Christen	aba9b1bfa0	better names for elements of a linked graph	13 years ago
Michael Peter Christen	2fc8ecee36	ConcurrentLinkedQueue has a VERY long return time on the .size() method. See http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html and the following test program: public class QueueLengthTimeTest { public static long countTest(Queue<Integer> q, int c) { long t = System.currentTimeMillis(); for (int i = 0; i < c; i++) { q.add(q.size()); } return System.currentTimeMillis() - t; } public static void main(String[] args) { int c = 1; for (int i = 0; i < 100; i++) { Runtime.getRuntime().gc(); long t1 = countTest(new ArrayBlockingQueue<Integer>(c), c); Runtime.getRuntime().gc(); long t2 = countTest(new LinkedBlockingQueue<Integer>(), c); Runtime.getRuntime().gc(); long t3 = countTest(new ConcurrentLinkedQueue<Integer>(), c); System.out.println("count = " + c + ": ArrayBlockingQueue = " + t1 + ", LinkedBlockingQueue = " + t2 + ", ConcurrentLinkedQueue = " + t3); c = c * 2; } } }	13 years ago
Michael Peter Christen	8aba045ba1	if a new pop-up page is set in config portal, then this page applies also to the default page configuration for the httpd if no path is given.	13 years ago
Michael Peter Christen	8c06925984	animation of the web structure picture	13 years ago
Michael Peter Christen	898fa7c3f3	use tld heuristic to check if a domain is local or global	13 years ago
Michael Peter Christen	213c8d97f2	use less processes in process pool	13 years ago
Michael Peter Christen	c639248c23	protection against strange answers from remote peers during search	13 years ago
Michael Peter Christen	36e4d82b27	changed ranking	13 years ago
Michael Peter Christen	096c17e7cd	added test code	13 years ago
Michael Peter Christen	665626a51b	catch OOM errors during scanning	13 years ago
Michael Peter Christen	1cd711d005	added classes for citation references (for new citation ranking)	13 years ago
Michael Peter Christen	33a405dab8	ipv6 bugfix	13 years ago
Michael Peter Christen	c6c61be3f0	fix for http://bugs.yacy.net/view.php?id=148	13 years ago
Michael Peter Christen	e0f1e7d904	added new citation reference data structure that shall be used for a citation ranking	13 years ago
Michael Peter Christen	e18a4f6b74	more tolerant merge iterator	13 years ago
Michael Peter Christen	e101c2e0e2	added changes from copperdust (submitted by email): 1. Improved and fixed language detection: 1.1 Identificator.java - recognition fix (improved) 1.2 DCEntry.java - fix (changed detection order due to detection from tld in many cases is incorrect) 1.3 MultiProtocolURI.java - fixed and enhanced language from tld detection (all currently used top-level domains; ccTLD added but not tested). 2. Ukrainian language update. 3. Main Slavic languages langstats (tested and works fine).	13 years ago
Michael Peter Christen	8d63a5887c	bugfixes	13 years ago
Michael Peter Christen	9ad1d8dde2	complete redesign of crawl queue monitoring: do not look at a ready-prepared crawl list but at the stacks of the domains that are stored for balanced crawling. This affects also the balancer since that does not need to prepare the pre-selected crawl list for monitoring. As an effect: - it is no longer possible to see the correct order of next to-be-crawled links, since that depends on the actual state of the balancer stack the next time another url is requested for loading - the balancer works better since the next url can be selected according to the current situation and not according to a pre-selected order.	13 years ago
Michael Peter Christen	7e4e3fe5b6	free some memory after parsing html	13 years ago
Michael Peter Christen	4540174fe0	memory hacks	13 years ago
Michael Peter Christen	b4409cc803	small redesign of blob column index and usage	13 years ago
Michael Peter Christen	d5c1f2746e	performance hack	13 years ago
Michael Peter Christen	803963aebd	performance hack: better space grow in CharBuffer (speeds up html parser)	13 years ago
Michael Peter Christen	8b0920b0b5	tried to fix the ipv6 problem as reported in bug but this did not solve all problems because a bug in the apache http client prevented that it worked. Thread dump: Caused by: java.lang.NumberFormatException: For input string: "1450:400c:c01:0:0:0:69" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48) at java.lang.Integer.parseInt(Integer.java:458) at java.lang.Integer.parseInt(Integer.java:499) at org.apache.http.client.utils.URIUtils.extractHost(URIUtils.java:310) at org.apache.http.impl.client.AbstractHttpClient.determineTarget(AbstractHttpClient.java:764) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754) at net.yacy.cora.protocol.http.HTTPClient.execute(HTTPClient.java:597) at net.yacy.cora.protocol.http.HTTPClient.getContentBytes(HTTPClient.java:558) at net.yacy.cora.protocol.http.HTTPClient.GETbytes(HTTPClient.java:341) at de.anomic.crawler.retrieval.HTTPLoader.load(HTTPLoader.java:131) at de.anomic.crawler.retrieval.HTTPLoader.load(HTTPLoader.java:74) at net.yacy.repository.LoaderDispatcher.loadInternal(LoaderDispatcher.java:274) at net.yacy.repository.LoaderDispatcher.load(LoaderDispatcher.java:164) at net.yacy.repository.LoaderDispatcher.load(LoaderDispatcher.java:150) at net.yacy.repository.LoaderDispatcher.loadDocument(LoaderDispatcher.java:355) at getpageinfo_p.respond(getpageinfo_p.java:97)	13 years ago
Michael Peter Christen	e2f8f263e8	changed storage of search words: keep order	13 years ago
Michael Peter Christen	ed39ef2890	changed generation of protocol information	13 years ago
Michael Peter Christen	0b67a0a5d8	added a column index for tables in blob files. This is heavily used during receiving of DHT submissions and when answering remote search requests. Both events together may have caused IO-deadlocking and this commit shall fix that.	13 years ago
Michael Peter Christen	2e5cd6a1b2	fixed parser extension deny list generation and usage	13 years ago
Michael Peter Christen	8bee1472c9	there is no noindex, only nofollow in links	13 years ago
Michael Peter Christen	3cd6dcd352	do not add new solr fields as activated fields	13 years ago
Michael Peter Christen	e3bb73c3d6	serialized some database access methods	13 years ago
Michael Peter Christen	7e728867e5	added a synchronization around iterations to prevent IO-deadlocking during concurrent remote search requests	13 years ago
Michael Peter Christen	355ecf330f	reduced target file size to 64mb	13 years ago
Michael Peter Christen	10ae6d94a1	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Michael Peter Christen	2ea585d616	fix for host navigator	13 years ago
Michael Peter Christen	2f6dde92e2	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Michael Peter Christen	c560a582ac	fix for single-word vocabulary lines	13 years ago
Michael Peter Christen	4c5edab1ec	added option to have exception search result windows	13 years ago
Michael Peter Christen	046d7de95b	Merge remote branch 'reger/master'	13 years ago
reger	a95f645a61	Bugfix class repository.Loaddispatcher fixed download file limit of 10000 line 355: final Response response = this.load(request, cachePolicy, 10000, true);	13 years ago
Michael Peter Christen	ef78f22ee1	performance hack	13 years ago
Michael Peter Christen	41536eb4a2	performance hack	13 years ago
Michael Peter Christen	f91487fc50	added delete-button for host navigation	13 years ago
Michael Peter Christen	e8d24fd802	author navigator can be switched off	13 years ago
Michael Peter Christen	558ab7bd4e	made the protocol navigator reversible	13 years ago
Michael Peter Christen	96cb75f1d4	made the filetype navigator be able to deselect the search constraint	13 years ago
Michael Peter Christen	1f4f60654a	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: source/net/yacy/document/parser/pdfParser.java	13 years ago
reger	32104360ce	PDFParser - return at least first 3 pages of PDF fix for pdf parsing without returning parsed text due to interruption by time out.	13 years ago
Michael Peter Christen	ef5192f8c9	using the generic document parser for crawl starts instead of the html parser. This makes it possible that every type of document can be a crawl start point, not only text documents or html documents. Tested this with a pdf document.	13 years ago
Michael Peter Christen	a02fdf8625	better error messages	13 years ago
Michael Peter Christen	eadb58dd87	small enhancements in pdf parser	13 years ago
Michael Peter Christen	c6ba44468e	timeout = 5000 instead of 3000	13 years ago
reger	b616de5973	PDFParser - return at least first 3 pages of PDF fix for pdf parsing without returning parsed text due to interruption by time out.	13 years ago
Lotus	c73af39e54	refactoring of tray icon class, now uses Java 6 methods natively	13 years ago
Michael Peter Christen	4eff0e26f1	npe bugfix	13 years ago
low012	8776b84c10	*) small fix to make password change function of reconfigureYACY.sh work again	13 years ago
Michael Peter Christen	1a0b6b3913	get more navigation details to search results	13 years ago
Michael Peter Christen	7f9b6b7a0c	added switches to ConfigParser to accept/deny documents by their extension	13 years ago
Michael Peter Christen	4901cee3cc	suppress auto-tagged subject entries when sending out or receiving metadata from other peers	13 years ago
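
Commit 10da7335ea in the list above describes caching hashes that are computed from a byte array so that repeated HashMap operations do not recompute them. The following is only a minimal sketch of that general idea, not the code from the commit: the class name CachedHashBytes and all of its members are hypothetical, and the lazy single-field caching mirrors the idiom java.lang.String uses for its own hash.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: an immutable byte[] key that computes its hash once.
public final class CachedHashBytes {
    private final byte[] bytes;
    private int hash; // 0 means "not computed yet" (same convention as String)

    public CachedHashBytes(final byte[] bytes) {
        this.bytes = bytes.clone(); // defensive copy keeps the cached hash valid
    }

    @Override
    public int hashCode() {
        int h = this.hash;
        if (h == 0) {
            // Computed at most once per instance unless the real hash is 0.
            h = Arrays.hashCode(this.bytes);
            this.hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(final Object other) {
        if (this == other) return true;
        if (!(other instanceof CachedHashBytes)) return false;
        return Arrays.equals(this.bytes, ((CachedHashBytes) other).bytes);
    }

    public static void main(final String[] args) {
        final Map<CachedHashBytes, String> map = new HashMap<>();
        final CachedHashBytes key = new CachedHashBytes(new byte[] {1, 2, 3});
        map.put(key, "value");
        // The second lookup reuses the hash cached during put().
        System.out.println(map.get(key));
        System.out.println(map.get(new CachedHashBytes(new byte[] {1, 2, 3})));
    }
}
```

The sketch trades one int field per key for skipping the array scan on every put, get, and rehash. It is only safe because the wrapped array is copied and never modified afterwards.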


1164 Commits (8cf47a83350bb000cf6b576c697ba2255c0d5df6)
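
Commit cf47d94888 in the list above mentions parsing numbers inside of substrings without generating a substring. As a rough sketch of how such a parser can look (this is an assumption-based illustration, not the code from the commit; the class SubstringNumberParser and its method parseLong are hypothetical names), the digits can be read directly from the enclosing character sequence by index:

```java
// Hypothetical illustration: parse a decimal number from a region of a
// CharSequence without allocating a String for that region.
public final class SubstringNumberParser {

    /** Parses s[start, end) as a signed decimal long. */
    public static long parseLong(final CharSequence s, final int start, final int end) {
        if (start < 0 || end > s.length() || start >= end) {
            throw new NumberFormatException("empty or out-of-range region");
        }
        int i = start;
        boolean negative = false;
        final char first = s.charAt(i);
        if (first == '-' || first == '+') {
            negative = (first == '-');
            i++;
            if (i == end) throw new NumberFormatException("sign without digits");
        }
        // Accumulate as a negative value so that Long.MIN_VALUE can be represented.
        long result = 0;
        for (; i < end; i++) {
            final int digit = s.charAt(i) - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit at index " + i);
            }
            try {
                result = Math.subtractExact(Math.multiplyExact(result, 10L), (long) digit);
            } catch (final ArithmeticException e) {
                throw new NumberFormatException("value out of range for long");
            }
        }
        if (negative) return result;
        if (result == Long.MIN_VALUE) throw new NumberFormatException("value out of range for long");
        return -result;
    }

    public static void main(final String[] args) {
        final String line = "rwi=12345;x";
        // Reads the digits at indices 4..8 in place; no substring is created.
        System.out.println(parseLong(line, 4, 9));
    }
}
```

Compared with Long.parseLong(s.substring(start, end)), this avoids one String allocation per parsed field, which is the saving the commit message refers to.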