0-values and empty strings are not written). This may save a lot of
memory (in RAM and on disk) if many 0-values or empty strings appear.
- do not allow default boolean values for checkboxes because that does
not make sense: browsers omit the checkbox attribute name if the box is
not checked, so a default value of 'true' would not match the semantics
of the browser's response.
- add a checkbox in IndexFederated_p for the lazy initialization of solr
fields (see the sketch below).
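A minimal sketch of the lazy-write idea, assuming SolrJ's
SolrInputDocument; the helper names are hypothetical, not YaCy's actual
code:

import org.apache.solr.common.SolrInputDocument;

public class LazySolrFields {

    // Write a string field only if it carries information.
    public static void setStringLazy(SolrInputDocument doc, String name, String value) {
        if (value != null && !value.isEmpty()) doc.setField(name, value);
    }

    // Write a numeric field only if it differs from the 0 default.
    public static void setLongLazy(SolrInputDocument doc, String name, long value) {
        if (value != 0L) doc.setField(name, value);
    }
}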
computation. This error was there before the latest IPv6 hack but did
not cause a NPE. The IPv6 hack was not the cause of this bug, but it
exposed the misconfigured referrer ('referer') handling.
- do not replace malformed or invalid URLs in urlproxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7835 6c8d7289-2bf4-0310-a012-ef5d649a1542
Conflicts:
source/de/anomic/http/server/HTTPDFileHandler.java
rewrite engine to customize existing webpages. Originally implemented
by Florian Richter.
Conflicts:
source/de/anomic/http/server/HTTPDProxyHandler.java
not equal. It may happen that this data differs because one of the
caches is cleaned after a while or when it grows too big. The metadata
was then never cleaned; it is now wiped by a check-up process at every
application start. This should reduce memory usage a bit.
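A minimal sketch of such a start-up check-up, assuming both stores
expose their keys; the names are illustrative, not YaCy's actual API:

import java.util.Map;
import java.util.Set;

public class MetadataCheckup {
    // Drop every metadata entry whose cached document is gone, so both
    // stores agree again after start-up.
    public static void wipeStaleMetadata(Map<String, Object> metadata, Set<String> cacheKeys) {
        metadata.keySet().removeIf(key -> !cacheKeys.contains(key));
    }
}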
- added log warnings in case search processes run into time-out
situations
- better concurrency for the Integer formatter (a non-synchronized
formatter was used before)
- bugfix for search termination (a poison pill was missing; see the
sketch after this list)
- added timeout parameters for search (again) -> the target is that
they are never reached.
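A minimal sketch of the poison-pill pattern behind that termination
bugfix, assuming a BlockingQueue-based result feeder; the classes are
illustrative, not YaCy's actual ones:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PoisonPillDemo {
    // A unique sentinel object marking the end of the result stream.
    static final String POISON = new String("POISON");

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> results = new LinkedBlockingQueue<>();
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String r = results.take();
                    if (r == POISON) break; // identity check: terminate cleanly
                    System.out.println("result: " + r);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        results.put("url1");
        results.put("url2");
        results.put(POISON); // without this pill the consumer blocks forever
        consumer.join();
    }
}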
- search requests are now made using a map boundary
- search results are only computed for the map boundary
- the number of results is adapted to the results in the visible range
- added a double-buffering for the search result markers
- added a search query option for the search results:
/radius/<lat>/<lon>/<radius>
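A hypothetical usage example, assuming the modifier is appended to the
normal query string and takes decimal degrees plus a radius (the unit
is an assumption here):

pizza /radius/52.52/13.40/5

This would restrict the results to entries geolocated within a 5-unit
radius around lat 52.52 / lon 13.40.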
queue and not from virtual documents generated by the parser.
- The parser now generates descriptive texts for NOLOAD entries, which
should make it possible to find media content using the search index
and not the media-prefetch algorithm during search (which was costly)
- Removed the media-search prefetch process from image search
only links where the content can be parsed. All non-parseable links are
placed into the noload queue. The search process must therefore be able
to filter out non-text search results.
- This fixes the problem that image search results appeared in the text
search.
- The interactive search can now retrieve ALL types of links
- The p2p interface is now extended to retrieve only certain types of
links (text, image, video, apps)
- The search process has an extension to filter the right document type
according to the search query
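A minimal sketch of such a document-type filter, assuming a simple
content-type enum; the names are illustrative, not YaCy's actual
classes:

import java.util.ArrayList;
import java.util.List;

public class ContentDomainFilter {

    // The four link types named above; TEXT covers parseable documents.
    public enum ContentDomain { TEXT, IMAGE, VIDEO, APP }

    public static class Result {
        final String url;
        final ContentDomain domain;
        Result(String url, ContentDomain domain) { this.url = url; this.domain = domain; }
    }

    // Keep only results whose type matches the one requested by the query.
    public static List<Result> filter(List<Result> results, ContentDomain wanted) {
        List<Result> filtered = new ArrayList<Result>();
        for (Result r : results) {
            if (r.domain == wanted) filtered.add(r);
        }
        return filtered;
    }
}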
check semantically wrong, but a trick that should have prevented an IP
lookup when the filter was not used did not work. This bugfix gives
crawling a huge speed boost for noload URLs!
See
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html
and the following test program:
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueLengthTimeTest {

    // Adds c elements and calls size() on every add, so a slow size()
    // implementation dominates the measured time.
    public static long countTest(Queue<Integer> q, int c) {
        long t = System.currentTimeMillis();
        for (int i = 0; i < c; i++) {
            q.add(q.size());
        }
        return System.currentTimeMillis() - t;
    }

    public static void main(String[] args) {
        int c = 1;
        for (int i = 0; i < 24; i++) { // bounded so the queues fit in memory
            Runtime.getRuntime().gc();
            long t1 = countTest(new ArrayBlockingQueue<Integer>(c), c);
            Runtime.getRuntime().gc();
            long t2 = countTest(new LinkedBlockingQueue<Integer>(), c);
            Runtime.getRuntime().gc();
            long t3 = countTest(new ConcurrentLinkedQueue<Integer>(), c);
            System.out.println("count = " + c + ": ArrayBlockingQueue = " + t1
                    + ", LinkedBlockingQueue = " + t2
                    + ", ConcurrentLinkedQueue = " + t3);
            c = c * 2;
        }
    }
}
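Note from the linked Javadoc: ConcurrentLinkedQueue.size() is not a
constant-time operation; it traverses the whole queue, so calling
size() inside the add loop makes that case quadratic in c. That
traversal cost is exactly what this test exposes.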
ready-prepared crawl list but at the stacks of the domains that are
stored for balanced crawling. This also affects the balancer, since it
no longer needs to prepare the pre-selected crawl list for monitoring.
As an effect:
- it is no longer possible to see the correct order of the next
to-be-crawled links, since that depends on the actual state of the
balancer stack the next time a URL is requested for loading
- the balancer works better since the next URL can be selected according
to the current situation and not according to a pre-selected order (see
the sketch below).
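A minimal sketch of per-domain stacks with on-demand selection,
assuming a politeness delay per host; all names are hypothetical, not
YaCy's actual Balancer API:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class DomainBalancer {
    private final Map<String, Deque<String>> stacks = new HashMap<>(); // host -> URLs
    private final Map<String, Long> lastAccess = new HashMap<>();      // host -> last fetch time
    private final long politenessMillis;

    public DomainBalancer(long politenessMillis) {
        this.politenessMillis = politenessMillis;
    }

    public void push(String host, String url) {
        stacks.computeIfAbsent(host, h -> new ArrayDeque<>()).push(url);
    }

    // Select the next URL at request time: the first host whose
    // politeness delay has expired, judged by the current situation.
    public String pop() {
        long now = System.currentTimeMillis();
        for (Map.Entry<String, Deque<String>> e : stacks.entrySet()) {
            long last = lastAccess.getOrDefault(e.getKey(), 0L);
            if (now - last >= politenessMillis && !e.getValue().isEmpty()) {
                lastAccess.put(e.getKey(), now);
                return e.getValue().pop();
            }
        }
        return null; // no host is ready yet
    }
}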
Hashtable is an obsolete Java 1.0 collection; since the Java 2
collections framework, HashMap offers the same or better functionality.
Please review; almost all code was already moved, so only a few changes
remain. That is not the issue, but I found notes that some (ugly, big)
helper classes had to be created in the past to compensate for missing
Hashtable functionality. I'd like input on whether we can remove some
of them; a sketch of the usual replacement follows below.
Look for //FIX: in these commits.
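A minimal sketch of the usual replacement pattern, shown as a general
idiom rather than a specific YaCy change: HashMap where access is
single-threaded, ConcurrentHashMap where Hashtable's synchronization
was actually relied on:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HashtableMigration {
    // Before: Hashtable<String, Integer> counts = new Hashtable<>();

    // Single-threaded (or externally synchronized) use: plain HashMap.
    private final Map<String, Integer> counts = new HashMap<>();

    // Shared use: ConcurrentHashMap keeps thread safety without
    // Hashtable's single global lock (like Hashtable, it rejects
    // null keys and values).
    private final Map<String, Integer> sharedCounts = new ConcurrentHashMap<>();

    public void count(String key) {
        // merge() covers the check-then-update helpers that had to be
        // hand-written around Hashtable in older code.
        sharedCounts.merge(key, 1, Integer::sum);
    }
}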
Signed-off-by: Marek Otahal <markotahal@gmail.com>
- config string for Chinese
- do not copy the language files to DATA/LOCALE any more (and do not
use them there; this is really confusing for new translators)