yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	c351e47a84	fix for bad-formatted lonlat	11 years ago
reger	4c603b216e	optimize parse ServerSideInclude	11 years ago
orbiter	5ec0c969c9	fix for http://bugs.yacy.net/view.php?id=354	11 years ago
orbiter	0002abd583	fix for OOM during remote search and too high load protection	11 years ago
sixcooler	5a917e13c6	use less ram on dht-URL transfer by not using a URIMetadataNode[]	11 years ago
Michael Peter Christen	c87cdfca2e	do not set a load prerequisite that prevents the start of one-time-jobs	11 years ago
sixcooler	0512e46c6a	bump to httpclient-4.3.2	11 years ago
sixcooler	4d77ca52c9	workaround to let dht-out run on small systems like a Pi	11 years ago
reger	9a96a7d73f	put list quick navigator buttons below on BlackList_p editor replacing the dropdown -> go navigation	11 years ago
Michael Peter Christen	6ada0daae9	making latency_factor and maximum number of same hosts in loader queue settings available in Crawler_p.html servlet for steering.	11 years ago
Michael Peter Christen	489c3fbc90	code simplifications / removed warnings	11 years ago
Michael Peter Christen	0168f80c28	new crawling factors can now be changed during runtime	11 years ago
Michael Peter Christen	be5e808236	- removed hardcoded load-test which is now handled in BusyQueues steering, see /PerformanceQueues_p.html - changed default values for crawler queue load limit (high, because these jobs are started upon user request)	11 years ago
sixcooler	40a4030b55	configurable max-load values for YaCy-Threads: try lower values on small systems like a Pi	11 years ago
sixcooler	6d8c023a5e	lower client-connection for single-cpu-systems	11 years ago
Michael Peter Christen	77531850b5	reverted crawling strategy from latest commit.	11 years ago
Michael Peter Christen	c0da966dfa	enhanced crawler speed	11 years ago
Michael Peter Christen	79809342fa	added synchronization to exists() call because the concurrent call to that method showed in thread dump close to deadlock situations. It's also better to synchronize IO operations because they become faster then.	11 years ago
Michael Peter Christen	9a6912f2e6	if a http client thread is still running but we do not wait for it any more, call an interrupt	11 years ago
Michael Peter Christen	0d235a565b	cleanup crawl loader jobs	11 years ago
Michael Peter Christen	1ea17bd9f3	- removed old metadata database and all migration code - refactored all code which uses URIMetadataRow as standard for word hash length and word hash ordering and moved that to the class 'Word', because the class URIMetadataRow defined the old metadata data structure and should be superfluous in the future - removed unused methods from URIMetadataRow as preparation for further removal of that class	11 years ago
reger	d3de309953	fix IOexception logging issue in DefaultServlet reason not sure but .logException triggers another exception	11 years ago
reger	97e84439fb	adjusted ConfigHeuristic and changed QueryGoal.getOriginalQueryString to .getQueryString - since the specific heuristics Twitter & Blekko are no longer available or redundant with OpenSearchHeuristic, adjusted ConfigHeuristic to use OpensearchHeuristic settings only. For this the default OSD search target list is made available (copied) by default and the other configs are removed. - the return of QueryGoal.getOriginalQueryString includes the queryModifier, which are held separately in a modifier object, but in most (all) cases just the query term is expected, clarified and renamed it to QueryGoal.getQueryString which returns just the search term (if needed a .getOriginalQueryString could be implemented in Queryparameters, adding the modifiers) - started to adjust internal html href references from absolute to relative (currently it is mixed). For future development we should prefer relative href targets (less trouble with context aware servlets)	11 years ago
reger	d24a0ec32c	upd heuristic default list (heuristicopensearch.conf) - Faroo Web taken out (requires api key) http://www.faroo.com/hp/api/api.html#description - update Faroo News to new url - Twitter taken out (change to Api 1.1 not supporting rss) https://dev.twitter.com/discussions/24239	11 years ago
Michael Peter Christen	022c6d3ce1	do YaCy p2p connections using a timeout-request which covers the http request into a separate thread and ignores the further result of a request if that does not answer within the requested time-out. This is a try to solve a problem with the peer-ping, which hangs whenever a peer appears to be dead or blocked.	11 years ago
Michael Peter Christen	42f3733a05	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
Michael Peter Christen	25a6c05008	experimental removal of synchronization. This should work for all cases where the size() and isEmpty() method is used only for statistics, which happens at many locations in YaCy. If these methods are used for structural reasons (like accessing the last element in an array) then it may fail or cause other problems. As far as visible, this is not the case.	11 years ago
Michael Peter Christen	5695280edd	removed superfluous synchronization	11 years ago
Michael Peter Christen	a1977b7a75	removed debug code	11 years ago
orbiter	fd4abc0565	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	d5b8e473c8	added load limit for DHT transfer: RWI acceptance only if local load is not too high	11 years ago
reger	41c126978b	fix bug: Crawl Start (Expert) crawls "?-URLs" even if told not to do so http://bugs.yacy.net/view.php?id=329	11 years ago
reger	2614fa7aeb	Skip remote Solr search if last try showed error. As the solr servlet may not be available (e.g. no public search page, old version, individual access setting) a /solr/select error is remembered in the seed.dna of the remote peer. This is not permanent, as flag is not stored and the seed is reloaded on several occasions, it is just a memory of the recent past status. Might also be set to "not available" on time-out of last try.	11 years ago
orbiter	a07e9b3582	concurrency-solid version of transmission limitation	11 years ago
orbiter	ec21f0494e	removed -d64 jvm option because that causes problems on non-64 bit linux, see http://bugs.yacy.net/view.php?id=349 and http://bugs.yacy.net/view.php?id=339	11 years ago
orbiter	60ead31273	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	52bf7d1ac8	reduce load during dht transfer	11 years ago
sixcooler	f0587d4af5	NP-fix, which was found on a Pi under 'heavy' load	11 years ago
Michael Peter Christen	a9ed28c0b5	no commit if no action is requested	11 years ago
Michael Peter Christen	0bf3cab8c7	- better 'extra'-peer selection - logging of health status for 'extra'-peer selection - concurrency for remote peer IO and interrupting the threads if time-out occurs	11 years ago
orbiter	e3c4456c8e	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	11 years ago
orbiter	7f21d21d1d	added synchronization to deeply-embedded solr connector EmbeddedSolrConnector because deadlock situations show that methods in lucene class seem to block.	11 years ago
reger	9b06774414	fix role name in GSA servlet	11 years ago
reger	0c754dd794	implemented DIGEST authentication, which is more secure for remote login than BASIC, where the pwd is transmitted near clear text (B64enc). This has some implication as RFC 2617 requires and recommends a password hash MD5(user:realm:pwd) for DIGEST. !!! before activating DIGEST you have to reassign all passwords !!! to allow new calculation of the hash - default authentication is still BASIC - configuration at this time only manually in (DATA/settings) or defaults/web.xml (<auth-method>) - the realmname is in defaults/yacy.init adminRealm=YaCy-AdminUI - fyi: the realmname is shown on login screen - changing the realm name invalidates all passwords - but for security you are encouraged to do so (as localhostadmin) - implemented to support both, old hashes for BASIC and new hashes for BASIC and DIGEST - to differentiate old / new hash the hash-prefix "MD5:" used in Jetty is used for new pwd-hashes ( "MD5:hash" )	11 years ago
Michael Peter Christen	ba44eb1160	when scaling the number of remote peers, also consider the machine load and the number of cores	11 years ago
Michael Peter Christen	f8ce7040ab	remote search peer selection schema change: - all non-dht targets (previously separated into 'robinson' for dht-like queries and 'node' for solr queries) are now 'extra' peers, which are queried using solr - these extra-peers are now selected using a ranking on last-seen, peer-tag-matches, node-peer flags, peer age, and link count. The ranking is done using a weight and a random factor. - the number of extra peers is 50% of the dht peers - the dht peers now exclude too young peers to prevent bad results during strong growth of the network - the number of dht peers (and therefore extra-peers) is reduced when the memory of the peer is low and/or some documents still appear in the indexing-queue. This shall prevent a peer from deadlocks when p2p queries are made in a fast sequence on weak hardware.	11 years ago
Michael Peter Christen	47a82e471c	less blocking in SeedDB which caused deadlocks in peer ping	11 years ago
Michael Peter Christen	ec10ed45bd	better logging in logger	11 years ago
Michael Peter Christen	a5d7961812	replaced old caching in SolrConnector with a new one which is better for concurrency and should prevent from 100% CPU usage after a long run of a peer with a large number of documents.	11 years ago
Michael Peter Christen	84cf7e8e9f	backmigration from solrj 4.6.0 to 4.5.1. This is necessary because solrj.4.6.0 has a bug which prevents the attachment of a remote solr (as tested with a SolrCloud). See bug report https://issues.apache.org/jira/browse/SOLR-5532 This bug shall be fixed in Solr 4.6.1. Fortunately, solrj-4.5.1 works together with solr-4.6.0 thus the current index does not need to be changed.	11 years ago
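
Commit 0c754dd794 above describes the new password hash as MD5(user:realm:pwd), stored with the prefix "MD5:" so that old and new hashes can be told apart. The sketch below illustrates that calculation only. It is written in Python for brevity and is not YaCy's code: the function names `digest_ha1`, `stored_hash` and `is_new_style` are hypothetical, and the only values taken from the commit message are the hash formula, the "MD5:" prefix and the default realm name YaCy-AdminUI.

```python
import hashlib

# Default realm name as quoted in the commit message (adminRealm=YaCy-AdminUI).
REALM = "YaCy-AdminUI"
# Prefix the commit message says marks new-style password hashes.
PREFIX = "MD5:"


def digest_ha1(user: str, realm: str, password: str) -> str:
    """Return MD5(user:realm:password) as lowercase hex (the RFC 2617 HA1 value)."""
    material = f"{user}:{realm}:{password}".encode("utf-8")
    return hashlib.md5(material).hexdigest()


def stored_hash(user: str, password: str, realm: str = REALM) -> str:
    """Hypothetical helper: build the prefixed form "MD5:<hash>"."""
    return PREFIX + digest_ha1(user, realm, password)


def is_new_style(stored: str) -> bool:
    """Hypothetical helper: tell a new-style hash from an old one by its prefix."""
    return stored.startswith(PREFIX)
```

Because the realm is part of the hashed string, the same user and password yield a different hash under a different realm, which is consistent with the commit's note that changing the realm name invalidates all passwords.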


10582 Commits (d1091e79f83591502fdc08444aca84b733300a71)