yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	7138f4036b	less synchronization, better thread dump tool git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7556 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	8d14916c74	more patches for a better out-of-memory management git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7555 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
sixcooler	65bcc60808	stupid me: revert placement of closing connection which caused unclosed connections + reuse sockets git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7543 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
sixcooler	e3d75d6cd5	Not storing external header in an Header-Array and reduce a loop for its conversion. Ensure connection close if a OOM is thrown. Ensure setting resolved host is set at the request. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7542 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	42d90664f3	- fixed a memory leak in the httpc.post method (no finish) - patched some more memory-saving relevant code - some more minor bug fixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7541 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	38dce547c0	better concurrency (less locking on date formatting) more logging and minor bug fixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7540 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	89d337841c	more logging for OOMs git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7534 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	5e186e0122	continuing the fight against deadlocks during time formatting: better caching. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7531 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	dec24244cf	added convenience class to generate UTF StringBody objects with a default UTF8 charset. Reason: if this is not used in StringBody-Class initialization, a default charset name is parsed. This is a synchronized process and all classes using default charsets synchronize at that point Synchronization is omitted if this class is used git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7530 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	19b2a50578	- enhanced date formatter cache - added more instances of formatter objects to different classes to make them independent in case of lockings that may apply during synchronization of the date formatter object (date formatting is not thread-safe and must be synchronized therefore) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7528 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	a92d80a545	performance enhancements using an alternative to a insensitive collator (a complex string compare): - less synchronizations - better speed ..at most important and commonly used classes: http headers, url parsing and html parsing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7526 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
sixcooler	bcea497644	next try to fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=3193&start=0&sid=b98aa9a7466397602b436eb45f4a9d39 tested proxy, crawl, updatedownload - please do further testing! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7524 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	f95e50ec3d	more explanation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7522 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	bb36bf841a	emergency commit (sorry sixcooler for not waiting) because without that automatic updating peers would not be able to do the next update. Please see http://forum.yacy-websuche.de/viewtopic.php?p=22059#p22059 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7521 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
sixcooler	8ad4e10491	fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=3193&start=0&sid=b98aa9a7466397602b436eb45f4a9d39 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7520 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	e3ef4e3021	- increased default peer ping time from 2 minutes to 1 minute - filtering out too old peers when reading seed lists (limit is now 240 minutes) - added concurrent host names resolving in front of the http client because the http client uses the java built-in DNS resolve which is not multithreading-safe (i have seen deadlocks in thread dumps showing that this bug in jdk is still there) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7515 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	cd19d0517e	added dns resolve to HTTPClient POST using a dns cache to prevent that that not-thread-safe built-in dns cache inside apache http client is used git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7513 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	91eeaf2cff	fix in ftp client git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7505 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	d84b4a072e	healing for some OOM problems git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7502 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	4aa406fb0f	added log output to find bug in url parser for short hosts git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7501 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	5892fff51f	introduction of dht-burst modes: this can expand the number of target peers in some cases where a better heuristic is needed. The problematic cases are either when a multi-word search is made (still a hard case for our term-oriented DHT) or when a network operator wants that all robinson peers are asked. We therefore introduced two new network steering values that switch on more peers during the peer selection. Because the number of peers can now be very large, the number of maximum httpc connections was also increased. Please see new comments in yacy.network.freeworld.unit for details of the new DHT selection methods. The number of maximum peers is now not fixed to a specific number but may increase with - the partition exponent - the number of redundant peers - the robinson burst percentage - the multiword burst percentage The maximum can then be the number of senior peers (all visible peers). git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7479 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	64f32e8f00	) replaced all IPs in IP filters for proxy with the proper regular expression ) some cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7477 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
sixcooler	3e8b72be50	update to httpclient-4.1 - sorry forgot some git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7474 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	5905f912c5	replaced more double types with float git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7462 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	0cdfb82963	replaced more appearance of double values by float values git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7461 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
f1ori	a321c7673d	* adminAccountForLocalhost only for localhost * yacy crawls local domains also, if no password is set (the interface is already protected) * it's not required anymore, to set a password in intranet mode git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7436 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
hermens	930cb412dd	Let SHORT_MILSEC_FORMATTER make a new formatted String every millisecond see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=3103 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7434 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	48463c4507	) General private License? ;-) ) minor code changes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7432 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	6c1b14c8e1	- more control in access tracker: count number of returned search results (not only info how much is in the index) - extended query params for this - enhanced cora git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7430 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	54e77e6255	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7426 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	24e4126eee	added JSON parser code from json.org (added generics to it) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7421 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	10ae8d961b	- cora package has now no dependencies to other yacy packages and becomes a 'base' package (refactoring) - cleaned up (removed special code and documentation for 27c3) - added remote search functions to be used within cora git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7420 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
f1ori	e4aabaa1c3	* fix negative filelength for files >2G git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7408 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	b2ed4cfaf8	more small bugfixes and light refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7401 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	903c824c2c	- allow only scanned resources with granted status - increased time-out when scanning an ip range git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7398 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	fe46536f6e	enhanced network scanner (less name resolving during scanning and no name resolving during search) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7392 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	e88c428008	fix to ftp loader git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7387 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	59b70a5a92	another fix to the ftp crawler: now correct directory listings according to rfc2640 (path with spaces) and better title names for such files git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7386 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	9b25a33fd9	- fixed numerous bugs - better document names - fixed problem with ftp crawling - added automatic removal of search results from services that are not online according to the latest network scan: this does not delete the index but just does not show them. after the next network scan when the server is available again, the results are again showed. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7385 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	7bdb13bf7f	more fixes to smb crawling: better file names git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7384 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	94c48500cc	several fixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7383 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	0ac7311a62	fix for token parser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7382 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	58b59f9bc8	- a collection of bug fixes and some redesign of the Scanner class - fixed smb crawling - added smbget to download script generation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7381 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	56264dcc17	- added CamelCase parser to MultiProtocolURI: generate better to-be-indexed words from urls - integrated new parser into loader processes: enrich document parser - fixed a concurrent modification exception in kelondro iterator - hand-over of document size from crawler to indexer git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7374 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	99a7fe87f9	- removed old intranet scanner (the generic scanner now completely subsumes the old one) - added information about granted access - enhanced servlet design - added submit-feedback (because it is a long-running task) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7372 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	acab6801d9	added new network scanner - you can scan any ip or host in the internet for services - this replaces the intranet scanner git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7371 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	14e4fae8e9	fixes to ftp client git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7369 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	a563b05b60	enhanced crawler: - added a new queue 'noload' which can be filled with urls where it is already known that the content cannot be loaded. This may be because there is no parser available or the file is too big - the noload queue is emptied with the parser process which indexes the file names only - the 'start from file' functionality now also reads from ftp crawler git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7368 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	c36da90261	added a very fast ftp file list generator to site crawler: - when a site-crawl for ftp sites is now started, then a special directory-tree harvester gets the complete directory structure of a ftp server at once - the harvester runs concurrently and feeds into the normal crawl queue also in this: - fixed the 'start from file' crawl function - added a link detector for the html parser. The html parser can now also extract links that are not included in <a> tags. - this causes that a crawl start is now also possible from clear text link files git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7367 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	4e2c14efbb	fixed bugs in parser and ftp client git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7360 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago

145 Commits (9d366ee9d7289ec98483f0082aed94cb79fe364f)