yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	daf0f74361	joined anomic.net.URL, plasmaURL and url hash computation: search profiling showed that a major amount of time is wasted computing url hashes. The computation does an intranet check, which needs a DNS lookup. As a result, each urlhash computation needed 100-200 milliseconds, which delayed remote searches by at least 1 second more than necessary. The solution to this problem is to attach a URL hash to the URL data structure, so that the url hash value can be filled in after the URL is retrieved from the database. The redesign of the url/urlhash management required a major redesign of many parts of the software. Parts that had already been slated for abandonment were removed during this change to avoid unnecessary maintenance of unused code. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	40b0547611	- documentation changes (removed old forum links) - different handling of link quotation - different handling of link normalization - enhanced html/unicode en/de-coding git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3993 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
karlchenofhell	22ee85ca02	- specified exceptions thrown by ResourceInfoFactory and plasmaHTCache.loadResourceInfo() - caught possible NPE in CacheAdmin_p and added more error cases - sped up deletion of entries in the local crawl queue by crawl profile (it has often been noted that this deletion is slow) - added a bit of javadoc git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3868 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
karlchenofhell	601fc7d1c5	- added source to J7Zip-modifed.jar and its license (changelog is still to come) - moved HTML-*replace-methods from wikiCode to de.anomic.data.htmlTools - prepared use of different wiki parsers as suggested here: http://www.yacy-forum.de/viewtopic.php?p=34444#34444 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3741 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
karlchenofhell	97d4ab2053	- handle null from iterator in IndexCreateWWWLocalQueue_p.java - fixed ETA to reach next peer in Network.java - added some <label>s and fixed minor XHTML errors in ConfigNetwork.html - try to avoid returning null in servlets, as it is unexpected and causes an NPE in the file handler git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3623 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	861f41e67e	redesigned NURL-handling: - the general NURL-index for all crawl stack types was split into separate indexes for these stacks - the new NURL-index is managed by the crawl balancer - the crawl balancer no longer needs an internal index; it is replaced by the NURL-index - the NURL.Entry was generalized and is now a new class plasmaCrawlEntry - the new class plasmaCrawlEntry also replaces the preNURL.Entry class, and will also replace the switchboardEntry class in the future - the new class plasmaCrawlEntry is more accurate for date entries (holds milliseconds) and can contain larger 'name' entries (anchor tag names) - the EURL object was replaced by a new ZURL object, which is a container for the plasmaCrawlEntry and some tracking information - the EURL index is now filled with ZURL objects - a new index delegatedURL holds ZURL objects about plasmaCrawlEntry objects to track which url is handed over to other peers - redesigned handling of plasmaCrawlEntry handover, because there is no longer any need to convert one entry object into another - found and fixed numerous bugs in the context of crawl state handling - fixed a serious bug in kelondroCache which prevented entries from being removed - fixed some bugs in the online interface and adapted monitor output to the new entry objects - adapted the yacy protocol to handle new delegatedURL entries. All old crawl queues will disappear after this update! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3483 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	f7803a6ce4	enhanced crawl balancer - new domains now get a chance to be crawled early - fewer IO operations - new balancing method - better dump order at shutdown time - bugfixes regarding url hashes that were not found (no more superfluous cache kill) - domain access time is now shared across all balancer stacks - viewing the stack no longer disturbs the balancing algorithm as much - intelligent selection of the best next domain using domain access times - extra double-check (to double-check the double-check) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3384 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	109ed0a0bb	- cleaned up code; removed methods to write the old data structures - added an assortment importer. The old database structures can be imported with java -classpath classes yacy -migrateassortments - modified wordmigration. The indexes from WORDS are now imported to the collection database. The call is java -classpath classes yacy -migratewords (as it was) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3044 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	857a2d76a2	*) better handling of server shutdown. See e.g. http://www.yacy-forum.de/viewtopic.php?p=25234 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2471 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	faca799f79	*) Bugfix for last commit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1362 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	b1b8ba719e	*) adding links to specify the number of queue entries that should be displayed in the GUI git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1360 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	bb79fb5d91	- changed handling of error cases when retrieving urls from the database (NULL values are no longer returned; instead, an IOException is thrown) - removed ugly damagedURLS implementation from plasmaCrawlLURL.java (this inserted a static value into the Object, which is not really good style) - re-coded damagedURLS collection in yacy.java by catching an exception and evaluating the exception message - to do: the urldbcleanup feature must be re-tested git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1200 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	37f88b4017	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1176 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	64478b1f02	*) Adding possibility to delete crawler queue entries using regular expressions git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1160 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	a04930f025	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1158 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hydrox	cb69047b91	*) cleanup: access to static methods and fields git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1016 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
low012	4dbc871524	*) Trying to get rid of possible exploits in IndexCreate through HTML and JavaScript in peer names, URLs, <title> tags etc. (see http://www.yacy-forum.de/viewtopic.php?t=1181 ) I hope I got them all and did not overdo it. *) Just a tiny bit of cleaning up in News.java. (I messed it up myself some time ago.) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@749 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	416c126815	fix for a profile = null problem and new monitor in crawl queue git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@730 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	732a107160	*) Bugfix for "-UNRESOLVED_PATTERN-" Bug on IndexCreateWWWLocalQueue_p.html and "urlEntry.url() == null" Bug - Logging message for "urlEntry.url() == null" is now displayed as info - IndexCreateWWWLocalQueue_p.html now detects null entries while looping through the list and removes them automatically See: - http://www.yacy-forum.de/viewtopic.php?t=532#8781 - http://www.yacy-forum.de/viewtopic.php?t=639 - http://www.yacy-forum.de/viewtopic.php?t=1071 - http://www.yacy-forum.de/viewtopic.php?t=338 - http://www.yacy-forum.de/viewtopic.php?t=980 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@640 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	bead8a32aa	*) IndexCreate_p.java: Crawler StartURLs will now also be added to the errorURL-DB if an error occurs on this url *) kelondroStack.java, plasmaSwitchboardQueue.java: Adding a method which returns a list of all entries in the queue. This list is used by IndexCreate_p.java instead of an iterator to display the indexing list. Advantages: avoids concurrent modifications of the list while displaying it; speedup, because now only one sync function has to be accessed instead of multiple ones (one for each entry) *) IndexCreateIndexingQueue_p.java: Using new list() function of plasmaSwitchboardQueue *) httpdFileHandler.java: If a servlet returns the special value "LOCATION", the httpdFileHandler redirects the browser to the URL specified by the servlet. This can e.g. be used when an http GET request is used instead of a POST request, but a refresh should not be allowed. *) IndexCreateWWWLocalQueue_p.html: Now it's possible to delete single entries of the local crawler queue git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@626 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	81e564edb8	faster crawl profile list cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@442 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
orbiter	ad90f0ad13	activated RWI distribution to DHT for senior peers (default redundancy 3), necessary now for network growth git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@438 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
orbiter	19dbed7cc8	code clean-up git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@401 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
orbiter	252c6e4869	added crawl queue monitor for global crawls git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@372 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago

24 Commits (07d1e989095b614ba1a04c51e0241d685f8b2fbc)