yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	00746ca232	identified and fixed search performance problem caused by snippet loading. Some accesses to the header-db happened twice, and in some cases even more often. Snippet resource loading fixed. Furthermore, snippet loading within the remote peer during remote search has been disabled, but it can be switched on remotely with the new flag 'includesnippet=true' git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2688 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	310f1c41cd	added option to see ranking scores in surftipps and some cleanups git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2684 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	a2e3095044	*) Bugfix. Add missing plasmaParserDocument.close() calls git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2680 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	cd5f349666	*) Better handling of large files during parsing: extracted text of files larger than 5MB is stored in a temp file instead of being kept in memory *) plasmaParserDocument.java: getText now returns an InputStream instead of a byte array *) plasmaParserDocument.java: new function getTextBytes returns the parsed content as byte array. Attention: the caller of this function has to ensure that enough memory is available to avoid OutOfMemory exceptions *) httpd.java: better error handling if the soap handler is not installed *) pdfParser.java: - better handling of documents with exotic charsets - better handling of large documents - better error logging of encrypted documents *) rtfParser.java: Bugfix for UTF-8 support *) tarParser.java: better handling of large documents *) zipParser.java: better handling of large documents *) plasmaCrawlEURL.java: new errorcode for encrypted documents *) plasmaParserDocument.java: the extracted text can now be passed to this object as byte array or temp file git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2679 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
low012	f8ac694e51	*) fixed a bug where search words in snippets were not displayed in bold when directly followed by a punctuation mark (see http://www.yacy-forum.de/viewtopic.php?p=25998 ) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2677 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	df1629b05a	- code cleanup - version 0.471 - moved surftipps to own web page git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2676 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	b73efd5565	*) missing changes needed because of last commit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2673 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	2463e5624a	'quick' release 0.47 - documentation update - necessary bugfixes (missing css for new peers) - reduced effect of search result redundancy filter - removed some debug output, but not all git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2665 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	625c2ce6b1	*) bugfix for snippet fetching problem if content but not http header is available in cache See: http://www.yacy-forum.de/viewtopic.php?p=25748 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2651 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	813a8a8179	*) migration of mimeTypeParser to jmimemagic 0.1 - better mimetype detection for rss feeds - better mimetype detection for odt documents (less memory consuming) - two new detector classes implementing MagicDetector interface of jmimemagic git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2650 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hermens	3f5a4153a0	Make Peers more receptive to transferred indexes - Set MaxWordCount for dhtInCache to indexDistribution.dhtReceiptLimit so that the inCache gets flushed when the limit is passed - Modify flushCacheSome to flush enough words to get below MaxWordCount immediately git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2649 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	b6c7b91582	*) Parser now throws a ParserException instead of returning null on parsing errors (e.g. needed by snippet fetcher) *) better logging of parser failures *) simplified usage of plasmaparser through switchboard *) restructuring of crawler - crawler now returns an error message if it is used in sync mode (e.g. by snippet fetcher) *) snippet-fetcher: more verbose error messages *) serverByteBuffer.java: adding new function append(String,encoding) *) serverFileUtils.java: adding functions to copy only a given number of bytes between streams git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2641 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	1dc12d6659	*) Bugfix for shutdown problem caused by cacheScan thread See: http://www.yacy-forum.de/viewtopic.php?p=25729 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2636 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	42173462f5	rename cutUrlText to shortenURLString; other little things; git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2635 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	26dfbb7499	*) Bugfix for UTF-8: url names are now stored properly in stackcrawl, crawler and indexing queue and should be displayed correctly on the gui git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2630 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	cf6acff2c2	*) Bugfix. htmlFilterInputStream document analysis did not work properly for documents smaller than the default InputStream Buffer size. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2629 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	5c6251bced	*) some improvements for extended html document charset support - new class htmlFilterInputStream.java which allows pre-analyzing the html header to extract the charset meta data. This is only enabled for the crawler at the moment. Integration into proxy needs more testing. - adding event listener interfaces to the htmlscraper to allow other classes to get informed about detected tags (used by the htmlFilterInputStream.java) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2624 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	f453c14b5d	removed unreachable catch blocks and unused imports git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2619 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	ad7f600f25	*) Bugfix. re-enabling inheritance of serverCharBuffer from writer class git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2618 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	97d2a08ef1	*) restructuring needed to support parsing of documents using various charsets - serverFileUtils.java: -- adding methods to copy from stream to writer and readers to writers -- moving httpc writeX methods into serverFileUtils class - serverCharBuffer.java: removing inheritance from Writer class - replacing htmlFilterOutputStream by htmlFilterWriter class which handles content as char stream - htmlFilterContentTransformer.java: deactivating getText mode (still needs to be migrated to use char streams instead of byte streams) - changes in several classes to use htmlFilterWriter instead of htmlFilterOutputStream - changes in Scraper and Transformer classes to operate on chars instead of bytes - httpdProxyHandler.java: bugfix. clientTimeout setting was missing in config file git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2617 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	3aac5b26da	- added automatic tag generation when a web page from the search results is added - added new image 'B' in front of search results for bookmark generation - added news generation when a public bookmark is added - the '+' in front of search results has new meaning: positive rating for that result - added news generation when a '+' is hit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2613 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	f644a1c3a7	better evaluation of index abstracts git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2604 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	2fd610b556	http://www.yacy-forum.de/viewtopic.php?p=25611#25611 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2601 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	06fa891152	*) htmlFilterContentScraper.java: using proper charset for document title git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2595 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	74c3e7cf29	*) storing document charset into plasmaParserDocument object (is needed later by the condenser) *) htmlFilterContentScraper.java: using proper charset for document title *) serverByteBuffer.java: adding new toString which allows to specify the charset for byte encoding git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2593 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	c5d3020941	*) better errorhandling for last commit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2592 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	d0a5a53789	*) changes needed for multi-language support - parsers may need to know the charset of the byte stream git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2591 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	26ab1fa885	fixed null pointer exception See http://www.yacy-forum.de/viewtopic.php?p=25598#25598 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2588 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	b0e8ff6eda	*) some TODO markers for UTF-8 problem git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2586 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	41e27b85b7	fix for crawler condition git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2583 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	9ecf7f0da2	*) some TODO markers for UTF-8 problem git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2578 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	c89d8142bb	replaced old 'kCache' by a fully controlled cache. There are now two fully controlled caches for incoming indexes: - dhtIn - dhtOut. During indexing, all indexes that shall not be transported to remote peers because they belong to the own peer are stored to dhtIn. It is furthermore ensured that received indexes are not transmitted to other peers again directly. They may, however, be transmitted later if the network grows. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2574 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	6e2907135a	bugfixes for remote search server part git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2573 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	cf9884e22b	first attempt to implement a secondary search. This is a set of search processes that shall enrich search results with specialized requests to realize a combination of search results from different peers. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2571 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	b251076e64	avoid ConcurrentModificationException git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2563 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	75b198bc02	- updated references to indexContainer - more bugfixes and debugging for indexAbstract processing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2555 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	b7e7808ea6	wordmigration now works also for new index database. If the new database is switched on, no 'too big' messages appear, and all the WORDS files can be completely migrated git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2553 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	a0ddf2ec11	*) AbstractCrawlWorker.java: delete already downloaded data on crawling error *) plasmaSwitchboard.java: log unexpected errors while parsing/indexing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2552 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	4f9e42d5ed	more changes towards better join-search - fixed problems with index-abstract generation - added analysis output for index abstract receive git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2551 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	a7281a9b4d	fix for last commit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2545 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	82a6054275	- fixed bug with new indexAbstract generation - added partly evaluation of indexAbstracts during remote searches git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2544 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	fded1f4a5d	*) better handling of maximum file size limit in crawler git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2543 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	74d1dea30b	changes towards better join-search - added generation of a compressed index within remote peers during global search - added selection of specific urls within remote peers during secondary global search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2539 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	ae4e8ce03e	- cut for 'probably last html-interface version': version number update - small enhancement to ranking git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2536 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	64bed59ee8	enhancements to ranking git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2535 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	63893003be	*) Adding settings page for the crawler which allows specifying a file size limit and the timeout to use. *) adding first version of maximum filesize check for the crawler git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2534 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	94d7ced900	fix for last ranking commit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2529 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	03835c2ee8	enhanced search result computation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2527 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	ac3419b65f	better debugging for indexOutOfBoundException bug git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2525 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	a8bc768206	enhancements to ranking evaluation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2523 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	33898ae7e9	*) ResourceInfoFactory.java: Bugfix for classNotFoundException See: http://www.yacy-forum.de/viewtopic.php?t=2797 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2521 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	406e170e25	*) more verbose error message git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2519 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	b298474e22	*) Bugfix needed because of changed plasmaCrawlLURL.load behavior git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2518 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	96c6e4e322	- enhancements to detailed search page - enhancements to search ranking computation process - removed bugs in postranking git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2516 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	9340dbb501	fixed all possible problems with nullpointer exception for LURLs git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2513 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	a5ed86105b	*) bugfix for handling of ResourceInfo object in proxy git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2512 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hermens	ff4362b02d	some more fixes for new plasmaCrawlLURL.load behavior git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2511 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hermens	7aeadbe7cc	another NullPointerException in http.ResourceInfo git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2510 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	141f9e5bb4	fix for new plasmaCrawlLURL.load behavior git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2509 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hermens	087f7511f8	prevent NullPointerException in http.ResourceInfo git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2507 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	a2525072f2	bugfix for kelondroRow - property generation. This bug affected ranking parameters :-( git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2506 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	b44514242a	*) crawler/ftp/CrawlWorker.java: better errorhandling git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2503 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	7d7f30139c	*) crawler/ftp/CrawlWorker.java: delete old cache file git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2502 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	4ae0f122f8	*) ResourceInfo.java: License header added git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2501 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	043edfa4d8	*) ftp/ResourceInfo.java: ResourceInfo object for ftp resources added *) ftp/CrawlWorker.java: better errorhandling for ftp crawler *) plasmaCrawlEURL.java: some errorcodes added git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2499 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	4866868c0e	added write cache for LURLs. This was necessary to speed up the index receive process during global search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2498 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	8a0e35618b	enhancements to search result preparation - added detailed count on remote search results - enhanced search sequence during remote searches (doing local search in sequence) - strict adherence to timeout limits git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2497 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	5c1bb53d2a	Missing description for last commit: *) next step of restructuring for new crawlers > HTCaching should now work protocol-independently -- introduction of new ResourceInfo objects containing protocol-specific metadata of a resource. -- the ResourceInfo objects now implement old functions like shallIndexCacheForXXX, shallStoreCacheForXXX in a protocol-dependent manner > Indexing should also work protocol-independently now git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2496 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	dae763d8e3	git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2495 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	4825bfaaf3	*) Bugfix for PrintWriter Problem See: http://www.yacy-forum.de/viewtopic.php?t=2792 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2494 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	7930839594	*) URL.java: userinfo was not taken over when generating a new url from a base url and a rel. path *) CrawlWorker.java: using new dirhtml function of ftpc git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2492 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	7a35b8e237	*) direct access to responseheaders of sbQueue.Entry removed to make it more http independent git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2487 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	ffbf416e76	*) direct access to requestheader of htCache.Entry removed to make it more http independent git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2486 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	3870d615e3	*) setting htCache.Entry fields to private git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2485 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	393a7d10be	*) setting htCache.Entry fields to private git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2484 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	ab5a9bee66	*) adding some copyright headers *) next step of restructuring for new crawlers - adding first testversion of ftp crawler class -- does not create a htCache entry yet git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2483 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	5847492537	*) next step of restructuring for new crawlers - IndexCreate_p.java: correcting problems with ftp urls - URL.java does not cut out the userinfo anymore (needed to transport authentication info in ftp urls, e.g. ftp://username:pwd@ftp.irgendwas.de) - plasmaCrawlLoader.java: -- hack to re-enable https urls -- adding function getSupportedProtocols git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2482 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	fce9e7741b	*) next step of restructuring for new crawlers - renaming of http specific crawler settings git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2480 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	e3f0136606	*) next step of restructuring for new crawlers - adding function isSupportedProcotol to plasmaCrawlLoader.java - disabling robots.txt check for protocols other than http(s) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2479 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	9ded4e8d5a	*) Bugfix for name resolution in proxy mode See: http://www.yacy-forum.de/viewtopic.php?p=25241 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2478 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	1c8300fcec	*) Bugfix for name resolution in proxy mode See: http://www.yacy-forum.de/viewtopic.php?p=25241 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2477 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	4e2a950ac9	*) next step of restructuring for new crawlers - avoid using the http crawler class directly. Using the interface class instead git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2476 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	09b106eb04	*) next step of restructuring for new crawlers - adding interface class (plasma/crawler/plasmaCrawlWorker.java) for protocol specific crawl-worker threads - moving reusable code into abstract crawl-worker class AbstractCrawlWorker.java - the load method of the worker threads should not be called directly anymore (e.g. by the snippet fetcher) to crawl a page and wait for the result use function plasmaCrawlLoader.loadSync([...]) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2474 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	eb9b138986	*) next step of restructuring for new crawlers - conversion of the crawler pool into a keyed object pool - crawlers are now loaded based on the url protocol (of course works only for http now) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2473 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	1395aae742	*) starting restructuring which is needed to add crawlers for additional protocols git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2472 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	b4acbdaa97	*) better handling of server shutdown See: e.g. http://www.yacy-forum.de/viewtopic.php?p=25234 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2470 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	f3ac4dbbb9	*) better handling of server shutdown See: e.g. http://www.yacy-forum.de/viewtopic.php?t=2584 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2468 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	959b779aba	*) avoid performance loss if log level is greater than 'fine' See: http://www.yacy-forum.de/viewtopic.php?p=25180 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2467 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	18b6876860	new cache flush configuration settings git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2460 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hermens	f0278b4092	Bugfix for / by zero when the AssortmentCluster is empty See: http://www.yacy-forum.de/viewtopic.php?t=2746 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2459 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	14e0bb0dcf	allow more references per word for new db git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2458 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	985dcbde7f	changed some parameters that may result in better memory usage and more indexing speed git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2457 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	b7f4a1521b	added options to switch on or off the kelondroFlexTable for NURL, EURL and PreNURL git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2456 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	c26da4893b	turned back NURL usage of kelondroTree, kelondroFlexTable has still problems with deleted entries git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2454 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	db1eae0227	* simplified initialization of database objects * replaced kelondroTree for NURLs by kelondroFlex * replaced kelondroTree for EURLs by kelondroFlex. Take care, this may be very buggy: please finish crawls before updating, otherwise unfinished crawls will be lost. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2452 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hermens	0b73f2b132	Repair DNS prefetch during cacheScan git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2451 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	27a159b401	* documentation update * removed doc from release * release information in doc/News.html * release 0.46 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2442 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	f80f776b89	*) Trying to solve NullpointerException problem in function addURLtoErrorDB See: http://www.yacy-forum.de/viewtopic.php?t=2705 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2441 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hydrox	1c99b5a484	*) fixed logging for urldbcleanup *) changed exception handling in urldbcleanup so that it shows NullPointerException correctly *) added more Blacklisting to urlcleaner git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2436 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	8f3f4ab0eb	enhanced synchronisation in plasmaWordIndex git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2433 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	23dd972608	fixed memory calculation in performanceMemory web page. Also fixed maximum cache size computation. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2429 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	1ce3c22761	better memory control: - added memory monitor for preNURL-db in performanceMemory - changed default memory assignments git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2427 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	39b4c26bdc	more memory control: - catching of OutOfMemoryError in server threads - automatic adaptation of word cache size after a Short Mem Cycle git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2426 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	3e9d509c39	some small fixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2425 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	eb633c0a4f	server threads must now supply a method that can be called in case of short memory. This has been realized for the indexing thread. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2421 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	f5720cb2fa	removed most synchronization in wordIndex (for testing) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2420 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	0187c60010	because of a bug in JRE 1.4.2 there was no memory protection (see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4686462 ). This commit fixes the bug by using a memory-computation patch: all uses of Runtime.maxMemory have been replaced by serverMemory.max. The bug is no longer present in Java 1.5 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2419 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	cfb51fdef1	less synchronization in plasmaWordIndex git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2416 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	d6a928c2da	quickfix for http://www.yacy-forum.de/viewtopic.php?t=2705 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2415 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	6ad471ef96	* applied many compiler warning recommendations * cleaned up code * added unit test code * migrated ranking RCI computation to kelondroFlex and kelondroCollectionIndex git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2414 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hydrox	9da3aa74d3	silly me, fix for the fix as advised by theli git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2408 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hydrox	bb3d9a5582	*) e.getMessage().indexOf() can only be used if there is actually an ExceptionMessage. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2407 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hydrox	7a54010a9c	*) Iterators can't be cast to IndexContainer git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2406 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	cd5f7e137c	fixed problem with NURL-generation upon first startup (a new kelondroFlexTable was generated, which should not happen) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2402 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	8418af141a	added several consistency checks and small changes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2400 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	9d13aeca13	*) removing class. does not work so far git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2399 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	95a84ae469	*) adding missing classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2398 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	eee44be602	*) adding an interface for customized blacklist classes - now it's possible to use a customized blacklist engine instead of the default one - this can be done by configuring the property BlackLists.class See: http://www.yacy-forum.de/viewtopic.php?t=2108 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2397 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	6d2f15971a	there is a very strange error that causes the kelondroRecords structure to become corrupted. The cause is that the deleted-records-chain has wrong entries, and one of the pointers in that chain points to a place behind the file end. This causes an IndexOutOfBoundsException within an IO operation. I currently don't know why the deleted-records-chain gets corrupted, but the error can be caught. If this now happens with the assortment database, the database is deleted. See also: http://www.yacy-forum.de/viewtopic.php?p=24586#24586 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2396 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	d2e8e76218	*) now it's possible to configure the yacy blacklist separately for dht, search, proxy, crawler See: http://www.yacy-forum.de/viewtopic.php?t=2541 http://www.yacy-forum.de/viewtopic.php?p=24516 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2389 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	9ae9062bd3	* disabled new kelondroFlex table for NURLs * added new RAM index Class * fixed possible synchronization problem in kelondroRecords git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2388 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	689bbcf9cd	replaced kelondroTree db for NURLs by new kelondroFlexTable. The new database is only created if the old one is deleted or does not exist git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2387 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	7fbba41962	synchronization fixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2386 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	328f9859a5	more synchronization in plasmaWordIndex git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2385 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	130e6d4719	generalized index object for eurl, nurl and lurl to prepare move of these tables to new kelondroFlexTable Object git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2382 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	acdf24877f	more synchronization against outOfMemoryError in wordIndex git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2381 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	95160d7f2c	fixed size computation of index elements from the collection index git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2380 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	26116cabde	added missing rowdef assignment git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2379 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	abf22f6e60	removed url normalform computation from htmlFilterContentScraper. This method was implemented in de.anomic.net.URL git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2377 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	740d49751d	* strict type and size check in kelondroRow handling * adapted all code to use the declaration form of kelondroRow * fixed a bug in kelondroRow which caused wrong parsing of encoding type * the bug caused bad database behaviour in new indexCollection data structure. Because of this bug, all test databases are now already void. A new database is created * the kelondroFlexTable and indexCollection data structures now store a declaration of the row definition into a properties file alongside the database files. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2375 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	314021453f	* more logging * option in yacy.init to set useCollectionIndex usage git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2374 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	61b151b083	* added another auto-fix for collection index inconsistency check * fixed words size computation for collection index git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2368 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	f58283def2	better control of index flush git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2364 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	4be21a3cab	oops git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2363 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	80b6c90d54	enhancements to prevent blocking during dht transfer receive git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2362 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	9f298083cd	*) adding more urls to the error url - old error strings were replaced with their corresponding constants See: http://www.yacy-forum.de/viewtopic.php?t=2638 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2360 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hermens	d56f06401e	- Cache known URLs during indexReceive to avoid getting blocked during loadedURL.exists() whenever possible - Small logging updates git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2359 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	c09f734d06	*) offer router configuration on ConfigBasic.html - checkbox to allow router configuration is shown if - a) the UPnP forwarder is installed - b) a UPnP enabled router was found - c) no other forwarder was configured See: http://www.yacy-forum.de/viewtopic.php?p=24264 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2358 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hermens	dcbb4d0a6b	Display the size of HashBlacklistedCache on PerformanceMemory page. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2357 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	d799622da1	better flush limit for index collections git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2354 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	279b1d969d	Integrated new indexing data structure 'collections' into the main class for indexing, the plasmaWordIndex. The new data structure is ready-to-use, but currently disabled. It can be activated by setting the static plasmaWordIndex.useCollectionIndex to true. This shall be done for testing purposes. The new index is stored in DATA/INDEX/PUBLIC/TEXT. The directory PLASMA shall be used only for the crawler in the future. Attention: during testing the data structure in INDEX may change, and indexes created with the new data structure may become useless. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2348 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	4ff742e42d	implemented indexCollectionRI. This is the new database structure that is supposed to replace the plasmaAssortmentCluster AND the plasmaWordIndexFileCluster. The new structure is not yet active and needs to be integrated into plasmaWordIndex. This has some migration constraints that are not yet completely solved. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2347 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	01f95eccd3	re-write of kelondroCollectionIndex. This is the data structure that shall replace the current assortment files. * used the kelondroFlexTable to hold the index of collections * used kelondroRow definitions to declare all data structures * fixed several bugs that appeared in kelondroRowSet and kelondroRowCollection during testing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2344 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	ebc2233092	* implemented (finished) class indexRowSetContainer * replaced indexTreeMapContainer by indexRowSetContainer * deleted indexTreeMapContainer and abstract class This is another step to the new database structure git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2343 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	9183d21f25	renamed new index class to old name git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2342 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	c4e922885a	replaced indexURLEntry by new class that uses a kelondroRow.Entry object to store the index entry. This is another step to move to the new database structure. A side effect of this change is that index storage uses much less RAM space, which affects the index RAM cache. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2341 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	e357599f92	* fixed problem with indexContainer iteration from RAM: indexContainers from RAM must be cloned explicitly to prevent side-effects on stored indexContainer objects in Cache * changed behaviour of urlReference deletion from indexContainers: deletion does not use retrieval of all Elements from the assortments * added textual configuration of kelondroRow and kelondroColumn definition * update of kelondroRow usage in yacyNews * modified kelondroAttrSeq to use modified kelondroColumn parser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2339 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	8b77afd72c	some fixes to new container merger and some code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2336 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	417ed5102e	redesign of database iterators: an iteration of key elements in kelondroTree databases is no longer supported. This is now replaced by an iteration of kelondroRow.Entry objects from the database. Iteration of keys from the database was mostly followed by retrieval of the row from the database, which caused unnecessary database load. The index selection was also redesigned to use the new row iteration methods. This affects many functions; most important is the DHT selection routine, which is now much faster. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2327 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	ad692fc6c7	implemented option to extract nurls from the database (plus some iteration enhancements for nurls) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2325 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago

981 Commits (109ed0a0bb23982897b58117ed65dc579c616f11)