yacy_search_server

Commit Graph

Author	SHA1	Message	Date
theli	cd5f349666	*) Better handling of large files during parsing Extracted text of files that are larger than 5MB is stored in a temp file instead of keeping it in memory *) plasmaParserDocument.java; getText now returnes an inputStream instead of a byte array *) plasmaParserDocument.java: new function getTextBytes returns the parsed content as byte array Attention: the caller of this function has to ensure that enough memory is available to do this to avoid OutOfMemory Exceptions *) httpd.java: better error handling if the soaphander is not installed *) pdfParser.java: - better handling of documents with exotic charsets - better handling of large documents - better error logging of encrypted documents *) rtfParser.java: Bugfix for UTF-8 support *) tarParser.java: better handling of large documents *) zipParser.java: better handling of large documents *) plasmaCrawlEURL.java: new errorcode for encrypted documents *) plasmaParserDocument.java: the extracted text can now be passed to this object as byte array or temp file git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2679 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
low012	f8ac694e51	*) fixed a bug where searchword in snippets were not displayed bold in front of a punctuation mark (see http://www.yacy-forum.de/viewtopic.php?p=25998 ) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2677 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	df1629b05a	- code cleanup - version 0.471 - moved surftipps to own web page git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2676 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	b73efd5565	*) missing changes needed because of last commit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2673 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	2463e5624a	'quick' release 0.47 - documentation update - necessary bugfixes (missing css for new peers) - reduced effect of search result redundancy filter - removed some debug output, but not all git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2665 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	625c2ce6b1	*) bugfix for snippet fetching problem if content but not http header is available in cache See: http://www.yacy-forum.de/viewtopic.php?p=25748 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2651 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	813a8a8179	*) migration of mimeTypeParser to jmimemagic 0.1 - better mimetype detection for rss feeds - better mimetype detection for odt documents (less memory consuming) - two new detector classes implementing MagicDetector interface of jmimemagic git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2650 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
hermens	3f5a4153a0	Make Peers more receptible to transferred indexes - Set MaxWordCount for dhtInCache to indexDistribution.dhtReceiptLimit so that the inCache gets flushed when the limit is passed - Modify flushCacheSome to flush enough words to get below MaxWordCount immediately git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2649 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	b6c7b91582	*) Parser now throws an ParserException instead of returning null on parsing errors (e.g. needed by snippet fetcher) *) better logging of parser failures *) simplified usage of plasmaparser through switchboard *) restructuring of crawler - crawler now returns an error message if it is used in sync mode (e.g. by snippet fetcher) *) snippet-fetcher: more verbose error messages *) serverByteBuffer.java: adding new function append(String,encoding) *) serverFileUtils.java: adding functions to copy only a given number of bytes between streams git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2641 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	1dc12d6659	*) Bugfix for shutdown problem caused by cacheScan thread See: http://www.yacy-forum.de/viewtopic.php?p=25729 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2636 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
borg-0300	42173462f5	rename cutUrlText to shortenURLString; other little things; git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2635 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	26dfbb7499	*) Bugfix for UTF-8: url names are now stored properly in stackcrawl, crawler, indexing queue and should be displayed correct on the gui git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2630 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	cf6acff2c2	*) Bugfix. htmlFilterInputStream document analysis did not work properly for documents smaller than the default InputStream Buffer size. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2629 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	5c6251bced	*) some improvements for extended html document charset support - new class htmlFilterInputStream.java which allows to pre-analyze the html header to extract the charset meta data. This is only enabled for the crawler at the moment. Integration into proxy needs more testing. - adding eventlisterner interfaces to the htmlscraper to allow other classes to get informed about detected tags (used by the htmlFilterInputStream.java) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2624 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	f453c14b5d	removed unreacheable catch blocks and unused imports git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2619 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	ad7f600f25	*) Bugfix. re-enabling inheritance of serverCharBuffer from writer class git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2618 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	97d2a08ef1	*) restructuring needed to support parsing of documents using various charsets - serverFileUtils.java: -- adding methods to copy from stream to writer and readers to writers -- moving httpc writeX methods into serverFileUtils class - serverCharBuffer.java: removing inheritance from Writer class - replacing htmlFilterOutputStream by htmlFilterWriter class which handles content as char stream - htmlFilterContentTransformer.java: deactivating getText mode (still needs to be migrated to use char streams instead of byte streams) - changes in several classes to use htmlFilterWriter instead of htmlFilterOutputStream - changes in Scraper and Transformer classes to operate on chars instead of bytes - httpdProxyHandler.java: bugfix. clientTimeout setting was missing in config file git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2617 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	3aac5b26da	- added automatic tag generation when a web page from the search results is added - added new image 'B' in front of search results for bookmark generation - added news generation when a public bookmark is added - the '+' in front of search results has new meaning: positive rating for that result - added news generation when a '+' is hit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2613 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	f644a1c3a7	better evaluation of index abstracts git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2604 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
allo	2fd610b556	http://www.yacy-forum.de/viewtopic.php?p=25611#25611 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2601 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	06fa891152	*) htmlFilterContentScraper.java: using proper charset for document title git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2595 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	74c3e7cf29	*) storing document charset into plasmaParserDocument object (is needed later by the condenser) *) htmlFilterContentScraper.java: using proper charset for document title *) serverByteBuffer.java: adding new toString which allows to specify the charset for byte encoding git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2593 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	c5d3020941	*) better errorhandling for last commit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2592 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	d0a5a53789	*) changes needed for multi-language support - parsers may need to know the charset of the byte stream git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2591 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	26ab1fa885	fixed null pointer exception See http://www.yacy-forum.de/viewtopic.php?p=25598#25598 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2588 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	b0e8ff6eda	*) some TODO makers for UTF-8 problem git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2586 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	41e27b85b7	fix for crawler condition git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2583 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	9ecf7f0da2	*) some TODO makers for UTF-8 problem git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2578 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	c89d8142bb	replaced old 'kCache' by a full-controlled cache there are now two full-controlled caches for incoming indexes: - dhtIn - dhtOut during indexing, all indexes that shall not be transported to remote peers because they belong to the own peer are stored to dhtIn. It is furthermore ensured that received indexes are not again transmitted to other peers directly. They may, however be transmitted later if the network grows. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2574 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	6e2907135a	bugfixes for remote search server part git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2573 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	cf9884e22b	first attempt to implement a secondary search this is a set of search processes that shall enrich search results with specialized requests to realize a combination of search results from different peers. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2571 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	b251076e64	avoid ConcurrentModificationException git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2563 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	75b198bc02	- updated references to indexContainer - more bugfixes and debugging for indexAbstract processing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2555 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	b7e7808ea6	wordmigration now works also for new index database if the new database is switched on, no 'too big' messages appear, all the WORDS files can be completely migrated git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2553 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	a0ddf2ec11	*) AbstractCrawlWorker.java: delete already downloaded data on crawling error *) plasmaSwitchboard.java: log unexpected errors while parsing/indexing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2552 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	4f9e42d5ed	more changes towards better join-search - fixed problems with index-abstract generation - added analysis output for index abstract receive git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2551 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	a7281a9b4d	fix for last commit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2545 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	82a6054275	- fixed bug with new indexAbstract generation - added partly evaluation of indexAbstracts during remote searches git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2544 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	fded1f4a5d	*) better handling of maximum file size limit in crawler git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2543 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	74d1dea30b	changes towards better join-search - added generation of a compressed index within remote peers during global search - added selection of specific urls within remote peers during secondary global search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2539 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	ae4e8ce03e	- cut for 'probably last html-interface version': version number update - small enhancement to ranking git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2536 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	64bed59ee8	enhancements to ranking git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2535 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	63893003be	*) Adding settings page for the crawler which allows to specify a file size limit and the timeout to use. *) adding first version of maximum filesize check for the crawler git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2534 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	94d7ced900	fix for last ranking commit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2529 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	03835c2ee8	enhanced search result computation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2527 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	ac3419b65f	better debugging for indexOutOfBoundException bug git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2525 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	a8bc768206	enhancements to ranking evaluation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2523 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	33898ae7e9	*) ResourceInfoFactory.java: Bugfix for classNotFoundException See: http://www.yacy-forum.de/viewtopic.php?t=2797 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2521 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	406e170e25	*) more verbose error message git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2519 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	b298474e22	*) Bugfix needed because of changed plasmaCrawlLURL.load behavior git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2518 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago

878 Commits (cd5f349666ba2d9cc3b3d9cd5488b72333dabb76)