yacy_search_server

Commit Graph

Author	SHA1	Message	Date
borg-0300	76d959122b	new constants, finals, Stringbuffer, cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2748 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	6396f5971e	bugfixes and migration attempt toward new kelondroFlex db - more synchronization - bugfix for remove in collections - bugfix in kelondroFlex (wrong exception condition!) - options to use RAM, FLEX and TREE tables for Crawl URL stacker - default for Crawl URL stacker is now FLEX (!) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2746 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
hermens	48f81acc0e	reverse SVN 2744, it is not needed (this resulted from a small misunderstanding of the newest cache layout) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2745 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
hermens	1da9aece12	Repair DNS prefetch during cacheScan git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2744 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	918b59dc5e	- bugfix for snippet profile (no delete button) - bugfix for search process (avoided null pointer exception in case other peer does not respond) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2742 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	2bb529cedb	added peer tags for peers in robinson mode git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2741 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	afbb547f3d	extended options for abstracts generation in remote search interface git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2739 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	22649408ad	*) Better errorhandling for charset encoding problem during content parsing See: http://www.yacy-forum.de/viewtopic.php?t=2952 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2737 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	a9c7e3f061	*) Bugfix for NoSuchElementException git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2735 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	f25f61d9d3	documentation of compile problem. See http://www.yacy-forum.de/viewtopic.php?p=26407#26407 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2734 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	c8f3a7d363	added snippet-url re-indexing - snippets will generate an entry in responseHeader.db - there is now another default profile for snippet loading - pages from snippet-loading will be indexed, indexing depth = 0 - better organization of default profiles git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2733 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
low012	2cfd4633ac	*) even better handling of searchwords in snippets, words can consist of letters and numbers now git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2732 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	b062847797	fix for http://www.yacy-forum.de/viewtopic.php?p=26439#26439 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2731 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	e17fea7015	files in htcache are now stored in different hash/tree subdirectories according to storage method git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2730 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	661f005214	fix for seed upload build script git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2729 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
low012	2d3b7251a4	*) better handling of searchwords in snippets (see http://www.yacy-forum.de/viewtopic.php?t=2891 for details) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2728 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	ddf8f220f6	fix for build fail git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2727 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	25ae3d3161	generalized definition of hexhash git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2725 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	86047f439d	removed very bad bug that prevented production of any remote search result :-((( Please update! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2724 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	f0d747c723	removed deprecated method git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2723 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	5ff77612ac	bugfix for old WORDS storage method git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2722 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	0f10bdde22	more generic cache methods git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2721 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	72482b1426	fixed scraper git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2720 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
hermens	6557112d8f	small fix for plasmaURLPool.getURL() needed for new alternative htcache layout git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2719 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
hermens	440c6ee657	Implement alternative htcache layout mostly according to: http://www.yacy-forum.de/viewtopic.php?p=26205#26205 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2718 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
allo	226f2c5b2c	first version, of the Serverlet Debugger git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2717 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	adf1f74ab2	bugfix for java 1.5 compile problem with serverCharBuffer.append(char) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2716 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	fd61209797	lines inside tags without punctuation are extended by a single dot. This enables the condenser to distinguish the lines in a better way. The result is a better preparation of snippets. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2715 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
allo	1d0c0edda3	first version of posts/get from the del.icio.us api git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2713 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	1969522dc1	removed lowercase of snippets (and other things): - added new sentence parser to condenser - sentence parsing can now handle charsets to do: charsets must be handed over to new sentence parser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2712 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	43614f1b36	bugfix in collection index. the index for collections was not created correctly The bugfix includes a migration function which starts automatically after startup of yacy. This applies only to you, if you are using the new collection index. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2711 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	1dfab1abe3	more control for seed receive git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2709 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	1c0e65f55f	*) Bugfix for problems with charset detection See: http://www.yacy-forum.de/viewtopic.php?p=26196 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2708 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	db294687ea	enhanced logging - more logging output - fix in log line preparation - added filter to log page - some small bugfixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2707 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	a9a0f51303	*) suppressing InterruptedException errormessage See: http://www.yacy-forum.de/viewtopic.php?t=2915 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2705 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	ce7ee74316	*) better errorhandling in filehandler (try catch block now starts before argument parsing) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2704 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	1d4fb680ce	*) CrawlWorker.java: only keep content in memory if size is equal or less than 5MB TODO: make this limit configurable git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2703 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	1586d57187	*) odtParser: better handling of large files git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2702 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	f17ce28b6d	*) plasmaHTCache: - method loadResourceContent defined as deprecated. Please do not use this function to avoid OutOfMemory Exceptions when loading large files - new function getResourceContentStream to get an inputstream of a cache file - new function getResourceContentLength to get the size of a cached file *) httpc.java: - Bugfix: resource content was loaded into memory even if this was not requested *) Crawler: - new option to hold loaded resource content in memory - adding option to use the worker class without the worker pool (needed by the snippet fetcher) *) plasmaSnippetCache - snippet loader does not use a crawl-worker from pool but uses a newly created instance to avoid blocking by normal crawling activity. - now operates on streams instead of byte arrays to avoid OutOfMemory Exceptions when operating on large files - snippet loader now forces the crawl-worker to keep the loaded resource in memory to avoid IO *) plasmaCondenser: adding new function getWords that can directly operate on input streams *) Parsers - keep resource in memory whenever possible (to avoid IO) - when parsing from stream the content length must be passed to the parser function now. this length value is needed by the parsers to decide if the parsed resource content is to large to hold it in memory and must be stored to file - AbstractParser.java: new function to pass the contentLength of a resource to the parsers git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2701 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	630a955674	read snippets from cache in case they are not provided in RAM git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2700 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	bcf2b800b4	applied UTF-8 encoding parameter to yacy-internal protocol communication git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2694 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	c40fca08a2	fixed bad handling of string separation you can now use a new encoding attribute to create strings from byte arrays git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2693 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	5a40ea7866	refactoring of wget string list generation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2692 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	dbc2e039bb	added time-out option parameter to call hierarchy git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2691 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	d4c239e4be	- fixed problem in collection index with deletion of single url references - added automatic deletion of not-found snippets after search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2689 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	00746ca232	identified and fixed search performance problem caused by snippet loading. Some access to header-db had been twice and even more times in some cases. Snippet resource loading fixed. Furthermore the snippet loading during remote search within the remote peer has been disabled, but can be switched on remotely by new flag 'includesnippet=true' git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2688 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	b033a80750	better control of failure in node seek of kelondroTree git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2686 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	310f1c41cd	added option to see ranking scores in surftipps and some cleanups git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2684 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	a2e3095044	*) Bugfix. Add missing plasmaParserDocument.close() calls git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2680 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	cd5f349666	*) Better handling of large files during parsing Extracted text of files that are larger than 5MB is stored in a temp file instead of keeping it in memory *) plasmaParserDocument.java; getText now returnes an inputStream instead of a byte array *) plasmaParserDocument.java: new function getTextBytes returns the parsed content as byte array Attention: the caller of this function has to ensure that enough memory is available to do this to avoid OutOfMemory Exceptions *) httpd.java: better error handling if the soaphander is not installed *) pdfParser.java: - better handling of documents with exotic charsets - better handling of large documents - better error logging of encrypted documents *) rtfParser.java: Bugfix for UTF-8 support *) tarParser.java: better handling of large documents *) zipParser.java: better handling of large documents *) plasmaCrawlEURL.java: new errorcode for encrypted documents *) plasmaParserDocument.java: the extracted text can now be passed to this object as byte array or temp file git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2679 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago


1739 Commits (76d959122bdcedfe92539337b1d69b1d98f12d83)