yacy_search_server

Commit Graph

Author	SHA1	Message	Date
theli	a2e3095044	*) Bugfix. Add missing plasmaParserDocument.close() calls git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2680 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	cd5f349666	) Better handling of large files during parsing Extracted text of files that are larger than 5MB is stored in a temp file instead of keeping it in memory ) plasmaParserDocument.java; getText now returnes an inputStream instead of a byte array ) plasmaParserDocument.java: new function getTextBytes returns the parsed content as byte array Attention: the caller of this function has to ensure that enough memory is available to do this to avoid OutOfMemory Exceptions ) httpd.java: better error handling if the soaphander is not installed ) pdfParser.java: - better handling of documents with exotic charsets - better handling of large documents - better error logging of encrypted documents ) rtfParser.java: Bugfix for UTF-8 support ) tarParser.java: better handling of large documents ) zipParser.java: better handling of large documents ) plasmaCrawlEURL.java: new errorcode for encrypted documents ) plasmaParserDocument.java: the extracted text can now be passed to this object as byte array or temp file git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2679 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	df1629b05a	- code cleanup - version 0.471 - moved surftipps to own web page git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2676 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	b6c7b91582	) Parser now throws an ParserException instead of returning null on parsing errors (e.g. needed by snippet fetcher) ) better logging of parser failures ) simplified usage of plasmaparser through switchboard ) restructuring of crawler - crawler now returns an error message if it is used in sync mode (e.g. by snippet fetcher) ) snippet-fetcher: more verbose error messages ) serverByteBuffer.java: adding new function append(String,encoding) *) serverFileUtils.java: adding functions to copy only a given number of bytes between streams git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2641 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	9340dbb501	fixed all possible problems with nullpointer exception for LURLs git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2513 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	4866868c0e	added write cache for LURLs This was necessary to speed up the index receive process during global search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2498 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	dae763d8e3	git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2495 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	2126c51906	*) bugfix for ViewFile.java. Wrong http header were used git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2488 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	3870d615e3	*) setting htCache.Entry fields to private git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2485 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	3879a0ecd0	replaced java.net.URL usage by use of new class de.anomic.net.URL This shall be seen as an experiment to exclude all cases where there could be a DNS lookup during URL comparisment. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	015d044c25	tried to fix some problems with latest changes to httpc very experimental! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2078 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
low012	97c6a70b71	*) Fixed XSS vulnerability. I was able to crawl a PDF that caused the loading of an image in the admin's browser. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2056 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	dc9174c809	*) Implementing snippet fetching via ajax Snippets that are not available on page load time will be fetched using ajax requests. see: http://www.yacy-forum.de/viewtopic.php?p=16479 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1748 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	f4ffa9aee5	- implemented more attributes to index entries - implemented hand-over of new word index attributes during remote search - implemented word-distance computation during search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1382 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	bb79fb5d91	- changed handling of error cases retrieving urls from database (no more NULL values are returned, instead, an IOException is thrown) - removed ugly damagedURLS implementation from plasmaCrawlLURL.java (this inserted a static value into the Object which is not really a good style) - re-coded damagedURLS collection in yacy.java by catching an exception and evaluating the exception message to do: - the urldbcleanup feature must be re-tested git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1200 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	a04930f025	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1158 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	47a2a8885d	*) Display MimeType on URL Info page git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1077 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	dd24f0252f	*) Searchword highlighting for info page git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1036 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
theli	40777556c5	) Connection Tracking - adding automatic refresh - accepts new parameter nameLookup which can be used to deactivate yacy-peer name lookup (because we have problems with this on large seed-dbs) ) ViewFile New page that can be used to view - original content - plain text content - parsed content - parsed sentences of a webpage specified by there url hash Mainly for debugging purpose at the moment ) Robots.txt Bugfix for if-modified-since usage TODO: synchronization of downloads to avoid loading the same robots-file multiple times in parallel by different threads ) Shutdown Better abortion of transferRWI and transferURL sessions on server shutdown *) Status Page Adding icon to start/stop crawling via status page git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@950 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago

1 2 3 4

169 Commits (f13c0b2abd2be59fb48cc09e61440f966f924363)