yacy_search_server

Commit Graph

Author	SHA1	Message	Date
theli	a7e11ada50	*) suppressing stacktrace for "server has closed connection" git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2779 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	c8f3a7d363	added snippet-url re-indexing - snippets will generate an entry in responseHeader.db - there is now another default profile for snippet loading - pages from snippet-loading will be indexed, indexing depth = 0 - better organization of default profiles git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2733 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	f17ce28b6d	*) plasmaHTCache: - method loadResourceContent defined as deprecated. Please do not use this function to avoid OutOfMemory Exceptions when loading large files - new function getResourceContentStream to get an inputstream of a cache file - new function getResourceContentLength to get the size of a cached file *) httpc.java: - Bugfix: resource content was loaded into memory even if this was not requested *) Crawler: - new option to hold loaded resource content in memory - adding option to use the worker class without the worker pool (needed by the snippet fetcher) *) plasmaSnippetCache - snippet loader does not use a crawl-worker from pool but uses a newly created instance to avoid blocking by normal crawling activity. - now operates on streams instead of byte arrays to avoid OutOfMemory Exceptions when operating on large files - snippet loader now forces the crawl-worker to keep the loaded resource in memory to avoid IO *) plasmaCondenser: adding new function getWords that can directly operate on input streams *) Parsers - keep resource in memory whenever possible (to avoid IO) - when parsing from stream the content length must be passed to the parser function now. this length value is needed by the parsers to decide if the parsed resource content is too large to hold it in memory and must be stored to file - AbstractParser.java: new function to pass the contentLength of a resource to the parsers git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2701 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	5afb0cbce8	*) setting default charset (for unknown documents) to iso-8859-1 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2620 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	97d2a08ef1	*) restructuring needed to support parsing of documents using various charsets - serverFileUtils.java: -- adding methods to copy from stream to writer and readers to writers -- moving httpc writeX methods into serverFileUtils class - serverCharBuffer.java: removing inheritance from Writer class - replacing htmlFilterOutputStream by htmlFilterWriter class which handles content as char stream - htmlFilterContentTransformer.java: deactivating getText mode (still needs to be migrated to use char streams instead of byte streams) - changes in several classes to use htmlFilterWriter instead of htmlFilterOutputStream - changes in Scraper and Transformer classes to operate on chars instead of bytes - httpdProxyHandler.java: bugfix. clientTimeout setting was missing in config file git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2617 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	9340dbb501	fixed all possible problems with nullpointer exception for LURLs git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2513 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	a5ed86105b	*) bugfix for handling of ResourceInfo object in proxy git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2512 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	6578564c9a	*) Ignore more hop by hop http headers git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2504 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	dae763d8e3	git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2495 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	ffbf416e76	*) direct access to requestheader of htCache.Entry removed to make it more http independent git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2486 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	3870d615e3	*) setting htCache.Entry fields to private git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2485 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	393a7d10be	*) setting htCache.Entry fields to private git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2484 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	eee44be602	*) adding an interface for customized blacklist classes - now it's possible to use a customized blacklist engine instead of the default one - this can be done by configuring the property BlackLists.class See: http://www.yacy-forum.de/viewtopic.php?t=2108 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2397 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	d2e8e76218	*) now it's possible to configure the yacy blacklist separately for dht, search, proxy, crawler See: http://www.yacy-forum.de/viewtopic.php?t=2541 http://www.yacy-forum.de/viewtopic.php?p=24516 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2389 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	3879a0ecd0	replaced java.net.URL usage by use of new class de.anomic.net.URL This shall be seen as an experiment to exclude all cases where there could be a DNS lookup during URL comparison. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	b594ee9a5a	*) Adding possibility to configure if the http proxy should send the X-forwarded-for header (requested by TeeSee) See: http://www.yacy-forum.de/viewtopic.php?t=2577 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2257 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
rramthun	5625937d1c	Language improvements One very minor HTML fix git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2181 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	90d569d70f	refactoring of index management: url storage is part of index management; moved plasmaURL to indexURL git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2122 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	015d044c25	tried to fix some problems with latest changes to httpc very experimental! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2078 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	55c5b41bd0	modified kelondroDyn to work better with new object caches (removed own single object cache) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2077 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	76ea16a6cb	*) Removing Keep-Alive header (is also a hopByHop header) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2034 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	77f3237de3	adapted for isListed() git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1942 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	759800f543	*) Bugfix for storeHTCache problem - content was not indexed if storeHTCache was off See: http://www.yacy-forum.de/viewtopic.php?p=18269 See: http://www.yacy-forum.de/viewtopic.php?t=1882 See: http://www.yacy-forum.de/viewtopic.php?t=241 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1800 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	ce5274c194	yacybot user agent git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1786 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	34341a868e	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1701 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
rramthun	15ed57f9b7	Updated German language, by VT100, NN, rramthun git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1690 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	8fcb25f9f9	*) Setting via header according to rfc - can be disabled via settings dialog git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1662 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	eeba8b055e	*) guessing, testing and suggesting alternative hostnames on "unknown host" error See: http://www.yacy-forum.de/viewtopic.php?t=1879 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1636 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	4e4bd4662d	redirectors fix git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1288 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	37f88b4017	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1176 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	44fa94ac52	*) Modifications for dbImport functionality - dbImporter threads are now shutdown by the switchboard on server shutdown - adding possibility to pause an importer thread via GUI - Bugfix for abort function See: http://www.yacy-forum.de/viewtopic.php?p=13363#13363 *) Modification of content parser configuration - now it's possible to configure which parsers should be enabled for the proxy, crawler, icap, etc. separately *) htmlFilterContentScraper.java - adding regular expression to normalize URLs containing /../ and /./ parts *) httpc.java - adding functionality to unzip gzipped content - requested by roland: should be used later to allow gzipped seed lists git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1170 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	1d6a6d1f85	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1159 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	7e670894d9	*) Suppressing stackTraces in proxyError message for "connect timed out" errors See: http://www.yacy-forum.de/viewtopic.php?t=1504 *) Increasing default http client timeout git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1129 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	d8afe60e07	Bugfix for last Bugfix ;-). host/port were set to originaladdress instead of the correct values for the new Url. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1126 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	1b656f6b31	correction of bug from svn 1123 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1125 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	24d15eb0e8	moving the redirector code git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1123 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	787c368696	synchronized redirectors and using the port. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1122 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	4776f3f815	squid like redirectors git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1120 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	0e25020f51	added first generation and usage of YBR index-files. Enhanced overall ranking of search results. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1118 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	0ec54d9c5f	enhanced CR-file handling and added first RCI-evaluation tests git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1110 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	99fb26e499	*) Suppressing stackTraces in proxyError message for harmless errors See: http://www.yacy-forum.de/viewtopic.php?t=1504 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1108 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hydrox	295aff52a3	*) added offline-browsing-support (onlineMode=0) *) online-mode now can be changed in Status.html git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1010 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	5a25ad9109	*) Bugfix for useRemoteProxy4YACY and useRemoteProxy4SSL check git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@969 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	02d9af1a70	*) Restructuring and extending of Remote Proxy Support - remote proxy configuration can now be "really" changed on the fly and takes effect immediately - adding possibility to disable remote proxy usage for yacy->yacy communication - adding possibility to disable remote proxy usage for ssl - restructuring proxy configuration so that it is stored in a single place now *) Adding possibility to import a foreign word DB (or even more of them in parallel) at runtime into the peers DB - this can be done by calling IndexImport_p.html - ATTENTION: please note that at the moment this thread must be aborted via gui before a normal server shutdown is done. - TODO: integrating IndexImport Thread into normal server shutdown - TODO: Adding possibility to import crawl-queues, etc. from foreign peers - TODO: removing old import function from yacy.java and calling the new routines instead git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@968 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	54a97a7355	*) IfesL: Suppressing "Broken pipe" stacktrace in log file git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@903 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	a2fa75e688	*) Asynchronous queuing of crawl job URLs (stackCrawl) various checks like the blacklist check or the robots.txt disallow check are now done by a separate thread to unburden the indexer thread(s) TODO: maybe we have to introduce a threadpool here if it turns out that this single thread is a bottleneck because of the time consuming robots.txt downloads *) improved index transfer The index selection and transmission is done in parallel now to improve index transfer performance. TODO: maybe we could speed up performance by using multiple transmission threads in parallel instead of only a single one. *) gzip encoded post requests it is now configurable if a gzip encoded post request should be sent on index transfer/distribution *) storage Peer (very experimental and not optimized yet) Now it's possible to send the result of the yacy indexer thread to a remote peer instead of storing the indexed words locally. This could be done by setting the property "storagePeerHash" in the yacy config file - Please note that if the index transfer fails, the index is stored locally. - TODO: currently this index transfer is done by the indexer thread. To speed up the indexer a) this transmission should be done in parallel and b) multiple chunks should be bundled and transferred together *) general performance improvements - better memory cleanup after http request processing has finished - replacing some string concatenations with stringBuffers - replacing BufferedInputStreams with serverByteBuffer - replacing vectors with arraylists wherever possible - replacing hashtables with hashmaps wherever possible This was done because function calls to vector or hashtable functions take 3 times longer than calls to functions of arraylists or hashmaps. TODO: we should take a look at the class serverObject which is inherited from hashmap Do we really need a synchronization for this class? TODO: replace arraylists with linkedLists if random access to the list elements is not needed *) Robots Parser supports if-modified-since downloads now If the downloaded robots.txt file is older than 7 days the robots parser tries to download the robots.txt with the if-modified-since header to avoid unnecessary downloads if the file was not changed. Additionally the ETag header is used to detect changes. *) Crawler: better handling of unsupported mimeTypes + FileExtension *) Bugfix: plasmaWordIndexEntity was not closed correctly in - query.java - plasmaswitchboard.java *) function minimizeUrlDB added to yacy.java this function tests the current urlHashDB for unused urls ATTENTION: please don't use this function at the moment because it causes the wordIndexDB to flush all words into the word directory! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@853 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	28c5687ff9	*) Bugfix for "download of non supported file content" via crawler See: http://www.yacy-forum.de/viewtopic.php?p=10724#10724 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@835 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	9e1485c13b	new Class for UserAccounts git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@813 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	b990dc1ad1	*) Replacing jsch 0.1.19 lib with newer version 0.1.21 *) Replacing PDFBox 0.7.1 lib with newer version 0.7.2 *) Refactoring of classes httpd/httpc/httpHeaders to make many methods for httpHeader/Requestline parsing reusable for new icap implementation *) adding chunked input stream support - needed by new icap implementation - needed by future httpc HTTP/1.1 support *) httpd.java - moving all connection property constants to class httpHeader - moving readHeader function to class httpHeader - moving parseQuery function to class httpHeader - moving handleTransparentProxy function to class httpHeader *) httpHeader.java - adding new function to parse the http response line - adding new function to convert http headers to a string that can be sent to the client - adding a function that generates a proper url using all parsed connection properties *) ICAP Support - yacy now supports handling of icap response modification requests - this feature can be used by other icap enabled proxies to contact yacy as icap server, and to hand over the downloaded content to yacy.logging for indexing - functionality was successfully tested with squid 2.5Stable 10 + icap patch - further icap services e.g. URL filtering based on yacy's blacklists are possible *) plasmaSwitchboard.java - htcache entries that are still needed for indexing are now properly registered as in use after system restart - extended logging: log message now shows parsing and indexing time for each sb. entry git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@757 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	7809b382bf	*) Bugfix for Blacklist support for https (only initial connect) See: http://www.yacy-forum.de/viewtopic.php?p=9419 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@684 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago

107 Commits (cf49d5b0a7550fc37c52ddf05f61d443cec4a94b)