yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	82bf9ac1c8	- added Collage servlet from datengrab and modified it: * all images are queued * private/public is respected * inserted into switchboard * added collageQueue class that stores all the queued images git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4683 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	959f448e5f	- disabled redirects in proxy (so client sees real path) - added connection stats (only connections currently in use) - remove "old" connections (closed or idle for some time) - synchronized shared parts of proxyHandler git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4682 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	8fe39ebd74	-fixed file transmission with POST. The only usage was in ranking transmission, therefore: -fixed ranking transmission git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4681 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	202a3adb3e	refactoring of HttpClient Writer processes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4678 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	444dce7e81	more performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4676 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	e356625b22	- refactoring of stream copy handling to support time-consuming operations - made usage of BufferedStreams explicit to distinguish the different copy methods in serverFileUtils (byte-by-byte and using an own buffer) - introduced another timeout setting (java internal property) - more restrictions to clients accessing a single host (a security setting to prevent DoS by mistake) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4674 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	f01c50cf8d	Proxy logging error (first step to resolution!?) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4673 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	c3342e1178	- removed class with only one static method - removed connection method with too long time-out git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4672 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	f97971b63b	fixed NPE problems doing a shutdown from command-line git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4671 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	2c1c3bb6eb	- some refactoring (sorry Daniel, I ran riot in your code) - fixed broken downloads (flush was missing) - different problem handling when download is corrupted - different default values in yacy.init git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4669 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	d96e2badc7	- fixed POST in proxy - prepared http connection tracking - refactoring (mainly moving StreamTools to serverFileUtils) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4668 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	94d3d3a86f	fixed Proxy (for GET, POST still does not work!) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4665 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	5c3c1fdf41	replaced httpc with Apache Jakarta Commons HttpClient (includes some refactoring ;) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4640 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	7f9f639d20	- refactoring and abstraction of index reference (urls) handling: blacklisting is part of reference filtering - refactoring of word/phrase handling: word abstraction from condenser becomes part of index element handling - removed unused code parts from condenser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4603 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	4a80902081	- added ViewProfile as rdf in foaf syntax - added link to rdf and vCard version on html page - can be seen on http://localhost:8080/ViewProfile.html?hash=localhash - more generics git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4411 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
low012	ae6d07bdb8	) "Did you mean:" will only be displayed if the list of suggested URLs is not empty. ) Removed <hr /> to make the "404 Unknown Host" error page look like the other 404 error pages. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4298 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
fuchsi	21b8d1b918	small cosmetic change for static fields in serverCore (special protocol ASCII entities) to improve readability git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4275 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	af10f729df	fixed image search and favicon loading git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4225 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	a31b9097a4	preparations for mass remote crawls: two main changes must be implemented to enable mass remote crawls: - shift control of robots.txt to crawl queue (away from stacker). This is necessary since remote crawls can contain unchecked urls. Each peer must check the robots to prevent that it is misused as crawl agent for unwanted file retrieval - implement new index files that control double-check of remotely crawled urls After removal of robots.txt checking from stacker threads, the multi-threading of this process is void. Multithreading has been removed. Also the thread pools for the crawl threads had been removed, since creation of these threads is not resource-consuming, for a detailed explanation see svn 4106 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4181 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
fuchsi	f717beecb1	- Changed yFormatter handling to be more flexible and produce more readable code for server pages. There are serverObject.putNum() methods to allow adding of number type values in a formatted form, and put() methods for number types that add them without formatting. This reduces the need to transform them into Strings in server pages and removes the HTML encoding step which is unnecessary for numbers. - some minor code cleanups (mostly unnecessary casts, null checks) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4166 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	b19bb6e5b1	- reverted svn 4132; this did not solve the problem and removed the emergency method which caused production failure for sure within some hours - removed and added some debugging lines git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4133 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	01e0669264	re-designed some parts of DHT position calculation (effect is the same as before) and replaced old first hash computation by new method that tries to find a gap in the current dht. To do this, it is necessary that the network bootstrapping is done before the own hash is computed. This made further redesigns in peer initialization order necessary git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4117 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	11b4f80bde	- fixed non-closing client connections - added client connection tracker in connections servlet git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4108 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	1488769e1f	cleanup of unmaintained and outdated performance methods: removed object pools in httpc. Object pooling is not recommended, if the creation of the object is not time-intensive. Object pools are only useful, if there is much computation necessary to create some basic data that is stored in the object pool and can be re-used. This does not apply to object pools in YaCy. Object pooling of client sessions would make sense if they would allow re-use of living connections to other yacy clients. But every connection is closed after usage of an object in the client pool, therefore the YaCy server client objects are not such that hold hardware/network-allocated entities. See: http://www.javaperformancetuning.com/news/qotm033.shtml http://java.sun.com/docs/hotspot/HotSpotFAQ.html#gc_pooling http://docs.sun.com/source/816-7159-10/pt_chap5.html http://www.microjava.com/articles/techtalk/recylcle2 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4106 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	daf0f74361	joined anomic.net.URL, plasmaURL and url hash computation: search profiling showed, that a major amount of time is wasted by computing url hashes. The computation does an intranet-check, which needs a DNS lookup. This caused that each urlhash computation needed 100-200 milliseconds, which caused remote searches to delay at least 1 second more than necessary. The solution to this problem is to attach a URL hash to the URL data structure, because that means that the url hash value can be filled after retrieval of the URL from the database. The redesign of the url/urlhash management caused a major redesign of many parts of the software. Since some parts had been decided to be given up they had been removed during this change to avoid unnecessary maintenance of unused code. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	6d759ad0a7	- new bot address - removed unused skins git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4065 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	b5346141b3	made the plasmaHTCache static (there is only one internet, so we need only one cache) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4045 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	61f93cbf14	some code-cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4040 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	24e25e1141	enhanced SSI server-side support: - SSIs may now refer to servlets, not only files - calling a servlet, the servlet/SSI engine is called recursively - SSIs now work also for non-chunked-encoding supporting clients This will support the new search page functionality, to show search results dynamically without using javascript. To test this method, a test page has been added http://localhost:8080/ssitest.html ..calls dynamically 3 servlets, which produce some delays during their execution please verify that you can see the result step-by-step on your browser To implement this feature, some refactoring had been taken place, mostly code had been made static and will execute faster. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4037 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	57a5b6fa71	some generalization of remote proxy configuration and setting handling in httpc git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4023 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	9ca46a8c69	indexing of local (intranet) urls enabled To do this, one must create a separate YaCy network that has a local URL domain A description how to do this is here: http://www.yacy-websuche.de/wiki/index.php/De:Netzdefinition git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4001 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	40b0547611	- documentation changes (removed old forum links) - different handling of link quotation - different handling of link normalization - enhanced html/unicode en/de-coding git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3993 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	26f05d1fd0	avoid division by zero if search is done for no words this case is relevant if the bluewords (yacy.blue) are used git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3698 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	91c2a042a7	*) bugfix for wrong proxy traffic accounting git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3484 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	a1fb8358b2	lets make a well-formed http link so that other crawlers don't have a problem to follow this link :-) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3463 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	4edb70f68b	added yacybot info-page from Roland git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3462 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	c464157a6e	replaced some toString() see http://www.yacy-forum.de/viewtopic.php?p=31151#31151 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3345 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	47ab83a7c0	added flag for YaCyHop - proxy access for all paths that start with /yacy/ git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3304 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	a7e11ada50	*) suppressing stacktrace for "server has closed connection" git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2779 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	c8f3a7d363	added snippet-url re-indexing - snippets will generate an entry in responseHeader.db - there is now another default profile for snippet loading - pages from snippet-loading will be indexed, indexing depth = 0 - better organization of default profiles git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2733 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	f17ce28b6d	) plasmaHTCache: - method loadResourceContent defined as deprecated. Please do not use this function to avoid OutOfMemory Exceptions when loading large files - new function getResourceContentStream to get an inputstream of a cache file - new function getResourceContentLength to get the size of a cached file ) httpc.java: - Bugfix: resource content was loaded into memory even if this was not requested ) Crawler: - new option to hold loaded resource content in memory - adding option to use the worker class without the worker pool (needed by the snippet fetcher) ) plasmaSnippetCache - snippet loader does not use a crawl-worker from pool but uses a newly created instance to avoid blocking by normal crawling activity. - now operates on streams instead of byte arrays to avoid OutOfMemory Exceptions when operating on large files - snippet loader now forces the crawl-worker to keep the loaded resource in memory to avoid IO ) plasmaCondenser: adding new function getWords that can directly operate on input streams ) Parsers - keep resource in memory whenever possible (to avoid IO) - when parsing from stream the content length must be passed to the parser function now. this length value is needed by the parsers to decide if the parsed resource content is to large to hold it in memory and must be stored to file - AbstractParser.java: new function to pass the contentLength of a resource to the parsers git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2701 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	5afb0cbce8	) setting default charset (for unknown documents) to iso-8859-1 ) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2620 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	97d2a08ef1	*) restructuring needed to support parsing of documents using various charsets - serverFileUtils.java: -- adding methods to copy from stream to writer and readers to writers -- moving httpc writeX methods into serverFileUtils class - serverCharBuffer.java: removing inheritance from Writer class - replacing htmlFilterOutputStream by htmlFilterWriter class which handles content as char stream - htmlFilterContentTransformer.java: deactivating getText mode (still needs to be migrated to use char streams instead of byte streams) - changes in several classes to use htmlFilterWriter instead of htmlFilterOutputStream - changes in Scraper and Transformer classes to operate on chars instead of bytes - httpdProxyHandler.java: bugfix. clientTimeout setting was missing in config file git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2617 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	9340dbb501	fixed all possible problems with nullpointer exception for LURLs git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2513 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	a5ed86105b	*) bugfix for handling of ResourceInfo object in proxy git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2512 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	6578564c9a	*) Ignore more hop by hop http headers git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2504 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	dae763d8e3	git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2495 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	ffbf416e76	*) direct access to requestheader of htCache.Entry removed to make it more http independent git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2486 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	3870d615e3	*) setting htCache.Entry fields to private git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2485 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	393a7d10be	*) setting htCache.Entry fields to private git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2484 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago

145 Commits (70826bb5015481c450331902e545bde6eb3c523b)