yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	c8f3a7d363	added snippet-url re-indexing - snippets will generate an entry in responseHeader.db - there is now another default profile for snippet loading - pages from snippet-loading will be indexed, indexing depth = 0 - better organization of default profiles git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2733 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	f17ce28b6d	*) plasmaHTCache: - method loadResourceContent defined as deprecated. Please do not use this function, to avoid OutOfMemory Exceptions when loading large files - new function getResourceContentStream to get an inputstream of a cache file - new function getResourceContentLength to get the size of a cached file *) httpc.java: - Bugfix: resource content was loaded into memory even if this was not requested *) Crawler: - new option to hold loaded resource content in memory - adding option to use the worker class without the worker pool (needed by the snippet fetcher) *) plasmaSnippetCache - snippet loader does not use a crawl-worker from the pool but uses a newly created instance to avoid blocking by normal crawling activity. - now operates on streams instead of byte arrays to avoid OutOfMemory Exceptions when operating on large files - snippet loader now forces the crawl-worker to keep the loaded resource in memory to avoid IO *) plasmaCondenser: adding new function getWords that can directly operate on input streams *) Parsers - keep resource in memory whenever possible (to avoid IO) - when parsing from a stream, the content length must now be passed to the parser function. The parsers need this length value to decide whether the parsed resource content is too large to hold in memory and must be stored to a file - AbstractParser.java: new function to pass the contentLength of a resource to the parsers git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2701 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	b6c7b91582	*) Parser now throws a ParserException instead of returning null on parsing errors (e.g. needed by snippet fetcher) *) better logging of parser failures *) simplified usage of plasmaparser through switchboard *) restructuring of crawler - crawler now returns an error message if it is used in sync mode (e.g. by snippet fetcher) *) snippet-fetcher: more verbose error messages *) serverByteBuffer.java: adding new function append(String,encoding) *) serverFileUtils.java: adding functions to copy only a given number of bytes between streams git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2641 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	09b106eb04	*) next step of restructuring for new crawlers - adding interface class (plasma/crawler/plasmaCrawlWorker.java) for protocol-specific crawl-worker threads - moving reusable code into abstract crawl-worker class AbstractCrawlWorker.java - the load method of the worker threads should not be called directly anymore (e.g. by the snippet fetcher); to crawl a page and wait for the result, use function plasmaCrawlLoader.loadSync([...]) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2474 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	3879a0ecd0	replaced java.net.URL usage by use of new class de.anomic.net.URL This shall be seen as an experiment to exclude all cases where there could be a DNS lookup during URL comparison. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	419f8fb398	fixed bugs/missing code regarding new crawl stack git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@384 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
theli	c9c0a1f11c	*) Trying to speed up local crawling - introduction of a threadpool for crawling - introduction of a job queue to avoid busy waiting for a free crawler slot *) New classes added - queue for receiving of crawler jobs - semaphore class to do reader/writer synchronization (mutual exclusion) - message object to hold all needed data about a crawler job *) Trying to solve session-thread shutdown problem - session thread stopped variable is now set from outside before interrupting the session thread. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@39 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago

7 Commits (74f09a05102b19bf91a81c2376a2890423f4af52)