yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	86394e7a56	fix for cache-delete problem: - better synchronization - files are only deleted if they have been in the cache for 5 minutes - hash-path for the HTCACHE is now default git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3018 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	ceb9e3aa17	- enhanced parser: collection of audio, video, image and application links - enhanced condenser: better handling of utf-8 and pre-formatted texts git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3017 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	0b9370a9dc	fix for http://www.yacy-forum.de/viewtopic.php?p=28108#28108 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3013 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	b5a29e9651	- fix for snippets that are too short - added keyword to snippet fetch to suppress removal of not-found snippet words (for debugging) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3009 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	f1528672b1	filtering of non-index pages during index-of search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3004 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	8e7215475b	- extended ViewFile to use it as a debugging tool: you can now use the post-parameter url to submit an url directly - fixed some bugs in text parser (not all parts had been analysed) - fixed a bug in remote search interface (could not handle constraints) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3001 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	30888e7a2f	implementation of search constraints Such constraints may formulate specific restrictions to web searches This is implemented by scraping information for constraints from a web page during parsing, and storing flags to the pages within the web index. In this first step, only information for index pages ("index of", directory listings) are scraped and stored in flags - added new flag class kelondroBitfield - added scraper method in condenser - added bitfield structure for all scrape types (see also condenser) - added bitfield structure for appearance locations (see RWIEntry) - added handover protocol for remote search and index distribution - extended kelondroColumn class to hold bitfield types - added another search attribute on search page (index.html) - extended search-filter to enable filtering of non-matching constraints - set all new database types to be default - refactoring: moved word hash generation to condenser class git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2999 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	49a83f99d9	- fix for wrong DHT ordering in DHT selection - fix for http://www.yacy-forum.de/viewtopic.php?t=3112&highlight= git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2995 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	f4b547dc13	limited index transfer to peer with version 0.486 this protects peers with version below 0.486 from new RWI objects (which they cannot handle) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2988 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	10a4ab5195	disabled some (more) write caches git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2987 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	09bcc10344	bugfix for some problems of last change with assortments git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2986 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	e3d75f42bd	final version of collection entry type definition - the test phase of the new collection data structure is finished - test data that had been generated is void. There will be no migration - the new collection files are located in DATA/INDEX/PUBLIC/TEXT/RICOLLECTION - the index dump is void. There will be no migration - the new index dump is in DATA/INDEX/PUBLIC/TEXT/RICACHE git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2983 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	c9364246cc	introduced new RWI-Object. This will be used for the final version of the collections. The new object is not yet used. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2966 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	e628d34e16	patches for bad data git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2951 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	497428c8ec	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2949 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	76fceb9997	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2945 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	eeda881553	bugfix for last commit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2938 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	bb7d4b5d5e	refactoring to prepare new RWI entry object - moved all url and index(RWI) entries to index package - better naming to distinguish RWI entries and URL entries git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2937 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	bdc9216366	- more asserts - some bugfixes - some patches for bugs that are already in the database git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2935 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	1751a799ac	- deactivated all write buffers - fixed a storage bug git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2933 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	ba967c4875	- bugfixes and debug code - new generalized index class indexCachedRI git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2930 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	ee4715a21c	- more asserts - bugfix for performaceMemory - refactoring of index ram cache: renamed indexRAMCacheRI to indexRAMRI, to make space for a cached indexRI, which should be named indexRAMCacheRI git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2925 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	114a76a86e	- added flag to urlhash that shows that domain is a local domain - enhanced local domain detection - bugfixing for memory assignment in kelondroFlexSplit - automatic memory assignment to caches according to available RAM - bugfixes for details during search process git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2924 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	b2d51be33c	bugfix for latest changes to entry generalization git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2922 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hermens	8385557672	Small fix for the Cache Monitor when using proxyCacheLayout=hash see: http://www.yacy-forum.de/viewtopic.php?p=27394#27394 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2916 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	f1ed55a5fc	bugfix for last commit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2913 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	8fdefd5c68	generalization of payload definition of index storage this is one step forward to the migration to a new collection data format git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2912 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	ad248d61ca	*) more verbose exception git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2901 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hydrox	7e8669b15c	*) added possibility to "recycle" a DHTChunk that failed to transfer. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2898 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
low012	4feaa91890	*) Added additional MIME-Type. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2895 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
low012	89af433879	*) Deleted parts of WebCat that were not needed for parsing SWFs. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2893 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	46a712e195	- more asserts - simplified indexURLEntry git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2891 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
low012	8c9bc7e341	*) extracting urls works now git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2890 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
low012	493391e42d	*) new flash parser, still experimental git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2888 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	215c4e65f1	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2887 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	bd4f43cd66	- fixed a null pointer exception bug - switched off more write caches - re-enabled index-abstracts search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2885 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
auron_x	194d42b6a7	*) changed PPM-calculation to be more accurate git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2884 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	fe8afaf426	switched off usage of write cache for important databases git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2883 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	d3431433b0	more anonymization in logging git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2876 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	e6044e5198	bugfix for http://www.yacy-forum.de/viewtopic.php?p=27207#27207 and http://www.yacy-forum.de/viewtopic.php?p=27219#27219 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2875 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	78b7f6f7fd	bugfix for index remove bug, appeared after search where snippet-loading triggered word removal git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2869 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	147d88cf23	re-design of database caching this should reduce IO a lot, because write caches are now activated for all databases - added new caching class that combines a read- and write-cache. - removed old read and write cache classes - removed superfluous RAM index (can be replaced by kelondroRowSet) - adapted all current classes that used the old caching methods - more asserts, more bugfixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2865 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	4e363108e1	- removed bad debug code that caused a large and unnecessary delay during global search - fixed problem that global search results disappear after a search - removed some stopwords git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2861 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	2a9d868f6d	- removed object cache from kelondroTree - generalized object caching and added new object caching class - added object caching wherever kelondroTree was used - added object caching also to usage of kelondroFlex - added object buffering (a write cache) to NURLs - added many assert statements; fixed bugs here and there - added missing close methods to latest added classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2858 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	3ffc5b8793	fixed problem with serverCharBuffer.append(char) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2821 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	06854988da	- full integration of new LURL database in INDEX - added migration method for urlHash.db into INDEX git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2819 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
octoate	e4a3574b77	StringBuffer now resets every time the parser is called git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2817 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
karlchenofhell	ce237aefad	- assortment-sizes table from PerformanceQueues_p.html is not shown if not used - escape query- and fragment-part of an url as well - new resolveBackpath for urls: http://www.yacy-forum.de/viewtopic.php?t=2679#24867 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2815 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	a5b9b514c1	*) retry crawling without content-encoding if the content-encoding header was not correct See: http://www.yacy-forum.de/viewtopic.php?p=26917#26917 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2811 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	92f774edd1	*) Better charset encoding detection *) New testclass for charset encoding detection tests git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2808 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	b79e06615d	- added new LURL.Entry class for next database migration - refactoring of affected classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2802 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
octoate	cc24dde5e0	First version of a MS Excel parser based on Apache POI (event based parsing) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2801 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
karlchenofhell	4c63129136	- stupid mistake... git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2798 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
karlchenofhell	ebf0da2a45	- now the fix http://www.yacy-forum.de/viewtopic.php?t=2974 works git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2796 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	3d152bfe43	*) Logging message added git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2794 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
karlchenofhell	b5e40e2fa2	- fix for http://www.yacy-forum.de/viewtopic.php?t=2974 (no cache-sizes for new db) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2792 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	77a59a115d	refactoring of indexing methods git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2787 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	cbb1e710b9	*) removing old class - was replaced by plasma/urlPattern/defaultURLPattern git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2765 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	c6d46f7ebd	null pointer bugfix git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2761 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	decb09df6d	*) Trying to be more tolerant against wrong charset names git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2760 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	e9afe39cbb	*) Trying to be more tolerant against wrong charset names See: http://www.yacy-forum.de/viewtopic.php?p=26662 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2759 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	7526c831a8	*) Suppressing stacktrace git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2758 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	50f2578c55	- some bugfixing and code cleanup - now assortments can be completely left out if they do not exist before startup and collection index is selected. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2757 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	bdf4c7c51e	added missing files for last commit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2756 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	a5dd0d41af	- refactoring of plasmaCrawlLURL.Entry to prepare new Entry format - added test migration method to migrate the old LURL to a new LURL the new LURL will be split into different tables for each month this solves several problems: - the biggest table in YaCy is split into different parts and can also be managed in filesystems that are limited to 2GB - the oldest entries can easily be identified, used for re-crawl and deleted - The complete database can be limited to a specific size (as wanted many times) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2755 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
octoate	1c4076da8a	First version of the MS Powerpoint parser based on Apache POI git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2753 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	5b75d64d7d	*) bugfix for last commit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2750 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	71ed104bc7	*) adding additional rpm mimetype (used by packman) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2749 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	6396f5971e	bugfixes and migration attempt toward new kelondroFlex db - more synchronization - bugfix for remove in collections - bugfix in kelondroFlex (wrong exception condition!) - options to use RAM, FLEX and TREE tables for Crawl URL stacker - default for Crawl URL stacker is now FLEX (!) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2746 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hermens	48f81acc0e	reverse SVN 2744, it is not needed (this resulted from a small misunderstanding of the newest cache layout) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2745 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hermens	1da9aece12	Repair DNS prefetch during cacheScan git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2744 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	22649408ad	*) Better errorhandling for charset encoding problem during content parsing See: http://www.yacy-forum.de/viewtopic.php?t=2952 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2737 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	a9c7e3f061	*) Bugfix for NoSuchElementException git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2735 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	c8f3a7d363	added snippet-url re-indexing - snippets will generate an entry in responseHeader.db - there is now another default profile for snippet loading - pages from snippet-loading will be indexed, indexing depth = 0 - better organization of default profiles git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2733 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
low012	2cfd4633ac	*) even better handling of searchwords in snippets, words can consist of letters and numbers now git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2732 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	e17fea7015	files in htcache are now stored in different hash/tree subdirectories according to storage method git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2730 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
low012	2d3b7251a4	*) better handling of searchwords in snippets (see http://www.yacy-forum.de/viewtopic.php?t=2891 for details) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2728 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	25ae3d3161	generalized definition of hexhash git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2725 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	f0d747c723	removed deprecated method git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2723 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	5ff77612ac	bugfix for old WORDS storage method git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2722 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	0f10bdde22	more generic cache methods git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2721 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hermens	6557112d8f	small fix for plasmaURLPool.getURL() needed for new alternative htcache layout git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2719 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hermens	440c6ee657	Implement alternative htcache layout mostly according to: http://www.yacy-forum.de/viewtopic.php?p=26205#26205 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2718 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	fd61209797	lines inside tags without punctuation are extended by a single dot. This enables the condenser to distinguish the lines in a better way. The result is a better preparation of snippets. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2715 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	1969522dc1	removed lowercase of snippets (and other things): - added new sentence parser to condenser - sentence parsing can now handle charsets to do: charsets must be handed over to new sentence parser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2712 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	43614f1b36	bugfix in collection index. the index for collections was not created correctly The bugfix includes a migration function which starts automatically after startup of yacy. This applies only to you, if you are using the new collection index. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2711 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	db294687ea	enhanced logging - more logging output - fix in log line preparation - added filter to log page - some small bugfixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2707 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	a9a0f51303	*) suppressing InterruptedException errormessage See: http://www.yacy-forum.de/viewtopic.php?t=2915 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2705 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	1d4fb680ce	*) CrawlWorker.java: only keep content in memory if size is equal or less than 5MB TODO: make this limit configurable git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2703 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	1586d57187	*) odtParser: better handling of large files git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2702 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	f17ce28b6d	*) plasmaHTCache: - method loadResourceContent defined as deprecated. Please do not use this function to avoid OutOfMemory Exceptions when loading large files - new function getResourceContentStream to get an inputstream of a cache file - new function getResourceContentLength to get the size of a cached file *) httpc.java: - Bugfix: resource content was loaded into memory even if this was not requested *) Crawler: - new option to hold loaded resource content in memory - adding option to use the worker class without the worker pool (needed by the snippet fetcher) *) plasmaSnippetCache - snippet loader does not use a crawl-worker from pool but uses a newly created instance to avoid blocking by normal crawling activity. - now operates on streams instead of byte arrays to avoid OutOfMemory Exceptions when operating on large files - snippet loader now forces the crawl-worker to keep the loaded resource in memory to avoid IO *) plasmaCondenser: adding new function getWords that can directly operate on input streams *) Parsers - keep resource in memory whenever possible (to avoid IO) - when parsing from stream the content length must be passed to the parser function now. this length value is needed by the parsers to decide if the parsed resource content is too large to hold it in memory and must be stored to file - AbstractParser.java: new function to pass the contentLength of a resource to the parsers git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2701 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	630a955674	read snippets from cache in case they are not provided in RAM git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2700 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	dbc2e039bb	added time-out option parameter to call hierarchy git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2691 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	00746ca232	identified and fixed search performance problem caused by snippet loading. Some access to header-db had been twice and even more times in some cases. Snippet resource loading fixed. Furthermore the snippet loading during remote search within the remote peer has been disabled, but can be switched on remotely by new flag 'includesnippet=true' git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2688 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	310f1c41cd	added option to see ranking scores in surftipps and some cleanups git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2684 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	a2e3095044	*) Bugfix. Add missing plasmaParserDocument.close() calls git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2680 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	cd5f349666	*) Better handling of large files during parsing Extracted text of files that are larger than 5MB is stored in a temp file instead of keeping it in memory *) plasmaParserDocument.java: getText now returns an inputStream instead of a byte array *) plasmaParserDocument.java: new function getTextBytes returns the parsed content as byte array Attention: the caller of this function has to ensure that enough memory is available to do this to avoid OutOfMemory Exceptions *) httpd.java: better error handling if the soaphandler is not installed *) pdfParser.java: - better handling of documents with exotic charsets - better handling of large documents - better error logging of encrypted documents *) rtfParser.java: Bugfix for UTF-8 support *) tarParser.java: better handling of large documents *) zipParser.java: better handling of large documents *) plasmaCrawlEURL.java: new errorcode for encrypted documents *) plasmaParserDocument.java: the extracted text can now be passed to this object as byte array or temp file git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2679 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
low012	f8ac694e51	*) fixed a bug where searchword in snippets were not displayed bold in front of a punctuation mark (see http://www.yacy-forum.de/viewtopic.php?p=25998 ) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2677 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	df1629b05a	- code cleanup - version 0.471 - moved surftipps to own web page git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2676 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	b73efd5565	*) missing changes needed because of last commit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2673 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago


1024 Commits (37e53b4a6ad9c5a20fc1a0c8a2fb105f89bfa3b1)