post-parameter url to submit a URL directly
- fixed some bugs in the text parser (not all parts had been analysed)
- fixed a bug in the remote search interface (could not handle constraints)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3001 6c8d7289-2bf4-0310-a012-ef5d649a1542
Such constraints formulate specific restrictions on web searches.
This is implemented by scraping constraint information from a web
page during parsing, and storing flags for the pages within the web index.
In this first step, only information for index pages ("index of", directory listings)
is scraped and stored in flags.
- added new flag class kelondroBitfield (see the sketch after this list)
- added scraper method in condenser
- added bitfield structure for all scrape types (see also condenser)
- added bitfield structure for appearance locations (see RWIEntry)
- added handover protocol for remote search and index distribution
- extended kelondroColumn class to hold bitfield types
- added another search attribute on search page (index.html)
- extended search-filter to enable filtering of non-matching constraints
- set all new database types to be default
- refactoring: moved word hash generation to condenser class
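
As an illustration of the flags mentioned above, a bitfield class could look
roughly like this minimal Java sketch; the class name kelondroBitfield comes
from the commit, but the method names and layout here are assumptions:

    // minimal sketch of a flag bitfield; API details are assumptions
    public class BitfieldSketch {
        private final byte[] bits;

        public BitfieldSketch(int byteLength) {
            this.bits = new byte[byteLength]; // all flags start unset
        }

        // set or clear the flag at the given bit position
        public void set(int pos, boolean value) {
            int q = pos / 8, r = pos % 8;
            if (value) bits[q] |= (1 << r); else bits[q] &= ~(1 << r);
        }

        // read the flag at the given bit position
        public boolean get(int pos) {
            return (bits[pos / 8] & (1 << (pos % 8))) != 0;
        }

        // a constraint matches when every flag it requires is set
        public boolean matches(BitfieldSketch constraint) {
            for (int i = 0; i < bits.length; i++) {
                if ((constraint.bits[i] & ~bits[i]) != 0) return false;
            }
            return true;
        }
    }

A search constraint such as "only index pages" then reduces to a matches()
test against the flags stored with each entry in the web index.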
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2999 6c8d7289-2bf4-0310-a012-ef5d649a1542
- the test phase of the new collection data structure is finished
- the test data that has been generated is void; there will be no migration
- the new collection files are located in DATA/INDEX/PUBLIC/TEXT/RICOLLECTION
- the index dump is void; there will be no migration
- the new index dump is in DATA/INDEX/PUBLIC/TEXT/RICACHE
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2983 6c8d7289-2bf4-0310-a012-ef5d649a1542
- moved all URL and index (RWI) entries to the index package
- better naming to distinguish RWI entries from URL entries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2937 6c8d7289-2bf4-0310-a012-ef5d649a1542
- bugfix for performanceMemory
- refactoring of the index RAM cache: renamed indexRAMCacheRI to indexRAMRI, to make space for a cached indexRI, which should then be named indexRAMCacheRI
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2925 6c8d7289-2bf4-0310-a012-ef5d649a1542
- enhanced local domain detection
- bugfixes for memory assignment in kelondroFlexSplit
- automatic memory assignment to caches according to available RAM (see the sketch below)
- bugfixes for details of the search process
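
The automatic assignment could work roughly like the following hypothetical
sketch, which splits a share of the currently available heap over several
caches by fixed weights; the actual YaCy logic is not shown here:

    // hypothetical helper: divide a share of the free heap over caches
    public class CacheMemoryAssignment {
        public static long[] assign(long[] weights, double heapShare) {
            Runtime rt = Runtime.getRuntime();
            long available = rt.maxMemory() - rt.totalMemory() + rt.freeMemory();
            long budget = (long) (available * heapShare);
            long weightSum = 0;
            for (long w : weights) weightSum += w;
            long[] sizes = new long[weights.length];
            for (int i = 0; i < weights.length; i++) {
                sizes[i] = budget * weights[i] / weightSum; // proportional share
            }
            return sizes;
        }

        public static void main(String[] args) {
            // example: three caches weighted 4:2:1, using half of the free heap
            for (long s : assign(new long[] {4, 2, 1}, 0.5)) {
                System.out.println(s + " bytes");
            }
        }
    }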
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2924 6c8d7289-2bf4-0310-a012-ef5d649a1542
this should reduce IO a lot, because write caches are now activated for all databases
- added a new caching class that combines a read- and a write-cache (see the sketch below)
- removed the old read and write cache classes
- removed the superfluous RAM index (can be replaced by kelondroRowSet)
- adapted all current classes that used the old caching methods
- more asserts, more bugfixes
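
The combined cache could be organized roughly as in this sketch, which
illustrates the idea with a simple key/value backend; it is not the kelondro
code, and all names are assumptions:

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.function.BiConsumer;
    import java.util.function.Function;

    // sketch: reads go through an LRU map, writes are buffered in a dirty
    // map until flush() pushes them to the backend in one pass
    public class ReadWriteCache<K, V> {
        private final LinkedHashMap<K, V> lru;                  // read cache
        private final Map<K, V> dirty = new LinkedHashMap<>();  // write cache
        private final Function<K, V> backendRead;
        private final BiConsumer<K, V> backendWrite;

        public ReadWriteCache(final int capacity,
                              Function<K, V> backendRead,
                              BiConsumer<K, V> backendWrite) {
            this.backendRead = backendRead;
            this.backendWrite = backendWrite;
            this.lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
                @Override protected boolean removeEldestEntry(Map.Entry<K, V> e) {
                    return size() > capacity; // evict least recently used
                }
            };
        }

        public synchronized V get(K key) {
            V v = dirty.get(key);             // pending writes win
            if (v == null) v = lru.get(key);  // then the read cache
            if (v == null) {                  // finally the backend
                v = backendRead.apply(key);
                if (v != null) lru.put(key, v);
            }
            return v;
        }

        public synchronized void put(K key, V value) {
            dirty.put(key, value); // buffered; no IO until flush()
            lru.put(key, value);   // keep reads consistent with writes
        }

        public synchronized void flush() {
            for (Map.Entry<K, V> e : dirty.entrySet())
                backendWrite.accept(e.getKey(), e.getValue());
            dirty.clear();
        }
    }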
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2865 6c8d7289-2bf4-0310-a012-ef5d649a1542
- generalized object caching and added new object caching class
- added object caching wherever kelondroTree was used
- added object caching also to usage of kelondroFlex
- added object buffering (a write cache) to NURLs
- added many assert statements; fixed bugs here and there
- added missing close methods to the most recently added classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2858 6c8d7289-2bf4-0310-a012-ef5d649a1542
- assortments can now be left out completely if they do not exist
before startup and the collection index is selected.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2757 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added a test migration method to migrate the old LURL to a new LURL
the new LURL will be split into different tables, one for each month
(see the sketch after this list); this solves several problems:
- the biggest table in YaCy is split into different parts and can
also be managed in file systems that are limited to 2GB
- the oldest entries can easily be identified, used for re-crawl and
deleted
- the complete database can be limited to a specific size (as requested many times)
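
Routing entries to per-month tables could look roughly like this sketch,
where table handling is reduced to a map of named buffers; the table name
prefix and the record format are assumptions for illustration:

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.HashMap;
    import java.util.Map;

    // sketch: route each URL entry to a table keyed by its load month,
    // so the oldest entries can be dropped as whole monthly tables
    public class MonthlySplitSketch {
        private final Map<String, StringBuilder> tables = new HashMap<>();
        private final SimpleDateFormat month = new SimpleDateFormat("yyyyMM");

        public void store(Date loadDate, String urlHash, String record) {
            String tableName = "urls" + month.format(loadDate);
            tables.computeIfAbsent(tableName, k -> new StringBuilder())
                  .append(urlHash).append(' ').append(record).append('\n');
        }

        // deleting the oldest entries means removing one table per month
        public void dropMonth(Date someDayInMonth) {
            tables.remove("urls" + month.format(someDayInMonth));
        }
    }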
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2755 6c8d7289-2bf4-0310-a012-ef5d649a1542
- more synchronization
- bugfix for remove in collections
- bugfix in kelondroFlex (wrong exception condition!)
- options to use RAM, FLEX and TREE tables for the Crawl URL stacker
- the default for the Crawl URL stacker is now FLEX (!)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2746 6c8d7289-2bf4-0310-a012-ef5d649a1542
- snippets will generate an entry in responseHeader.db
- there is now another default profile for snippet loading
- pages from snippet-loading will be indexed with indexing depth = 0
- better organization of default profiles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2733 6c8d7289-2bf4-0310-a012-ef5d649a1542
This enables the condenser to distinguish the lines more accurately.
The result is better snippet preparation.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2715 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added a new sentence parser to the condenser
- sentence parsing can now handle charsets (see the sketch below)
to do: the charset must be handed over to the new sentence parser
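
Handing the charset over could look roughly like this sketch, which decodes
the stream with the given charset before cutting sentences; the real
condenser logic is more involved, this only shows the charset hand-over:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.nio.charset.Charset;
    import java.util.ArrayList;
    import java.util.List;

    // sketch: charset-aware sentence splitting at '.', '!' and '?'
    public class SentenceParserSketch {
        public static List<String> sentences(InputStream in, String charsetName)
                throws IOException {
            List<String> result = new ArrayList<>();
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, Charset.forName(charsetName)));
            StringBuilder sentence = new StringBuilder();
            int c;
            while ((c = reader.read()) >= 0) {
                sentence.append((char) c);
                if (c == '.' || c == '!' || c == '?') { // sentence boundary
                    String s = sentence.toString().trim();
                    if (!s.isEmpty()) result.add(s);
                    sentence.setLength(0);
                }
            }
            String rest = sentence.toString().trim();
            if (!rest.isEmpty()) result.add(rest);
            return result;
        }
    }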
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2712 6c8d7289-2bf4-0310-a012-ef5d649a1542
The bugfix includes a migration function which starts automatically
after yacy starts up.
This applies to you only if you are using the new collection index.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2711 6c8d7289-2bf4-0310-a012-ef5d649a1542
- method loadResourceContent is now marked deprecated.
Please do not use this function, to avoid OutOfMemory exceptions
when loading large files
- new function getResourceContentStream to get an InputStream of a cache file
- new function getResourceContentLength to get the size of a cached file
*) httpc.java:
- Bugfix: resource content was loaded into memory even if this was not requested
*) Crawler:
- new option to hold loaded resource content in memory
- adding option to use the worker class without the worker pool
(needed by the snippet fetcher)
*) plasmaSnippetCache
- snippet loader does not use a crawl-worker from pool but uses
a newly created instance to avoid blocking by normal crawling
activity.
- now operates on streams instead of byte arrays to avoid OutOfMemory
Exceptions when operating on large files
- snippet loader now forces the crawl-worker to keep the loaded
resource in memory to avoid IO
*) plasmaCondenser: adding new function getWords that can directly operate on input streams
*) Parsers
- keep resource in memory whenever possible (to avoid IO)
- when parsing from a stream, the content length must now be passed to the parser function.
this length value is needed by the parsers to decide whether the parsed resource content is too large
to hold in memory and must be stored to a file (see the sketch below)
- AbstractParser.java: new function to pass the contentLength of a resource to the parsers
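
The size decision could be implemented roughly as in this sketch; the 5MB
threshold and all names here are illustrative assumptions, not the actual
YaCy API:

    import java.io.*;

    // sketch: buffer small resources in memory, spool large or
    // unknown-size resources to a temp file to avoid OutOfMemory
    public class ResourceBufferSketch {
        private static final long MAX_IN_MEMORY = 5L * 1024 * 1024;

        // returns a replayable stream over the resource content
        public static InputStream buffer(InputStream source, long contentLength)
                throws IOException {
            if (contentLength >= 0 && contentLength <= MAX_IN_MEMORY) {
                // small resource: safe to keep completely in memory
                ByteArrayOutputStream mem = new ByteArrayOutputStream();
                copy(source, mem);
                return new ByteArrayInputStream(mem.toByteArray());
            }
            // large or unknown size: spool to a temp file first
            File tmp = File.createTempFile("resource", ".tmp");
            tmp.deleteOnExit();
            try (OutputStream out = new FileOutputStream(tmp)) {
                copy(source, out);
            }
            return new FileInputStream(tmp);
        }

        private static void copy(InputStream in, OutputStream out)
                throws IOException {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) out.write(chunk, 0, n);
        }
    }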
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2701 6c8d7289-2bf4-0310-a012-ef5d649a1542
snippet loading. Some accesses to the header-db happened twice, and in
some cases even more often. Snippet resource loading fixed.
Furthermore, snippet loading during remote search within the
remote peer has been disabled, but it can be switched on remotely with the
new flag 'includesnippet=true'
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2688 6c8d7289-2bf4-0310-a012-ef5d649a1542
Extracted text of files that are larger than 5MB is stored in a temp file instead of being kept in memory
*) plasmaParserDocument.java: getText now returns an InputStream instead of a byte array
*) plasmaParserDocument.java: new function getTextBytes returns the parsed content as a byte array
Attention: the caller of this function has to ensure that enough memory is available,
to avoid OutOfMemory exceptions
*) httpd.java: better error handling if the soaphandler is not installed
*) pdfParser.java:
- better handling of documents with exotic charsets
- better handling of large documents
- better error logging of encrypted documents
*) rtfParser.java: Bugfix for UTF-8 support
*) tarParser.java: better handling of large documents
*) zipParser.java: better handling of large documents
*) plasmaCrawlEURL.java: new errorcode for encrypted documents
*) plasmaParserDocument.java: the extracted text can now be passed
to this object as byte array or temp file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2679 6c8d7289-2bf4-0310-a012-ef5d649a1542
- documentation update
- necessary bugfixes (missing css for new peers)
- reduced effect of search result redundancy filter
- removed some debug output, but not all
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2665 6c8d7289-2bf4-0310-a012-ef5d649a1542
- set MaxWordCount for dhtInCache to indexDistribution.dhtReceiptLimit,
so that the inCache gets flushed when the limit is exceeded
- modified flushCacheSome to flush enough words to get below MaxWordCount immediately (see the sketch below)
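
The flush behaviour could look roughly like this sketch: instead of flushing
a fixed number of entries, words are removed until the cache is below the
limit again. The container type is simplified and all names are assumptions
based on the commit text:

    import java.util.Iterator;
    import java.util.Map;
    import java.util.TreeMap;

    // sketch of a word cache that flushes down to maxWordCount at once
    public class WordCacheSketch {
        private final TreeMap<String, Object> cache = new TreeMap<>();
        private final int maxWordCount; // e.g. indexDistribution.dhtReceiptLimit

        public WordCacheSketch(int maxWordCount) {
            this.maxWordCount = maxWordCount;
        }

        public synchronized void flushCacheSome() {
            // flush enough words to get below maxWordCount immediately
            Iterator<Map.Entry<String, Object>> i = cache.entrySet().iterator();
            while (cache.size() > maxWordCount && i.hasNext()) {
                Map.Entry<String, Object> oldest = i.next();
                flushToDisk(oldest.getKey(), oldest.getValue());
                i.remove();
            }
        }

        private void flushToDisk(String wordHash, Object container) {
            // placeholder for writing the container to the word index files
        }
    }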
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2649 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) better logging of parser failures
*) simplified usage of plasmaParser through the switchboard
*) restructuring of the crawler
- the crawler now returns an error message if it is used in sync mode (e.g. by the snippet fetcher)
*) snippet fetcher: more verbose error messages
*) serverByteBuffer.java: adding new function append(String,encoding)
*) serverFileUtils.java: adding functions to copy only a given number of bytes between streams (see the sketch below)
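
Such a bounded copy could look roughly like this sketch; the signature is an
assumption based on the commit text, not the exact serverFileUtils API:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    // sketch: copy at most 'count' bytes from one stream to another
    public final class CopySketch {
        public static long copy(InputStream in, OutputStream out, long count)
                throws IOException {
            byte[] chunk = new byte[8192];
            long copied = 0;
            while (copied < count) {
                int want = (int) Math.min(chunk.length, count - copied);
                int got = in.read(chunk, 0, want);
                if (got < 0) break; // source ended before count bytes
                out.write(chunk, 0, got);
                copied += got;
            }
            return copied; // number of bytes actually copied
        }
    }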
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2641 6c8d7289-2bf4-0310-a012-ef5d649a1542
- new class htmlFilterInputStream.java which allows pre-analyzing the HTML header to extract
the charset metadata (see the sketch below). This is only enabled for the crawler at the moment;
integration into the proxy needs more testing.
- added event-listener interfaces to the htmlscraper so that other classes can get informed
about detected tags (used by htmlFilterInputStream.java)
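
The pre-analysis idea could be sketched like this: read a bounded prefix of
the document, look for a charset declaration, then reset the stream so the
real parser sees the document from the beginning. The regex stands in for
the scraper/event-listener mechanism of the actual class:

    import java.io.BufferedInputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // sketch: sniff the charset from the HTML head, then rewind
    public class CharsetSniffSketch {
        private static final int HEAD_LIMIT = 4096;
        private static final Pattern CHARSET = Pattern.compile(
                "charset\\s*=\\s*[\"']?([A-Za-z0-9._-]+)", Pattern.CASE_INSENSITIVE);

        public static String sniffCharset(BufferedInputStream in) throws IOException {
            in.mark(HEAD_LIMIT);                     // remember stream position
            byte[] head = new byte[HEAD_LIMIT];
            int n = 0, r;
            while (n < HEAD_LIMIT && (r = in.read(head, n, HEAD_LIMIT - n)) > 0) n += r;
            in.reset();                              // rewind for the real parser
            // meta tags are ASCII-compatible, so a latin-1 decode is safe here
            Matcher m = CHARSET.matcher(new String(head, 0, n, StandardCharsets.ISO_8859_1));
            return m.find() ? m.group(1) : null;     // null: no declaration found
        }
    }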
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2624 6c8d7289-2bf4-0310-a012-ef5d649a1542
- serverFileUtils.java:
-- adding methods to copy from streams to writers and from readers to writers
-- moving the httpc writeX methods into the serverFileUtils class
- serverCharBuffer.java: removing inheritance from the Writer class
- replacing htmlFilterOutputStream with the htmlFilterWriter class, which handles
content as a char stream
- htmlFilterContentTransformer.java: deactivating getText mode
(still needs to be migrated to use char streams instead of byte streams)
- changes in several classes to use htmlFilterWriter instead of htmlFilterOutputStream
- changes in the Scraper and Transformer classes to operate on chars instead of bytes
- httpdProxyHandler.java: bugfix, the clientTimeout setting was missing in the config file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2617 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added new image 'B' in front of search results for bookmark generation
- added news generation when a public bookmark is added
- the '+' in front of search results has a new meaning: positive rating for that result
- added news generation when a '+' is hit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2613 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) htmlFilterContentScraper.java: using the proper charset for the document title
*) serverByteBuffer.java: adding a new toString which allows specifying the charset for byte encoding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2593 6c8d7289-2bf4-0310-a012-ef5d649a1542
there are now two fully controlled caches for incoming indexes:
- dhtIn
- dhtOut
during indexing, all indexes that shall not be transported to remote peers,
because they belong to the own peer, are stored in dhtIn. It is furthermore
ensured that received indexes are not directly transmitted to other peers
again. They may, however, be transmitted later if the network grows.
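
The routing rule could be sketched roughly like this; the distance function
and the responsibility model here are simplified assumptions, not the actual
YaCy DHT logic:

    // sketch: decide whether an index entry goes to dhtIn or dhtOut
    public class DhtRoutingSketch {
        private final String ownPeerHash;
        private final double responsibilityRadius; // share of hash space kept locally

        public DhtRoutingSketch(String ownPeerHash, double responsibilityRadius) {
            this.ownPeerHash = ownPeerHash;
            this.responsibilityRadius = responsibilityRadius;
        }

        public String targetCache(String wordHash, boolean receivedFromRemote) {
            if (receivedFromRemote) return "dhtIn"; // never retransmit directly
            // keep entries that belong to the own peer; offer the rest for DHT
            return distance(wordHash, ownPeerHash) < responsibilityRadius
                    ? "dhtIn" : "dhtOut";
        }

        // toy hash-space distance: normalized difference of the first chars
        private static double distance(String a, String b) {
            return Math.abs(a.charAt(0) - b.charAt(0)) / 128.0;
        }
    }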
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2574 6c8d7289-2bf4-0310-a012-ef5d649a1542
this is a set of search processes that shall enrich search results
with specialized requests, realizing a combination of search results
from different peers.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2571 6c8d7289-2bf4-0310-a012-ef5d649a1542
if the new database is switched on, no 'too big' messages appear and
all the WORDS files can be migrated completely
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2553 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added generation of a compressed index within remote peers during global search
- added selection of specific urls within remote peers during secondary global search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2539 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) next step of restructuring for the new crawlers
> HTCaching should now work protocol-independently
-- introduction of new ResourceInfo objects containing the protocol-specific metadata
of a resource (see the sketch below)
-- the ResourceInfo objects now implement old functions like shallIndexCacheForXXX and
shallStoreCacheForXXX in a protocol-dependent manner
> Indexing should also work protocol-independently now
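
The protocol-independent ResourceInfo idea could be sketched as an interface
like this; the method names here paraphrase the commit text and are not the
exact YaCy signatures:

    import java.util.Date;

    // sketch: each protocol supplies its own metadata object, and
    // cache/indexing decisions become methods on that object
    public interface ResourceInfoSketch {
        String getMimeType();        // e.g. from an HTTP header or ftp heuristics
        Date getModificationDate();  // protocol-specific freshness information

        // protocol-dependent decisions, formerly hard-coded per protocol
        boolean shallStoreCache();
        boolean shallIndexCache();
    }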
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2496 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) next step of restructuring for the new crawlers
- adding a first test version of the ftp crawler class
-- does not create a htCache entry yet
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2483 6c8d7289-2bf4-0310-a012-ef5d649a1542
- IndexCreate_p.java: correcting problems with ftp URLs
- URL.java no longer cuts out the userinfo
(needed to transport authentication info in ftp URLs, e.g. ftp://username:pwd@ftp.irgendwas.de)
- plasmaCrawlLoader.java:
-- hack to re-enable https URLs
-- adding function getSupportedProtocols
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2482 6c8d7289-2bf4-0310-a012-ef5d649a1542
- adding function isSupportedProtocol to plasmaCrawlLoader.java
- disabling the robots.txt check for protocols other than http(s)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2479 6c8d7289-2bf4-0310-a012-ef5d649a1542
- adding an interface class (plasma/crawler/plasmaCrawlWorker.java) for protocol-specific crawl-worker threads
- moving reusable code into the abstract crawl-worker class AbstractCrawlWorker.java
- the load method of the worker threads should no longer be called directly (e.g. by the snippet fetcher);
to crawl a page and wait for the result, use the function plasmaCrawlLoader.loadSync([...]) (see the sketch below)
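
A synchronous load call could be sketched roughly like this: the caller
submits one crawl job and blocks until the result arrives. The types and
the queue mechanics here are simplified assumptions:

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // sketch: run one crawl job on a worker and wait for its result
    public class LoadSyncSketch {
        private final ExecutorService workers = Executors.newCachedThreadPool();

        public byte[] loadSync(final String url) throws Exception {
            Future<byte[]> result = workers.submit(new Callable<byte[]>() {
                public byte[] call() throws Exception {
                    return fetch(url); // placeholder for the protocol worker
                }
            });
            return result.get(); // blocks the caller until the page is loaded
        }

        private byte[] fetch(String url) {
            return new byte[0]; // stub: real workers do the protocol handling
        }
    }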
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2474 6c8d7289-2bf4-0310-a012-ef5d649a1542
- conversion of the crawler pool into a keyed object pool
- crawlers are now loaded based on the URL protocol (of course this works only for http now)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2473 6c8d7289-2bf4-0310-a012-ef5d649a1542
* replaced kelondroTree with kelondroFlex for NURLs
* replaced kelondroTree with kelondroFlex for EURLs
take care, this may be very buggy
please finish crawls before updating; crawls will be lost.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2452 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) changed exception handling in urldbcleanup so that it reports NullPointerExceptions correctly
*) added more blacklisting to the urlcleaner
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2436 6c8d7289-2bf4-0310-a012-ef5d649a1542
- catching of OutOfMemoryError in server threads
- automatic adaptation of the word cache size after a Short Mem Cycle
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2426 6c8d7289-2bf4-0310-a012-ef5d649a1542
* cleaned up code
* added unit test code
* migrated ranking RCI computation to kelondroFlex and kelondroCollectionIndex
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2414 6c8d7289-2bf4-0310-a012-ef5d649a1542
is corrupted. The cause is that the deleted-records chain has wrong entries,
and one of the pointers in that chain points to a place beyond the end of the file.
This causes an IndexOutOfBoundsException within an IO operation.
I currently don't know why the deleted-records chain becomes
corrupted, but the error can be caught. If this now happens with the
assortment database, the database is deleted.
See also:
http://www.yacy-forum.de/viewtopic.php?p=24586#24586
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2396 6c8d7289-2bf4-0310-a012-ef5d649a1542
* adapted all code to use the declaration form of kelondroRow
* fixed a bug in kelondroRow which caused wrong parsing of the encoding type
* the bug caused bad database behaviour in the new indexCollection data structure;
because of this bug, all test databases are now void. A new database is created
* the kelondroFlexTable and indexCollection data structures now store a declaration of the row definition
in a properties file alongside the database files.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2375 6c8d7289-2bf4-0310-a012-ef5d649a1542
for indexing, the plasmaWordIndex.
The new data structure is ready to use, but currently disabled.
It can be activated by setting the static
plasmaWordIndex.useCollectionIndex
to true. This shall be done for testing purposes.
The new index is stored in
DATA/INDEX/PUBLIC/TEXT
The directory PLASMA shall be used only for the crawler in the future.
Attention: during testing, the data structure in INDEX may change,
and indexes created with the new data structure may become useless.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2348 6c8d7289-2bf4-0310-a012-ef5d649a1542
this is the new database structure that is supposed to replace the
plasmaAssortmentCluster AND the plasmaWordIndexFileCluster
The new structure is not yet active and needs to be integrated into
plasmaWordIndex. This has some migration constraints that are not yet
completely solved.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2347 6c8d7289-2bf4-0310-a012-ef5d649a1542
shall replace the current assortment files.
* used the kelondroFlexTable to hold the index of collections
* used kelondroRow definitions to declare all data structures
* fixed several bugs that appeared in kelondroRowSet and kelondroRowCollection during testing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2344 6c8d7289-2bf4-0310-a012-ef5d649a1542
* replaced indexTreeMapContainer with indexRowSetContainer
* deleted indexTreeMapContainer and its abstract class
This is another step towards the new database structure
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2343 6c8d7289-2bf4-0310-a012-ef5d649a1542
to store the index entry. This is another step towards the new database structure.
A side effect of this change is that index storage uses much less RAM,
which benefits the index RAM cache.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2341 6c8d7289-2bf4-0310-a012-ef5d649a1542
indexContainers from RAM must be cloned explicitly to prevent
side effects on stored indexContainer objects in the Cache
* changed behaviour of urlReference deletion from indexContainers:
deletion no longer uses retrieval of all elements from the assortments
* added textual configuration of the kelondroRow and kelondroColumn definitions
* updated kelondroRow usage in yacyNews
* modified kelondroAttrSeq to use the modified kelondroColumn parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2339 6c8d7289-2bf4-0310-a012-ef5d649a1542
an iteration of key elements in kelondroTree databases is no longer supported;
it is replaced by an iteration of kelondroRow.Entry objects from the database.
Iteration of keys from the database was mostly followed by retrieval of the row
from the database, which caused unnecessary database load.
The index selection was also redesigned to use the new row iteration methods.
This affects many functions; most important is the DHT selection routine, which is now much faster (see the sketch below).
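
The difference can be sketched like this, with the table simulated by a
TreeMap and Map.Entry standing in for kelondroRow.Entry; names are
assumptions for illustration:

    import java.util.Iterator;
    import java.util.Map;
    import java.util.TreeMap;

    // sketch: row iteration replaces key iteration plus per-key lookup
    public class RowIterationSketch {
        private final TreeMap<String, byte[]> table = new TreeMap<>();

        // old style: key iteration plus one extra lookup per key (two accesses)
        public void oldStyle() {
            for (String key : table.keySet()) {
                byte[] row = table.get(key); // redundant second database access
                process(key, row);
            }
        }

        // new style: iterate full entries, one access per row
        public void newStyle() {
            Iterator<Map.Entry<String, byte[]>> rows = table.entrySet().iterator();
            while (rows.hasNext()) {
                Map.Entry<String, byte[]> entry = rows.next();
                process(entry.getKey(), entry.getValue());
            }
        }

        private void process(String key, byte[] row) { /* consume the row */ }
    }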
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2327 6c8d7289-2bf4-0310-a012-ef5d649a1542
* store() is now called explicitly
* more URLs are written to the EURL table
* the EURL stack no longer stores the complete entry, only the URL hash
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2323 6c8d7289-2bf4-0310-a012-ef5d649a1542
when extracting from eurl, the html output format is recommended, since
this format also adds the fail reason to the domain/url.
The complete syntax for domain extraction is now:
java -Xmx<megabytes>m -classpath classes yacy -domlist [ -source { lurl | eurl } ] [ -format { text | zip | gzip | html } ] [ <path to DATA folder> ]
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2322 6c8d7289-2bf4-0310-a012-ef5d649a1542
* mainly for the transition to the new indexing database structure
* a bugfix for an endless loop inside kelondroTree iteration
* a bugfix for bulk reads inside a kelondroTree iteration; the bug caused some elements to be iterated twice
* very strong speed enhancement for url/domain extraction
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2320 6c8d7289-2bf4-0310-a012-ef5d649a1542
This shall be seen as an experiment to exclude all cases where
there could be a DNS lookup during URL comparison.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542