yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	f27f9ecf15	* activated write buffer for databases. This should increase IO performance and reduce HD activity * bugfixes for new exception-on-failure policy * bugfixes for new IOChunks * new Object pool for database write-buffer git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1204 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	c59d1b2f5e	- Tests with write buffer (new class kelondroBufferedIOChunks, not yet active) - minor bugfixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1203 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	bb79fb5d91	- changed handling of error cases retrieving urls from database (no more NULL values are returned, instead, an IOException is thrown) - removed ugly damagedURLS implementation from plasmaCrawlLURL.java (this inserted a static value into the Object which is not really a good style) - re-coded damagedURLS collection in yacy.java by catching an exception and evaluating the exception message to do: - the urldbcleanup feature must be re-tested git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1200 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	e7d16ef831	*) Corrections in jMimeMagic MagicRule-file to detect some special rss feeds git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1196 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	386d9e45d8	*) Bugfix for code cleanup - Code must be in finally block, otherwise it does not work if an error occurs! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1193 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	5a1d45715d	*) Bugfix for parser configuration bug - it was not possible to disable all parsers See: http://www.yacy-forum.de/viewtopic.php?t=1579 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1191 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
rramthun	a1061495d4	Fixed some spelling mistakes and added some text which (should) make it easier to understand the options. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1187 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	0cdc58aaea	fixed indexing of local domains. see http://www.yacy-forum.de/viewtopic.php?p=13680#13680 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1186 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	e1c2d8ec5f	*) Speedup "removed from queue" See: http://www.yacy-forum.de/viewtopic.php?p=13442#12188 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1183 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hydrox	96930f0d2b	*) added function to remove malformed URLs from urlHash.db git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1182 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	8862b6ba4b	*) Corrections for code cleanup 1175 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1179 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	13fdebc50d	added authentication for link deletion in search result git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1177 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	37f88b4017	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1176 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	ec2b39c1ce	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1175 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	8f1f2daa5e	implemented interactive link deletion of search results. next steps: attach voting and restrict to administrator to see the deletion button, move the mouse pointer to the left of a search result git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1172 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	6d0f7e6988	*) Adding missing file git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1171 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	44fa94ac52	*) Modifications for dbImport functionality - dbImporter threads are now shut down by the switchboard on server shutdown - adding possibility to pause an importer thread via GUI - Bugfix for abort function See: http://www.yacy-forum.de/viewtopic.php?p=13363#13363 *) Modification of content parser configuration - now it's possible to configure which parsers should be enabled for the proxy, crawler, icap, etc. separately *) htmlFilterContentScraper.java - adding regular expression to normalize URLs containing /../ and /./ parts *) httpc.java - adding functionality to unzip gzipped content - requested by roland: should be used later to allow gzipped seed lists git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1170 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	dc778659fb	fixed problem with time-out during result joint which caused OR behavior instead of AND behavior git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1167 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	3d8a5ae652	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1166 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	64478b1f02	*) Adding possibility to delete crawler queue entries using regular expressions git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1160 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	a04930f025	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1158 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
low012	90b0eb144e	just a typo... git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1155 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	129b15f3e1	*) Correcting logging output of db importer thread See: http://www.yacy-forum.de/viewtopic.php?t=1555 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1154 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	420d56ce79	extended db-testing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1152 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	ecf765ec33	temporary fix to make jrpm extension compilable with my netbeans environment git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1151 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	8ed0aaae8d	*) Adding content Parser for RPM Files - at the moment only the metadata is extracted git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1147 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	818d37ce44	*) Removing getSimpleName git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1143 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	b35c5a48bf	*) First version of urlRedirector.pl script - with this script it's possible to pass URLs from squid to yacy via the squid redirector interface - these URLs are then used by YaCy to feed the crawler git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1141 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	bdf30117c1	*) Redesign of parser configuration - restructuring of mimeTypes based on the parsers - displaying parser usage count - displaying human readable parser names - displaying parser version information *) httpdFileHandler.java - adding possibility to support "streaming" servlets which are special servlets that can communicate with the client via the connection streams autonomously - the name of these new servlet types must end with the file extension .stream - this feature will be needed by the yacy ScreenSaver class to fetch statistic data from the peer without the need to reconnect to the server all the time *) Adding human readable names and version information for all supported parsers *) plasmaParser.java - adding new structure to store parser statistic data *) Adding openDocument parser - can be used to parse odt files *) jmimemagic - adding rules to detect openDocument formats properly *) serverLog.java - adding functions that can be used to query if a given logging level is enabled or not. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1140 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	d4ac3e25b1	*) Bugfix for file system link bug during detection of invalid URLs See: http://www.yacy-forum.de/viewtopic.php?p=13301 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1134 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	adf75bc9fa	better logging for invalid file path detection git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1133 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	40621a5663	enhancements in ranking preparation and fixed problem with parser/mime recognition git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1132 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	c650b112ea	*) Bugfix for relative URL Bug in Crawler See: http://www.yacy-forum.de/viewtopic.php?p=13266#13266 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1130 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	4e73035aef	*) Bugfix for "too many open files" during index distribution git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1128 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	f57e2d67f5	shortened network overview (less columns fit easier on page) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1124 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	85282b1d98	enhanced YBR recognition and search result heuristics git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1121 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	b9cc9029e3	added ybr selection for remote search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1119 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	0e25020f51	added first generation and usage of YBR index-files. Enhanced overall ranking of search results. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1118 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	90d6c6223b	*) Adding color codes to network graphic legend git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1114 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	bfe51c7228	added generation of domain-list git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1112 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	0ec54d9c5f	enhanced CR-file handling and added first RCI-evaluation tests git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1110 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	c2fe3a1670	*) Updating jMimeMagic Ruleset - to detect some specially formatted html documents correctly - adding rule to detect vCards *) plasmaParser now supports parsing of files that have a supported fileExtension but an unsupported mimeType because the webserver has set it incorrectly to text/plain *) Adding new vCard Parser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1107 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	88e3234393	fine-tuning of rci-generation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1105 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	a12759c1bf	first try to implement a rci-computation from cr-files git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1103 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	4a8e8f269e	refactoring of cr-processing; new kelondro class to handle the attribute file format git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1100 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	24dc0e0760	implemented cr-file processing and further transmission steps git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1099 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	9d9a87f445	limited htcache storage length git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1096 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	d0dfccdb77	*) Making CrawlStacker pool configurable via GUI and config file See: http://www.yacy-forum.de/viewtopic.php?t=1448 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1087 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	3631cb1f6d	*) deleting empty entities during index selection git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1086 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	ca26aab9b1	*) More debugging output for migrateWords git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1085 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	9b35ae9027	*) Correcting wrong % values on IndexTransfer_p page See: http://www.yacy-forum.de/viewtopic.php?p=12646 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1084 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	e6bf9d90a5	*) Fixing Problems with MalformedURLs during Word Selection - removing (lurl.toString() == null) comparison because toString() is never null - adding (lurl.url() == null) condition because url() is null if we have selected a word entry with a malformed URL git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1083 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	86a9210264	*) indexing queue slots are now configurable via config file See: http://www.yacy-forum.de/viewtopic.php?t=1480 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1081 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	3c11d7b81c	*) Bugfix for minimizeUrlDB - function didn't work correctly because of new url hash structure See: http://www.yacy-forum.de/viewtopic.php?p=12753#12753 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1080 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	9913049009	fixed outOfMemory bug caused by loops in kelondroTree during enumeration git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1079 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	bbb936b9ea	*) Bugfix for not human readable content of PDFs while viewing the URL Content via GUI - This Bug also affects the snippet generation on non html/text documents See: http://www.yacy-forum.de/viewtopic.php?t=1472 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1075 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	445e3a620f	*) Avoid rejecting of html content by the crawler when the file extension is not set properly git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1074 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	444a5a9368	*) Bugfix for Entries with null url in GlobalQueue See: http://www.yacy-forum.de/viewtopic.php?p=12675#12675 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1069 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	ebac51df52	restore defaultRemoteProfile git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1063 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	5778428455	move cutUrlText to nxTools, max length from URLs(title) on searchpage now 120 chars git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1060 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	9158845c3b	bugfix for snippet text null bytes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1059 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	f763923e0a	added missing files for last commit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1057 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	79818a320f	introduced citation-rank transmission protocol and activate transport for anonymisation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1055 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	7e0647f692	*) Bugfix for userDB usage during authentication git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1052 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	02f8013013	auto-delete of corrupted word files during word-migration git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1047 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	d2731418bf	added creation of global ranking files and changed url normal form usage git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1046 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	6f9f8ed8f8	*) Automatic Reset of Stack Crawler DB on startup errors See: http://www.yacy-forum.de/viewtopic.php?t=1432 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1045 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	fb766413d1	*) Changes on httpc dns caching - Bugfix: old dns cache did not handle case insensitive hostnames correctly. - adding a possibility to set domain name patterns defining hostnames that should not be cached by the httpc dns cache e.g. borg-300.dyndns.org This can be done by setting the new httpc.nameCacheNoCachingPatterns property - using httpc.dnsResolve wherever possible within the sourcecode [httpd.java,plasmaCrawlStacker.java] git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1044 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	bc420c62f6	fixed htcache path generation (never change a running system) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1041 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	dd24f0252f	*) Searchword highlighting for info page git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1036 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	72cde1d894	getCachePath: no logging git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1033 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	1fbd72f9e0	rename "index.html" to "ndx" git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1032 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	cd1107d85e	added support for URLs with '?&' git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1030 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	5fb2b017cb	small change git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1029 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	544e4ea90e	small change git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1027 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	00ab4d8723	cleaned, small change, Properties git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1026 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	b8ceb1ffde	*) Adding better https support for crawler - solving problems with unknown certificates by implementing a dummy trust Manager - adding https support to robots-parser - Seed File can now be downloaded from https resources - adapting plasmaHTCache.java to support https URLs properly *) URL Normalization - sub URLs are now normalized properly during indexing - pointing urlNormalForm function of plasmaParser to htmlFilterContentScraper function - normalizing URLs which were received by a crawlOrder request git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1024 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	e3179a6394	added getOwnSeedFile() git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1022 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	a803a509ae	bugfix: port handling in HTCache program flow, cleared up git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1021 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hydrox	cb69047b91	*)cleanup access static methods and fields git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1016 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hydrox	56b9f34411	*)removed unused imports git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1015 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	5f68b6886b	introduced new url-hashes for better ranking computation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1013 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	aadace1285	fixed network image in search performance monitor git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1012 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	bb369c98de	fixed search result ordering by date git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1011 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	b058ecf0bc	refactoring of image-generation; added experimental PNG encoder (not active now) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1008 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	d42531e1b2	added auto-reset for NURL-DBs git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1004 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	92c49b406b	adminAuth with userDB and adminAuthenticated (fix for statuspage) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1001 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
rramthun	27f180f24b	Update of YaWoStat to 0.2. Now does not try to make 400000! operations to load a 4MB textfile :-/ Program is not finished yet. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1000 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	d656e2b433	added a memory-profile chart generation to database performance testing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@993 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	ec3af327f7	*) Bugfix for Proxy-Authentication against remote proxy See: http://www.yacy-forum.de/viewtopic.php?p=11804#11804 *) Adding first version of db test for mysql NOTES: - db user + db + db table must be created before starting the test - db table must be empty. Entries can not be updated at the moment - db connection properties must be changed in the sourcecode at the moment TODOs: - accepting connection properties via command line - implementing update + remove + read operations - 'maybe' adding code to create db + table if it doesn't exist git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@991 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	5b0911d7ea	added new performance menu for search sequence configuration and monitoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@990 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	ada06b0674	bugfix for Networkimage from Hydrox git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@986 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	1aa4ba8b62	added post-search filtering of redundant urls (longer than existing cited) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@982 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	8d827cdb30	tried to fix problems with order of network list by last-seen (which could also improve the network picture) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@980 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	097009d910	experimental visualization of DHT access during global search (temporary) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@977 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	4dcbc26ef1	introduction of search profiles; very experimental git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@976 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	6c48c3ce39	*) Bugfix for ArithmeticException during IndexTransfer See: http://www.yacy-forum.de/viewtopic.php?t=1362 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@974 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	525c8dcbd4	*) Adding Traffic Statistic for Crawler git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@972 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	9a5ab62928	*) Adding yacy specific X-YACY-Index-Control header which can be used by clients to disallow yacy to index the response that belongs to the request where X-YACY-Index-Control is set to "no-index" *) Bugfix for Seed-List download via Remote Proxy. Now the pragma and cache-control http headers of the request are properly set to "no-cache" See: http://www.yacy-forum.de/viewtopic.php?p=11639#11639 *) Bugfix for http-Proxy: yacy has ignored "no-cache" pragma and cache-control http headers that were sent in requests. Now, these request headers are evaluated properly TODO: Missing evaluation of "no-store" request headers git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@971 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	02d9af1a70	*) Restructuring and extending of Remote Proxy Support - remote proxy configuration can now be "really" changed on the fly and takes effect immediately - adding possibility to disable remote proxy usage for yacy->yacy communication - adding possibility to disable remote proxy usage for ssl - restructuring proxy configuration so that it is stored in a single place now *) Adding possibility to import a foreign word DB (or even more of them in parallel) at runtime into the peers DB - this can be done by calling IndexImport_p.html - ATTENTION: please note that at the moment this thread must be aborted via gui before a normal server shutdown is done. - TODO: integrating IndexImport Thread into normal server shutdown - TODO: Adding possibility to import crawl-queues, etc. from foreign peers - TODO: removing old import function from yacy.java and calling the new routines instead git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@968 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	58b670201d	now, changed HTCacheSize needs no restart git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@961 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	40777556c5	*) Connection Tracking - adding automatic refresh - accepts new parameter nameLookup which can be used to deactivate yacy-peer name lookup (because we have problems with this on large seed-dbs) *) ViewFile New page that can be used to view - original content - plain text content - parsed content - parsed sentences of a webpage specified by their url hash Mainly for debugging purpose at the moment *) Robots.txt Bugfix for if-modified-since usage TODO: synchronization of downloads to avoid loading the same robots-file multiple times in parallel by different threads *) Shutdown Better abortion of transferRWI and transferURL sessions on server shutdown *) Status Page Adding icon to start/stop crawling via status page git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@950 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
rramthun	a98bafb939	Changes to german language file git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@941 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	95abdeb685	*) Bugfix for nextElement function of URL Enumerator git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@936 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	6260942590	changed search process: received indexes are now buffered and written to wordIndex after search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@934 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	7ee03acce0	new function cutUrlText added to shorten the URLs on IndexMonitor.html git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@931 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	bc56a88cc8	further refactoring of search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@925 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	d29dfb0a12	refactoring of search / preparation for better search methods git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@921 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	0ae166c522	*) Small changes to Index Transfer. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@919 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	461374e175	*) Restricting amount of files that yacy is allowed to open during index transfer/distribution This option is configurable via config file and is set per default to 800 See: http://www.yacy-forum.de/viewtopic.php?p=11137#11137 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@918 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	c8a35a0130	*) Adding new connection tracking page (currently only for incoming connections) *) Displaying statistic for incoming connections on status page *) Bugfix for Loop-Access Bug when trying to access the yacy page while yacy is configured as proxy See: http://www.yacy-forum.de/viewtopic.php?p=6826 *) Bugfix for Referer Bug See: http://www.yacy-forum.de/viewtopic.php?p=11098#11098 *) Adding reverse Name lookup for yacy-domain names (used by the connection tracking page) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@916 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	b80b2fbdcc	crawling peers now produce waves in network graphic git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@912 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	10d3627c90	changed word cache flush scheduling and removed possible locks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@910 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	839db8869c	added high/low priority for index adding git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@899 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	1688be8590	*) plasmaSwitchboard.java adding more verbose logging output for db initialization *) httpdFileHandler.java adding cache for servlet response methods git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@897 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	e9eb5e4b56	refactoring of index-entity join methods git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@895 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	258fd9eb8e	adding missing file for websearch refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@894 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	77ae30063d	refactoring of websearch process git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@893 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	579b22d8ff	small update to network drawing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@892 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	2b5829c3da	small fix git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@891 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	4c7918f5b5	added shutdown to crawl stacker (moved from 882) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@889 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	2851658c2a	re-integrated Martin's last change to crawl stacker from svn 882 that I had deleted accidentally git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@888 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	c83594528c	integrated crawl stacker into thread control git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@887 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	959eefbc4f	*) Robots.txt parser/ppt cutting off comments at the line end *) Adding Threadpool for stackCrawl Thread to speedup robots.txt download and double url checks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@882 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	f65c939a60	userDB Auth git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@874 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	1a5d98cd6d	better imagePainter example and fix for typo http://www.yacy-forum.de/viewtopic.php?p=10920#10920 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@868 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	f6cf3967de	fix for compile-bug in svn 583 (Martin, please check whether this is right: fifo or filo stack?) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@854 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	a2fa75e688	*) Asynchronous queuing of crawl job URLs (stackCrawl) various checks like the blacklist check or the robots.txt disallow check are now done by a separate thread to unburden the indexer thread(s) TODO: maybe we have to introduce a threadpool here if it turns out that this single thread is a bottleneck because of the time consuming robots.txt downloads *) improved index transfer The index selection and transmission is done in parallel now to improve index transfer performance. TODO: maybe we could speed up performance by using multiple transmission threads in parallel instead of only a single one. *) gzip encoded post requests it is now configurable if a gzip encoded post request should be sent on index transfer/distribution *) storage Peer (very experimental and not optimized yet) Now it's possible to send the result of the yacy indexer thread to a remote peer instead of storing the indexed words locally. This could be done by setting the property "storagePeerHash" in the yacy config file - Please note that if the index transfer fails, the index is stored locally. - TODO: currently this index transfer is done by the indexer thread. To speed up the indexer a) this transmission should be done in parallel and b) multiple chunks should be bundled and transferred together *) general performance improvements - better memory cleanup after http request processing has finished - replacing some string concatenations with stringBuffers - replacing BufferedInputStreams with serverByteBuffer - replacing vectors with arraylists wherever possible - replacing hashtables with hashmaps wherever possible This was done because function calls to vector or hashtable functions take 3 times longer than calls to functions of arraylists or hashmaps. TODO: we should take a look at the class serverObject which inherits from hashmap Do we really need a synchronization for this class? TODO: replace arraylists with linkedLists if random access to the list elements is not needed *) Robots Parser supports if-modified-since downloads now If the downloaded robots.txt file is older than 7 days the robots parser tries to download the robots.txt with the if-modified-since header to avoid unnecessary downloads if the file was not changed. Additionally the ETag header is used to detect changes. *) Crawler: better handling of unsupported mimeTypes + FileExtension *) Bugfix: plasmaWordIndexEntity was not closed correctly in - query.java - plasmaswitchboard.java *) function minimizeUrlDB added to yacy.java this function tests the current urlHashDB for unused urls ATTENTION: please don't use this function at the moment because it causes the wordIndexDB to flush all words into the word directory! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@853 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	6d5d0ac801	bugfix for startup problems git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@850 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	0c3a20d44f	more + changed log for better understanding of outOfMemory bug and others git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@846 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	0fd9aa6c6e	*) Bugfix: supportedFileExt Function didn't detect the file extension correctly because of missing conversion to lower case git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@837 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	8a33c9b309	*) Bugfix: supportedFileExt Function didn't detect the file extension correctly if there was a dot in one of the parent directories of the file. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@836 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	28c5687ff9	*) Bugfix for "download of non supported file content" via crawler See: http://www.yacy-forum.de/viewtopic.php?p=10724#10724 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@835 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	2b3f964037	*) Bugfix: supportedFileExt Function didn't chop http parameters before trying to detect the file extension git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@834 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	ff1d3d0680	Init of userDB Pagelayout of User_p.html git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@822 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	9c4306e41e	fixed problem with htcache path git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@811 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	1669eaaa1a	fixed svn 805 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@807 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	ca82d690a9	changed one line too many in SVN 805 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@806 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	4bb1f849a0	Bugfix for http://www.yacy-forum.de/viewtopic.php?t=1233 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@805 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	2c7b490e30	memory-logging git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@804 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	7fc822a59b	changed handling of time-zones git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@801 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	9b7f37fc37	*) Minor changes - more debugging output: storageTime for indexed document is logged now - saving memory in plasmaParserDocument.java, plasmaWordIndexEntryContainer.java (not a big deal) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@798 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	b5a8992d29	*) Setting some object fields to final git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@796 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	023be89586	*) Bugfix for "Robots.txt wird immer wieder geladen" (robots.txt is loaded again and again) See: http://www.yacy-forum.de/viewtopic.php?p=10241#10233 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@794 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	35c6c5ead7	*) Bugfix for "Blacklist und Crawlen" (blacklist and crawling) bug: crawling continues even if the URL is listed in the blacklist See: http://www.yacy-forum.de/viewtopic.php?p=10279#10279 - missing return statement added. Thanks to allo for the code review. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@793 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	9e2fc7e5fe	load balancing of crawl target domains git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@791 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	3fcc95a82c	integrated crawl-profiles db in memory-performance monitor git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@788 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	fe6a6abc0b	*) Adding robots.txt db to Performance Settings for Memory menu git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@785 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	3274ae725e	increased cache size of robots database; however, this should be integrated into new memory control git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@784 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	c6d2f50375	changed order of robots and double-check git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@783 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago

516 Commits (5f5eee1ae9b255d0b0205897732b567c06a24efb)