yacy_search_server

Commit Graph

Author	SHA1	Message	Date
allo	ba96cefe0c	packages for xml/* bugfix for servlets with packages from theli. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1272 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	b6be828d15	*) Bugfix: Share subdirectory couldn't be viewed because of LinkageErrors See: http://www.yacy-forum.de/viewtopic.php?t=1634 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1218 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	4500506735	fixed some bugs concerning url entry retrieval and intexControl interface git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1212 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	3086e38bb1	added getRemoved method on demand from theli for migration purpose git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1195 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	76b97e4d2a	integrated geo-snap DBStressTest.java in dbtest this is still beta. It uses serverInstantThreads instead of Java 1.5 code for multiple threads git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1185 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	d007d14905	re-insert of migrateSwitchConfigSettings git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1180 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	8862b6ba4b	*) Corrections for code cleanup 1175 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1179 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	37f88b4017	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1176 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	ec2b39c1ce	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1175 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	8f1f2daa5e	implemented interactive link deletion of search results. next steps: attach voting and restrict to administrator to see the deletion button, move the mouse pointer to the left of a search result git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1172 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	3d8a5ae652	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1166 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	1d6a6d1f85	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1159 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	a04930f025	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1158 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
rramthun	8c0d982191	1.) Fix from Martin (he is not at home...) 2.) Search button now gets blocked if clicked and JS active. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1150 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	b604654c25	*) Adding possibility to do a settings migration on yacy startup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1149 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	b35c5a48bf	*) First version of urlRedirector.pl script - with this script it's possible to pass URLs from squid to yacy via the squid redirector interface - these URLs are then used by YaCy to feed the crawler git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1141 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	bdf30117c1	*) Redesign of parser configuration - restructuring of mimeTypes based on the parsers - displaying parser usage count - displaying human readable parser names - displaying parser version information *) httpdFileHandler.java - adding possibility to support "streaming" servlets which are special servlets that can communicate with the client via the connection streams autonomously - the name of these new servlet types must end with the file extension .stream - this feature will be needed by the yacy ScreenSaver class to fetch statistic data from the peer without the need to reconnect to the server all the time *) Adding human readable names and version information for all supported parsers *) plasmaParser.java - adding new structure to store parser statistic data *) Adding openDocument parser - can be used to parse odt files *) jmimemagic - adding rules to detect openDocument formats properly *) serverLog.java - adding functions that can be used to query if a given logging level is enabled or not. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1140 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	5bf70e6e14	*) Bugfix for serverClassLoader.java - Classloading didn't work properly if there are multiple classes with the same name - This could occur because the yacy servlets have no package name defined and therefore are all in the same (default) package. *) Bugfix for Duplicated Class Error See: http://www.yacy-forum.de/viewtopic.php?t=1341 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1135 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	85282b1d98	enhanced YBR recognition and search result heuristics git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1121 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	0e25020f51	added first generation and usage of YBR index-files. Enhanced overall ranking of search results. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1118 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	0ec54d9c5f	enhanced CR-file handling and added first RCI-evaluation tests git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1110 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	88e3234393	fine-tuning of rci-generation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1105 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	24dc0e0760	implemented cr-file processing and further transmission steps git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1099 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	8e308cf50e	*) Possibility to change the server port on-the-fly. - Now it's possible to change the server port without the need to restart the whole server. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1089 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	fd58d5f8e6	*) Adding possibility to specify the interface / IP-Address where YaCy should bind to. - e.g. Port = 192.168.0.1:8080 Port = #eth0:8080 Port = 8080 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1071 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	6e81f2580d	try to fix bug with storage of settings git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1058 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	79818a320f	introduced citation-rank transmission protocol and activate transport for anonymisation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1055 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	d2731418bf	added creation of global ranking files and changed url normal form usage git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1046 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	fb766413d1	*) Changes on httpc dns caching - Bugfix: old dns cache did not handle case insensitive hostnames correctly. - adding a possibility to set domain name patterns defining hostnames that should not be cached by the httpc dns cache e.g. borg-300.dyndns.org This can be done by setting the new httpc.nameCacheNoCachingPatterns property - using httpc.dnsResolve wherever possible within the sourcecode [httpd.java,plasmaCrawlStacker.java] git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1044 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hydrox	56b9f34411	*)removed unused imports git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1015 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	5f68b6886b	introduced new url-hashes for better ranking computation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1013 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	4d1e56e4d9	fixed intermission-bug (removed 'break for intermission' of httpd-thread) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1009 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	723e056c48	*) Bugfix for ClassCastException during SessionPool.close git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@996 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	9a2afe88d4	*) Deactivating unlimited timeout for persistent connections because this could cause problems with clients which do not shutdown persistent connections properly. - Setting timeout for idle persistent connections to 30 minutes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@983 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	4dcbc26ef1	introduction of search profiles; very experimental git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@976 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	40777556c5	*) Connection Tracking - adding automatic refresh - accepts new parameter nameLookup which can be used to deactivate yacy-peer name lookup (because we have problems with this on large seed-dbs) *) ViewFile New page that can be used to view - original content - plain text content - parsed content - parsed sentences of a webpage specified by its url hash Mainly for debugging purposes at the moment *) Robots.txt Bugfix for if-modified-since usage TODO: synchronization of downloads to avoid loading the same robots-file multiple times in parallel by different threads *) Shutdown Better abortion of transferRWI and transferURL sessions on server shutdown *) Status Page Adding icon to start/stop crawling via status page git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@950 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	e642a5d8b7	more constants git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@947 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	d29dfb0a12	refactoring of search / preparation for better search methods git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@921 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	c8a35a0130	*) Adding new connection tracking page (currently only for incoming connections) *) Displaying statistic for incoming connections on status page *) Bugfix for Loop-Access Bug when trying to access the yacy page while yacy is configured as proxy See: http://www.yacy-forum.de/viewtopic.php?p=6826 *) Bugfix for Referer Bug See: http://www.yacy-forum.de/viewtopic.php?p=11098#11098 *) Adding reverse Name lookup for yacy-domain names (used by the connection tracking page) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@916 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	6a72f06c40	resizable network picture + greater on click git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@900 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	a2fa75e688	*) Asynchronous queuing of crawl job URLs (stackCrawl) various checks like the blacklist check or the robots.txt disallow check are now done by a separate thread to unburden the indexer thread(s) TODO: maybe we have to introduce a threadpool here if it turns out that this single thread is a bottleneck because of the time consuming robots.txt downloads *) improved index transfer The index selection and transmission is done in parallel now to improve index transfer performance. TODO: maybe we could speed up performance by using multiple transmission threads in parallel instead of only a single one. *) gzip encoded post requests it is now configurable if a gzip encoded post request should be sent on index transfer/distribution *) storage Peer (very experimental and not optimized yet) Now it's possible to send the result of the yacy indexer thread to a remote peer instead of storing the indexed words locally. This could be done by setting the property "storagePeerHash" in the yacy config file - Please note that if the index transfer fails, the index is stored locally. - TODO: currently this index transfer is done by the indexer thread. To speed up the indexer a) this transmission should be done in parallel and b) multiple chunks should be bundled and transferred together *) general performance improvements - better memory cleanup after http request processing has finished - replacing some string concatenations with stringBuffers - replacing BufferedInputStreams with serverByteBuffer - replacing vectors with arraylists wherever possible - replacing hashtables with hashmaps wherever possible This was done because function calls to vector or hashtable functions take 3 times longer than calls to functions of arraylists or hashmaps. TODO: we should take a look at the class serverObject which is inherited from hashmap Do we really need a synchronization for this class? TODO: replace arraylists with linkedLists if random access to the list elements is not needed *) Robots Parser supports if-modified-since downloads now If the downloaded robots.txt file is older than 7 days the robots parser tries to download the robots.txt with the if-modified-since header to avoid unnecessary downloads if the file was not changed. Additionally the ETag header is used to detect changes. *) Crawler: better handling of unsupported mimeTypes + FileExtension *) Bugfix: plasmaWordIndexEntity was not closed correctly in - query.java - plasmaswitchboard.java *) function minimizeUrlDB added to yacy.java this function tests the current urlHashDB for unused urls ATTENTION: please don't use this function at the moment because it causes the wordIndexDB to flush all words into the word directory! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@853 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	3dd7e90cdd	kbytes instead of bytes in performance settings; new default values git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@808 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	2c7b490e30	memory-logging git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@804 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	7fc822a59b	changed handling of time-zones git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@801 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	495bc8bec6	removed cache-control from low and medium priority caches which reduces memory use and computation overhead git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@774 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	07f30931ec	various configuration options in memory performance git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@763 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	2f732e32a2	enhancements to memory menu git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@762 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	96a5b6e8fb	removed yacy peer types from serverSwitch git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@758 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	b990dc1ad1	*) Replacing jsch 0.1.19 lib with newer version 0.1.21 *) Replacing PDFBox 0.7.1 lib with newer version 0.7.2 *) Refactoring of classes httpd/httpc/httpHeaders to make many methods for httpHeader/Requestline parsing reusable for new icap implementation *) adding chunked input stream support - needed by new icap implementation - needed by future httpc HTTP/1.1 support *) httpd.java - moving all connection property constants to class httpHeader - moving readHeader function to class httpHeader - moving parseQuery function to class httpHeader - moving handleTransparentProxy function to class httpHeader *) httpHeader.java - adding new function to parse the http response line - adding new function to convert http headers to a string that can be sent to the client - adding a function that generates a proper url using all parsed connection properties *) ICAP Support - yacy now supports handling of icap response modification requests - this feature can be used by other icap enabled proxies to contact yacy as icap server, and to hand over the downloaded content to yacy.logging for indexing - functionality was successfully tested with squid 2.5Stable 10 + icap patch - further icap services e.g. URL filtering based on yacy's blacklists are possible *) plasmaSwitchboard.java - htcache entries that are still needed for indexing are now properly registered as in use after system restart - extended logging: log message now shows parsing and indexing time for each sb. entry git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@757 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	42cd2cea65	added final constants, so that other class can reach it; cleaned; git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@741 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago

139 Commits (7c0d7ed4f84237610b538e897d0663cb5bb3ce78)