yacy_search_server

Commit Graph

Author	SHA1	Message	Date
borg-0300	8d8a40c2d9	added properties git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1369 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	cfd1e5e376	more security for index transfer protocol: - allow only specific file names - log IP number of accessing peer in case of attack attempts git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1367 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	423ce9bf59	quickfix for http://www.yacy-forum.de/viewtopic.php?p=15336#15336 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1366 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	5eba6c66c6	thelis fix git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1364 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
rramthun	c59027e520	Translated status_p.inc a bit further, but it didn't work. See http://www.yacy-forum.de/viewtopic.php?p=15180#15180 Added my seed to superseed.txt as I am now proud owner of a PC which runs YaCy most of the day. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1343 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	9544c47684	added some UTF-8 handling. hope this will help somehow.. for sure not THE solution to our UTF-8 problem git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1308 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	9086261476	refactoring of base64 encoding: the kelondro database needs specific information about the order of base64-encoded keys. Since no other package depends on base64 (only the httpd uses base64 for encryption, but does not need to encode these strings) it is good to move base64 encoding to the new ordering classes in kelondro. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1284 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	b3dca06bb1	added location column to network pages. The location is computed from the userAgent string of connecting peers. Therefore this information is not available right after start-up. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1241 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	bb79fb5d91	- changed handling of error cases retrieving urls from database (no more NULL values are returned, instead, an IOException is thrown) - removed ugly damagedURLS implementation from plasmaCrawlLURL.java (this inserted a static value into the Object which is not really a good style) - re-coded damagedURLS collection in yacy.java by catching an exception and evaluating the exception message to do: - the urldbcleanup feature must be re-tested git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1200 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	37f88b4017	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1176 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	8f1f2daa5e	implemented interactive link deletion of search results. next steps: attach voting and restrict to administrator to see the deletion button, move the mouse pointer to the left of a search result git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1172 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	7920e1547d	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1163 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	1d6a6d1f85	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1159 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	a04930f025	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1158 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	b9cc9029e3	added ybr selection for remote search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1119 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	89fab9f200	*) Correcting Problems with lURLEntries containing null URLs. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1104 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	23dc904e0e	*) Correcting Problems with lURLEntries containing null URLs. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1102 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	0610ff4fe9	*) small changes to crawlReceipt.java - we do not know if the URL was stored in the noticeURL-DB with the old or new hash. therefore we now try to remove the URL from the noticeURL-DB using both hash values git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1082 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	e9d6defce6	quickfix for http://www.yacy-forum.de/viewtopic.php?p=12638#12638 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1073 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	f763923e0a	added missing files for last commit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1057 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	d2731418bf	added creation of global ranking files and changed url normal form usage git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1046 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	fb766413d1	*) Changes on httpc dns caching - Bugfix: old dns cache did not handle case insensitive hostnames correctly. - adding a possibility to set domain name patterns defining hostnames that should not be cached by the httpc dns cache e.g. borg-300.dyndns.org This can be done by setting the new httpc.nameCacheNoCachingPatterns property - using httpc.dnsResolve wherever possible within the sourcecode [httpd.java,plasmaCrawlStacker.java] git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1044 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	440e6ed747	see http://www.yacy-forum.de/viewtopic.php?t=1416 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1025 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	b8ceb1ffde	*) Adding better https support for crawler - solving problems with unknown certificates by implementing a dummy trust Manager - adding https support to robots-parser - Seed File can now be downloaded from https resources - adapting plasmaHTCache.java to support https URLs properly *) URL Normalization - sub URLs are now normalized properly during indexing - pointing urlNormalForm function of plasmaParser to htmlFilterContentScraper function - normalizing URLs which were received by a crawlOrder request git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1024 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	f871408729	*) sharedBlacklist_p.java - Setting Pragma: no-cache - increasing timeout to 12 sec. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1019 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	8194fde340	*) trying to continue transferRWI processing even if this error occurs: |> Caused by: de.anomic.kelondro.kelondroException: kelondroTree.searchproc: nullpointernull in db '.../urlHash.db' - if URL existence can not be determined, we request it from the remote peer git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@997 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	4dcbc26ef1	introduction of search profiles; very experimental git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@976 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	7256bea45f	*) Bugfix for nameLookup parameter handling *) Bugfix for Received xx Words [xxxxxxx .. null] Bug git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@953 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	40777556c5	*) Connection Tracking - adding automatic refresh - accepts new parameter nameLookup which can be used to deactivate yacy-peer name lookup (because we have problems with this on large seed-dbs) *) ViewFile New page that can be used to view - original content - plain text content - parsed content - parsed sentences of a webpage specified by their url hash Mainly for debugging purposes at the moment *) Robots.txt Bugfix for if-modified-since usage TODO: synchronization of downloads to avoid loading the same robots-file multiple times in parallel by different threads *) Shutdown Better abortion of transferRWI and transferURL sessions on server shutdown *) Status Page Adding icon to start/stop crawling via status page git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@950 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	e642a5d8b7	more constants git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@947 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	4a8e6d552e	invocation with "emailaddress" in Parameter. (compatible with other programs than sendmail, like sendxmpp) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@929 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	d29dfb0a12	refactoring of search / preparation for better search methods git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@921 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	839db8869c	added high/low priority for index adding git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@899 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	a1777788a5	small change git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@879 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	64acb46a91	cleaned, finals, Properties git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@857 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	52168fab9b	cleaned, finals, Properties git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@856 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	a2fa75e688	*) Asynchronous queuing of crawl job URLs (stackCrawl) various checks like the blacklist check or the robots.txt disallow check are now done by a separate thread to unburden the indexer thread(s) TODO: maybe we have to introduce a threadpool here if it turns out that this single thread is a bottleneck because of the time-consuming robots.txt downloads *) improved index transfer The index selection and transmission is done in parallel now to improve index transfer performance. TODO: maybe we could speed up performance by using multiple transmission threads in parallel instead of only a single one. *) gzip encoded post requests it is now configurable if a gzip encoded post request should be sent on index transfer/distribution *) storage Peer (very experimental and not optimized yet) Now it's possible to send the result of the yacy indexer thread to a remote peer instead of storing the indexed words locally. This could be done by setting the property "storagePeerHash" in the yacy config file - Please note that if the index transfer fails, the index is stored locally. - TODO: currently this index transfer is done by the indexer thread. To speed up the indexer a) this transmission should be done in parallel and b) multiple chunks should be bundled and transferred together *) general performance improvements - better memory cleanup after http request processing has finished - replacing some string concatenations with stringBuffers - replacing BufferedInputStreams with serverByteBuffer - replacing vectors with arraylists wherever possible - replacing hashtables with hashmaps wherever possible This was done because function calls to vector or hashtable functions take 3 times longer than calls to functions of arraylists or hashmaps. TODO: we should take a look at the class serverObject which is inherited from hashmap Do we really need a synchronization for this class? TODO: replace arraylists with linkedLists if random access to the list elements is not needed *) Robots Parser supports if-modified-since downloads now If the downloaded robots.txt file is older than 7 days the robots parser tries to download the robots.txt with the if-modified-since header to avoid unnecessary downloads if the file was not changed. Additionally the ETag header is used to detect changes. *) Crawler: better handling of unsupported mimeTypes + FileExtension *) Bugfix: plasmaWordIndexEntity was not closed correctly in - query.java - plasmaswitchboard.java *) function minimizeUrlDB added to yacy.java this function tests the current urlHashDB for unused urls ATTENTION: please don't use this function at the moment because it causes the wordIndexDB to flush all words into the word directory! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@853 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	a9c466ef21	cleaned, finals, StringBuffer, Properties git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@849 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	0c3a20d44f	more + changed log for better understanding of outOfMemory bug and others git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@846 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	7fc822a59b	changed handling of time-zones git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@801 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
(no author)	1aa79f5bb5	cleaned; Properties; git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@790 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	c42a543bc3	*) Adding peername to logmessage when receiving URLs git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@781 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	1dc94e7753	*) Adding support for gzip content-encoding of http post requests used to transferRWIs and transferURLs. See: http://www.yacy-forum.de/viewtopic.php?t=1167#10020 *) adding yacyVersion.java containing constants defining yacy versions that support a given feature. Needed to determine if a remote peer is able to decode gzip content-encoded http post bodies properly. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@772 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	96a5b6e8fb	removed yacy peer types from serverSwitch git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@758 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	11e175630b	StringBuffers, finals; cleaned; Properties; git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@745 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	a2fec3bb1c	*) Bugfix for " java.lang.NullPointerException at hello.respond(hello.java:167)" See: http://www.yacy-forum.de/viewtopic.php?p=9471 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@685 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	4fd5b95b1f	*) Renaming Logger function names to reflect the proper Java Logging API Loglevels - please use logFine instead of logDebug - please use logSevere instead of logFailure and logError See: http://www.yacy-forum.de/viewtopic.php?p=8726#8726 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@615 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	6adf8a4bde	*) Renaming Logger function names to reflect the proper Java Logging API Loglevels - please use logFine instead of logDebug - please use logFailure instead of logError See: http://www.yacy-forum.de/viewtopic.php?p=8726#8726 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@614 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	a812fb86cc	*) Port Forwarding Feature does not detect broken connections properly. Therefore a test-request was added to the isConnected function to detect broken connections and to keep open connections alive git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@596 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	c47bb1182d	bugfix for assortment initialization error git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@547 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago


89 Commits (ee010c36ae2367e9b9ee8dae3fb259858f4a9d95)