yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	d3431433b0	more anonymization in logging git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2876 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	78b7f6f7fd	bugfix for index remove bug, appeared after search where snippet-loading triggered word removal git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2869 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	06854988da	- full integration of new LURL database in INDEX - added migration method for urlHash.db into INDEX git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2819 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	52466067d8	*) Bugfix for ArrayIndexOutOfBoundsExceptions which occur because SimpleDateFormat is not thread-safe See: http://www.yacy-forum.de/viewtopic.php?t=2995 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2810 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	b79e06615d	- added new LURL.Entry class for next database migration - refactoring of affected classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2802 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	77a59a115d	refactoring of indexing methods git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2787 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	a5dd0d41af	- refactoring of plasmaCrawlLURL.Entry to prepare new Entry format - added test migration method to migrate the old LURL to a new LURL; the new LURL will be split into different tables for each month; this solves several problems: - the biggest table in YaCy is split into different parts and can also be managed in filesystems that are limited to 2GB - the oldest entries can easily be identified, used for re-crawl and deleted - The complete database can be limited to a specific size (as wanted many times) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2755 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	afbb547f3d	extended options for abstracts generation in remote search interface git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2739 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	2e4aa6a170	refactoring of Advanced Config: - removed settings that are in Basic Settings - joined pages that belong together - moved include pages from yacy/ to / git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2726 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	dbc2e039bb	added time-out option parameter to call hierarchy git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2691 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	00746ca232	identified and fixed search performance problem caused by snippet loading. Some access to header-db had been twice and even more times in some cases. Snippet resource loading fixed. Furthermore the snippet loading during remote search within the remote peer has been disabled, but can be switched on remotely by new flag 'includesnippet=true' git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2688 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	df1629b05a	- code cleanup - version 0.471 - moved surftipps to own web page git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2676 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	3aac5b26da	- added automatic tag generation when a web page from the search results is added - added new image 'B' in front of search results for bookmark generation - added news generation when a public bookmark is added - the '+' in front of search results has new meaning: positive rating for that result - added news generation when a '+' is hit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2613 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	e03740c306	small fix for last commit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2575 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	c89d8142bb	replaced old 'kCache' by a full-controlled cache there are now two full-controlled caches for incoming indexes: - dhtIn - dhtOut during indexing, all indexes that shall not be transported to remote peers because they belong to the own peer are stored to dhtIn. It is furthermore ensured that received indexes are not again transmitted to other peers directly. They may, however be transmitted later if the network grows. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2574 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	6e2907135a	bugfixes for remote search server part git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2573 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	cf9884e22b	first attempt to implement a secondary search this is a set of search processes that shall enrich search results with specialized requests to realize a combination of search results from different peers. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2571 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	75b198bc02	- updated references to indexContainer - more bugfixes and debugging for indexAbstract processing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2555 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	4f9e42d5ed	more changes towards better join-search - fixed problems with index-abstract generation - added analysis output for index abstract receive git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2551 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	82a6054275	- fixed bug with new indexAbstract generation - added partly evaluation of indexAbstracts during remote searches git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2544 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	74d1dea30b	changes towards better join-search - added generation of a compressed index within remote peers during global search - added selection of specific urls within remote peers during secondary global search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2539 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	c543028dd4	fixed double/missing null check for LURLs git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2520 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	96c6e4e322	- enhancements to detailed search page - enhancements to search ranking computation process - removed bugs in postranking git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2516 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	9340dbb501	fixed all possible problems with nullpointer exception for LURLs git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2513 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
hermens	ff4362b02d	some more fixes for new plasmaCrawlLURL.load behavior git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2511 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	4866868c0e	added write cache for LURLs This was necessary to speed up the index receive process during global search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2498 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	8a0e35618b	enhancements to search result preparation - added detailed count on remote search results - enhanced search sequence during remote searches (doing local search in sequence) - strict adherence to timeout limits git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2497 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	f3ac4dbbb9	*) better handling of server shutdown See: e.g. http://www.yacy-forum.de/viewtopic.php?t=2584 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2468 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	18b6876860	new cache flush configuration settings git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2460 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	6ad471ef96	* applied many compiler warning recommendations * cleaned up code * added unit test code * migrated ranking RCI computation to kelondroFlex and kelondroCollectionIndex git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2414 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	5e0b6f8f83	*) sorting peer name list on Blacklist_p.html *) restructuring of sharedBlacklist_p.java git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2405 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	6c8366aea1	*) Bugfix for blacklist import function - wrong property name - list was accidentally imported into a new blacklist file git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2404 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	eee44be602	*) adding an interface for customized blacklist classes - now it's possible to use a customized blacklist engine instead of the default one - this can be done by configuring the property BlackLists.class See: http://www.yacy-forum.de/viewtopic.php?t=2108 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2397 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	66f1eb07d9	*) Bugfix for IllegalArgumentException in transferURL See: http://www.yacy-forum.de/viewtopic.php?p=24560 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2391 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	d2e8e76218	*) now it's possible to configure the yacy blacklist separately for dht, search, proxy, crawler See: http://www.yacy-forum.de/viewtopic.php?t=2541 http://www.yacy-forum.de/viewtopic.php?p=24516 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2389 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	f43c90fa98	fixed handling of null referer in crawlOrder git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2384 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	abf22f6e60	removed url normalform computation from htmlFilterContentScraper. This method was implemented in de.anomic.net.URL git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2377 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	ec5149ff3b	fix for busyCacheFlush detection git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2365 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	f58283def2	better control of index flush git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2364 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	80b6c90d54	enhancements to prevent blocking during dht transfer receive git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2362 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hermens	d56f06401e	- Cache known URLs during indexReceive to avoid getting blocked during loadedURL.exists() whenever possible - Small logging updates git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2359 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	c7b6389ca1	*) renaming indexDistribution.dhtReceiptLimitEnabled property to indexDistribution.transferRWIReceiptLimitEnabled so that the default value is taken over by all peers git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2356 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	9183d21f25	renamed new index class to old name git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2342 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	c4e922885a	replaced indexURLEntry by new class that uses a kelondroRow.Entry object to store the index entry. This is another step to move to the new database structure. A side effect of this change is, that index storage uses much less RAM space, which affects the index RAM cache. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2341 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	5f72be2a95	some redesign of EURL storage * store() is now called explicitely * more urls are written to the EURL table * the EURL stack does not store the complete entry any more, now only the URL hash git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2323 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	58df8b7bbf	a large collection of different changes * mainly for the transition to the new indexing database structure * a bugfix for an endless loop inside kelondroTree iteration * a bugfix for bulk read inside a kelondroTree iteration; the bug caused that some elements had been iterated twice * very strong speed enhancement for url/domain extraction git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2320 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hydrox	8ba8e2b7d9	*) added cache for blacklists urlhashs received by DHT. DHT does not request URLs listed in this cache. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2251 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hermens	53cbcc6d6e	Implement emergency break in index receive when the limit of the ramCache is exceeded by more than cacheLimit See: http://www.yacy-forum.de/viewtopic.php?p=22911#22911 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2248 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	b20496e42b	*) make DHT DoS check configurable (requested by KoH) - check can be disabled via property indexDistribution.dhtReceiptLimitEnabled - upper bound can be configured via indexDistribution.dhtReceiptLimit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2234 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hermens	38a1410361	Don't test a remote peer's seed during hello.respond as its IP might not be proper, especially while still virgin git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2187 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	5041d330ce	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2150 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	90d569d70f	refactoring of index management: url storage is part of index management; moved plasmaURL to indexURL git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2122 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	a930be4ba3	refactoring of index management: generalized the index entry git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2121 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	7dd57a3828	added a busy-time estimation at DHT/RWI-Receive to be done: usage of this value on client-side git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2116 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	fcec40fcc6	*) don't accept messages without subject or payload See: http://www.yacy-forum.de/viewtopic.php?p=21656 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2115 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	82b2bc6932	patch for index-transfer DoS problem see http://www.yacy-forum.de/viewtopic.php?p=21627#21627 note that this function will make the index-transfer functionality void git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2114 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	a474669338	start with refactoring of index management git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2110 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	799c04091d	Bugfix for Spam-Bug (Header manipulation) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2057 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	dbe96e6541	added hand-over of search filter and prefer ranking to yacy protocol git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2029 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	00a5d435e2	- fixed some bugs with domain filter - added new ranking filter "prefermask": urls that match the filter are ranked better git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2022 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	bd283b8443	fixed bugs: - null pointer exception during startup of a robinson-configured peer - wrong time calculation of default value of re-crawl option git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2005 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	0a4c2e89ed	remote crawl orders are now only accepted if sum over all queues is less than 100 (the indexing queue was not measured before) see also: http://www.yacy-forum.de/viewtopic.php?p=19374#19374 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1947 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	1f4412a146	adopted isListed to discussed new behavior as discussed (url, getFile) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1940 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	3286b1f498	re-organisation of lurl-creation and -stacking this was necessary to prevent useless write to the database in case of blacklist appearance of the url git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1905 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hermens	289da326e5	*) Bugfix: remove blacklisted URL from loadedURL, when received via DHT transfer see: http://www.yacy-forum.de/viewtopic.php?p=18976#18976 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1904 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
rramthun	9f979d4fa5	Domain-lists gzip-compressable and sendable via cr-send/receive git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1883 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	f188611fc6	apply blacklist on rwis during dht receive very experimental! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1865 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	5ee0125046	*) adding possibility to configure the server port for seed uploading via scp. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1861 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	7afa5c1b8e	staticIP fix tried to solve http://www.yacy-forum.de/viewtopic.php?p=18663#18663 D 2006/03/08 07:08:20 YACY yacyClient.publishMySeed mySeed error - not proper: IP is not proper: -UNRESOLVED_PATTERN- git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1859 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	f108048a2c	*) Bugfix for NullpointerException in hello.java *) Correcting for loop in hello.java git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1854 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	bae3783d38	added a snippet marking (search words are now bold in snippets) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1823 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	f73d51f94b	reverted last change git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1810 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	8997b83806	store the staticIP(dyndns) in seed, not the real IP git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1809 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	7c5f8f997a	some more staticIP fixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1784 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	d31a4e0b4f	some small enhancements with cache flushing parameters and data structures git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1767 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hermens	3208fe14ed	*) log exceptions in crawlOrder.java to the logfile instead of stdout git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1735 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	7eb10675b3	re-organization of index management this was done to be prepared for new storage algorithms git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1635 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	d0f76fc9bc	*) setting logging level for thread pools to info *) new layout for bookmark list (Allo: please take a look if it's acceptable for you) *) crawlReceipt.java: displaying peer name in logging message *) Network.html: adding button for manual peer ping git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1584 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	fb7411d7bb	re-structuring of ranking application: concentration of all ranking attributes in the plasmaSearchRankingProfile git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1541 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	d98418390b	- introduced rankingProfile Class - selection of ranking and timing profiles for each search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1539 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	1f3eaf9f8e	use DATA/HTDOCS for notifier.gif. Works even if htroot is readonly git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1526 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	fa90c3ca7a	- removed some usage of indexEntity - changed index collection process: indexes are not first flushed to indexEntity, but now collected directly from ram cache and assortments git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1489 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	03c65742ba	changes towards the new index storage scheme: - replaced usage of temporary IndexEntity by EntryContainer - added more attributes to word index - added exact-string search (using quotes in query) - disabled writing into WORDS during search; EntryContainers are used instead git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1485 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
rramthun	5942f6334c	Some language fixes. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1386 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	f4ffa9aee5	- implemented more attributes to index entries - implemented hand-over of new word index attributes during remote search - implemented word-distance computation during search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1382 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	c5b6154136	added CRDistOn = true/false git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1372 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	8d8a40c2d9	added properties git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1369 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	cfd1e5e376	more security for index transfer protocol: - allow only specific file names - log IP number of accessing peer in case of attack attempts git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1367 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	423ce9bf59	quickfix for http://www.yacy-forum.de/viewtopic.php?p=15336#15336 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1366 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	5eba6c66c6	thelis fix git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1364 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
rramthun	c59027e520	Translated status_p.inc a bit further, but it didn't work. See http://www.yacy-forum.de/viewtopic.php?p=15180#15180 Added my seed to superseed.txt as I am now proud owner of a PC which runs YaCy most of the day. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1343 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	9544c47684	added some UTF-8 handling. hope this will help somehow.. for sure not THE solution to our UTF-8 problem git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1308 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	9086261476	refactoring of base64 encoding: the kelondro database needs specific information about the order of base64-encoded keys. Since no other package depends on base64 (only the httpd uses base64 for encryption, but does not need to encode these strings) it is good to move base64 encoding to the new ordering classes in kelondro. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1284 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	b3dca06bb1	added location column to network pages. The location is computed from the userAgent string of connecting peers. Therefore this information is not available right after start-up. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1241 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	bb79fb5d91	- changed handling of error cases retrieving urls from database (no more NULL values are returned, instead, an IOException is thrown) - removed ugly damagedURLS implementation from plasmaCrawlLURL.java (this inserted a static value into the Object which is not really a good style) - re-coded damagedURLS collection in yacy.java by catching an exception and evaluating the exception message to do: - the urldbcleanup feature must be re-tested git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1200 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	37f88b4017	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1176 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	8f1f2daa5e	implemented interactive link deletion of search results. next steps: attach voting and restrict to administrator to see the deletion button, move the mouse pointer to the left of a search result git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1172 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	7920e1547d	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1163 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	1d6a6d1f85	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1159 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	a04930f025	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1158 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	b9cc9029e3	added ybr selection for remote search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1119 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	89fab9f200	*) Correcting Problems with lURLEntries containing null URLs. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1104 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	23dc904e0e	*) Correcting Problems with lURLEntries containing null URLs. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1102 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	0610ff4fe9	*) small changes to crawlReceipt.java - we do not know if the URL was stored in the noticeURL-DB with the old or new hash. therefore we now try to remove the URL from the noticeURL-DB using both hash values git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1082 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	e9d6defce6	qquickfix for http://www.yacy-forum.de/viewtopic.php?p=12638#12638 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1073 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	f763923e0a	added missing files for last commit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1057 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	d2731418bf	added creation of global ranking files and changed url normal form usage git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1046 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	fb766413d1	*) Changes on httpc dns caching - Bugfix: old dns cache did not handle case insensitive hostnames correctly. - adding a possibility to set domain name patterns defining hostnames that should not be cached by the httpc dns cache e.g. borg-300.dyndns.org This can be done by setting the new httpc.nameCacheNoCachingPatterns property - using httpc.dnsResolve wherever possible within the sourcecode [httpd.java,plasmaCrawlStacker.java] git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1044 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	440e6ed747	see http://www.yacy-forum.de/viewtopic.php?t=1416 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1025 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	b8ceb1ffde	*) Adding better https support for crawler - solving problems with unknown certificates by implementing a dummy trust Manager - adding https support to robots-parser - Seed File can now be downloaded from https resources - adapting plasmaHTCache.java to support https URLs properly *) URL Normalization - sub URLs are now normalized properly during indexing - pointing urlNormalForm function of plasmaParser to htmlFilterContentScraper function - normalizing URLs which were received by a crawlOrder request git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1024 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	f871408729	*) sharedBlacklist_p.java - Setting Pragma: no-cache - increasing timeout to 12 sec. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1019 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	8194fde340	*) trying to continue transferRWI processing even if this error occurs: |> Caused by: de.anomic.kelondro.kelondroException: kelondroTree.searchproc: nullpointernull in db '.../urlHash.db' - if URL existence can not be determined, we request it from the remote peer git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@997 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	4dcbc26ef1	introduction of search profiles; very experimental git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@976 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	7256bea45f	*) Bugfix for nameLookup parameter handling *) Bugfix for Received xx Words [xxxxxxx .. null] Bug git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@953 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	40777556c5	*) Connection Tracking - adding automatic refresh - accepts new parameter nameLookup which can be used to deactivate yacy-peer name lookup (because we have problems with this on large seed-dbs) *) ViewFile New page that can be used to view - original content - plain text content - parsed content - parsed sentences of a webpage specified by its url hash Mainly for debugging purpose at the moment *) Robots.txt Bugfix for if-modified-since usage TODO: synchronization of downloads to avoid loading the same robots-file multiple times in parallel by different threads *) Shutdown Better abortion of transferRWI and transferURL sessions on server shutdown *) Status Page Adding icon to start/stop crawling via status page git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@950 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	e642a5d8b7	more constants git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@947 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	4a8e6d552e	invokation with "emailaddress" in Parameter. (compatible with other programs than sendmail, like sendxmpp) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@929 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	d29dfb0a12	refactoring of search / preparation for better search methods git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@921 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	839db8869c	added high/low priority for index adding git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@899 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	a1777788a5	small change git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@879 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	64acb46a91	cleaned, finals, Properties git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@857 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	52168fab9b	cleaned, finals, Properties git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@856 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	a2fa75e688	*) Asynchronous queuing of crawl job URLs (stackCrawl) various checks like the blacklist check or the robots.txt disallow check are now done by a separate thread to unburden the indexer thread(s) TODO: maybe we have to introduce a threadpool here if it turns out that this single thread is a bottleneck because of the time consuming robots.txt downloads *) improved index transfer The index selection and transmission is done in parallel now to improve index transfer performance. TODO: maybe we could speed up performance by using multiple transmission threads in parallel instead of only a single one. *) gzip encoded post requests it is now configurable if a gzip encoded post request should be sent on index transfer/distribution *) storage Peer (very experimental and not optimized yet) Now it's possible to send the result of the yacy indexer thread to a remote peer instead of storing the indexed words locally. This could be done by setting the property "storagePeerHash" in the yacy config file - Please note that if the index transfer fails, the index is stored locally. - TODO: currently this index transfer is done by the indexer thread. To speed up the indexer a) this transmission should be done in parallel and b) multiple chunks should be bundled and transferred together *) general performance improvements - better memory cleanup after http request processing has finished - replacing some string concatenations with stringBuffers - replacing BufferedInputStreams with serverByteBuffer - replacing vectors with arraylists wherever possible - replacing hashtables with hashmaps wherever possible This was done because function calls to vector or hashtable functions take 3 times longer than calls to functions of arraylists or hashmaps. TODO: we should take a look at the class serverObject which is inherited from hashmap Do we really need a synchronization for this class? TODO: replace arraylists with linkedLists if random access to the list elements is not needed *) Robots Parser supports if-modified-since downloads now If the downloaded robots.txt file is older than 7 days the robots parser tries to download the robots.txt with the if-modified-since header to avoid unnecessary downloads if the file was not changed. Additionally the ETag header is used to detect changes. *) Crawler: better handling of unsupported mimeTypes + FileExtension *) Bugfix: plasmaWordIndexEntity was not closed correctly in - query.java - plasmaswitchboard.java *) function minimizeUrlDB added to yacy.java this function tests the current urlHashDB for unused urls ATTENTION: please don't use this function at the moment because it causes the wordIndexDB to flush all words into the word directory! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@853 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	a9c466ef21	cleaned, finals, StringBuffer, Properties git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@849 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	0c3a20d44f	more + changed log for better understanding of outOfMemory bug and others git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@846 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	7fc822a59b	changed handling of time-zones git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@801 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
(no author)	1aa79f5bb5	cleaned; Properties; git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@790 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	c42a543bc3	*) Adding peername to logmessage when receiving URLs git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@781 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	1dc94e7753	*) Adding support for gzip content-encoding of http post requests used to transferRWIs and transferURLs. See: http://www.yacy-forum.de/viewtopic.php?t=1167#10020 *) adding yacyVersion.java containing constants defining yacy versions that support a given feature. Needed to determine if a remote peer is able to decode gzip content-encoded http post bodies properly. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@772 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	96a5b6e8fb	removed yacy peer types from serverSwitch git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@758 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	11e175630b	StringBuffers, finals; cleaned; Properties; git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@745 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	a2fec3bb1c	*) Bugfix for " java.lang.NullPointerException at hello.respond(hello.java:167)" See: http://www.yacy-forum.de/viewtopic.php?p=9471 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@685 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	4fd5b95b1f	*) Renaming Logger function names to reflect the proper Java Logging API Loglevels - please use logFine instead of logDebug - please use logSevere instead of logFailure and logError See: http://www.yacy-forum.de/viewtopic.php?p=8726#8726 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@615 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	6adf8a4bde	*) Renaming Logger function names to reflect the proper Java Logging API Loglevels - please use logFine instead of logDebug - please use logFailure instead of logError See: http://www.yacy-forum.de/viewtopic.php?p=8726#8726 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@614 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	a812fb86cc	*) Port Forwarding Feature does not detect broken connections properly. Therefore a test-request was added to the isConnected function to detect broken connections and to keep open connections alive git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@596 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	c47bb1182d	bugfix for assortment initialization error git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@547 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
orbiter	25f632dbd9	more DHT bugfixes and better logging of DHT effects git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@542 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
orbiter	cd10370992	several bugfixes and dht selection / logging improvement git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@531 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
theli	865b9490a2	*) Making DHT Transfer while Crawling configurable See: http://www.yacy-forum.de/viewtopic.php?p=6904 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@496 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
theli	0610e83468	*) Bugfix. recipient peer was accidentally displayed as source peer of a url transmission. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@495 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
orbiter	bb3e897baf	more minor changes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@488 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
orbiter	2d8557cb10	minor changes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@487 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
rramthun	eacff63eda	Typos... git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@482 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
orbiter	083c8ddc69	new alert symbols git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@473 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
orbiter	e24dbde217	better logging for WRONG seed git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@463 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
rramthun	b99205e445	Translation, spelling... git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@448 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
orbiter	a2cf76ea7c	bugfixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@413 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
rramthun	0f11399d16	Some corrections... git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@409 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
orbiter	9f505af7aa	preparations for bulk remote crawls git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@408 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
orbiter	19dbed7cc8	code clean-up git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@401 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago

275 Commits (1f199d688bfdd788e56919f2bc3a73db87d559df)