orbiter
c89d8142bb
replaced old 'kCache' with a fully controlled cache
...
there are now two fully controlled caches for incoming indexes:
- dhtIn
- dhtOut
during indexing, all indexes that shall not be transported to remote peers
because they belong to this peer are stored in dhtIn. It is furthermore
ensured that received indexes are not immediately retransmitted to other
peers. They may, however, be transmitted later if the network grows.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2574 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
6e2907135a
bugfixes for remote search server part
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2573 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
cf9884e22b
first attempt to implement a secondary search
...
this is a set of search processes that shall enrich search results
with specialized requests to realize a combination of search results
from different peers.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2571 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
75b198bc02
- updated references to indexContainer
...
- more bugfixes and debugging for indexAbstract processing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2555 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
4f9e42d5ed
more changes towards better join-search
...
- fixed problems with index-abstract generation
- added analysis output for index abstract receive
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2551 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
82a6054275
- fixed bug with new indexAbstract generation
...
- added partial evaluation of indexAbstracts during remote searches
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2544 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
74d1dea30b
changes towards better join-search
...
- added generation of a compressed index within remote peers during global search
- added selection of specific urls within remote peers during secondary global search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2539 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
c543028dd4
fixed double/missing null check for LURLs
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2520 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
96c6e4e322
- enhancements to detailed search page
...
- enhancements to search ranking computation process
- removed bugs in postranking
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2516 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
9340dbb501
fixed all possible problems with null pointer exceptions for LURLs
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2513 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hermens
ff4362b02d
some more fixes for new plasmaCrawlLURL.load behavior
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2511 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
4866868c0e
added write cache for LURLs
...
This was necessary to speed up the index receive process during global search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2498 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
8a0e35618b
enhancements to search result preparation
...
- added detailed count on remote search results
- enhanced search sequence during remote searches (doing local search in sequence)
- strict adherence to timeout limits
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2497 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
f3ac4dbbb9
*) better handling of server shutdown
...
See: e.g. http://www.yacy-forum.de/viewtopic.php?t=2584
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2468 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
18b6876860
new cache flush configuration settings
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2460 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
6ad471ef96
* applied many compiler warning recommendations
...
* cleaned up code
* added unit test code
* migrated ranking RCI computation to kelondroFlex and kelondroCollectionIndex
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2414 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
5e0b6f8f83
*) sorting peer name list on Blacklist_p.html
...
*) restructuring of sharedBlacklist_p.java
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2405 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
6c8366aea1
*) Bugfix for blacklist import function
...
- wrong property name
- list was accidentally imported into a new blacklist file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2404 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
eee44be602
*) adding an interface for customized blacklist classes
...
- now it's possible to use a customized blacklist engine
instead of the default one
- this can be done by configuring the property BlackLists.class
See: http://www.yacy-forum.de/viewtopic.php?t=2108
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2397 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
66f1eb07d9
*) Bugfix for IllegalArgumentException in transferURL
...
See: http://www.yacy-forum.de/viewtopic.php?p=24560
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2391 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
d2e8e76218
*) now it's possible to configure the yacy blacklist separately for dht, search, proxy, crawler
...
See: http://www.yacy-forum.de/viewtopic.php?t=2541
http://www.yacy-forum.de/viewtopic.php?p=24516
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2389 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f43c90fa98
fixed handling of null referer in crawlOrder
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2384 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
abf22f6e60
removed url normalform computation from htmlFilterContentScraper.
...
This method was implemented in de.anomic.net.URL
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2377 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
ec5149ff3b
fix for busyCacheFlush detection
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2365 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f58283def2
better control of index flush
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2364 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
80b6c90d54
enhancements to prevent blocking during dht transfer receive
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2362 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
d56f06401e
- Cache known URLs during indexReceive to avoid getting blocked during loadedURL.exists() whenever possible
...
- Small logging updates
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2359 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
c7b6389ca1
*) renaming indexDistribution.dhtReceiptLimitEnabled property to indexDistribution.transferRWIReceiptLimitEnabled
...
so that the default value is taken over by all peers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2356 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
9183d21f25
renamed new index class to old name
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2342 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
c4e922885a
replaced indexURLEntry by new class that uses a kelondroRow.Entry object
...
to store the index entry. This is another step to move to the new database structure.
A side effect of this change is that index storage uses much less RAM,
which affects the index RAM cache.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2341 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
5f72be2a95
some redesign of EURL storage
...
* store() is now called explicitly
* more urls are written to the EURL table
* the EURL stack does not store the complete entry any more, now only the URL hash
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2323 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
58df8b7bbf
a large collection of different changes
...
* mainly for the transition to the new indexing database structure
* a bugfix for an endless loop inside kelondroTree iteration
* a bugfix for bulk read inside a kelondroTree iteration; the bug caused some elements to be iterated twice
* very strong speed enhancement for url/domain extraction
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2320 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hydrox
8ba8e2b7d9
*) added cache for blacklisted URL hashes received by DHT. DHT does not request URLs listed in this cache.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2251 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
53cbcc6d6e
Implement emergency break in index receive when the limit of the ramCache is exceeded by more than cacheLimit
...
See: http://www.yacy-forum.de/viewtopic.php?p=22911#22911
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2248 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
b20496e42b
*) make DHT DoS check configurable (requested by KoH)
...
- check can be disabled via property indexDistribution.dhtReceiptLimitEnabled
- upper bound can be configured via indexDistribution.dhtReceiptLimit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2234 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
38a1410361
Don't test a remote peer's seed during hello.respond as its IP might not be proper, especially while still virgin
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2187 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
5041d330ce
refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2150 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
90d569d70f
refactoring of index management:
...
url storage is part of index management; moved plasmaURL to indexURL
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2122 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
a930be4ba3
refactoring of index management:
...
generalized the index entry
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2121 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
7dd57a3828
added a busy-time estimation at DHT/RWI-Receive
...
to be done: usage of this value on client-side
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2116 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
fcec40fcc6
*) don't accept messages without subject or payload
...
See: http://www.yacy-forum.de/viewtopic.php?p=21656
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2115 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
82b2bc6932
patch for index-transfer DoS problem
...
see http://www.yacy-forum.de/viewtopic.php?p=21627#21627
note that this function will make the index-transfer functionality void
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2114 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
a474669338
start with refactoring of index management
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2110 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
799c04091d
Bugfix for Spam-Bug (Header manipulation)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2057 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
dbe96e6541
added hand-over of search filter and prefer ranking to yacy protocol
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2029 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
00a5d435e2
- fixed some bugs with domain filter
...
- added new ranking filter "prefermask": urls that match the filter are ranked better
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2022 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
bd283b8443
fixed bugs:
...
- null pointer exception during startup of a robinson-configured peer
- wrong time calculation of default value of re-crawl option
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2005 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
0a4c2e89ed
remote crawl orders are now only accepted if the sum over all
...
queues is less than 100 (the indexing queue was not measured before)
see also: http://www.yacy-forum.de/viewtopic.php?p=19374#19374
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1947 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
1f4412a146
adapted isListed to the new behavior as discussed (url, getFile)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1940 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
3286b1f498
re-organisation of lurl-creation and -stacking
...
this was necessary to prevent useless writes to the database
when the url appears in a blacklist
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1905 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago