orbiter
497428c8ec
refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2949 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
bb7d4b5d5e
refactoring to prepare new RWI entry object
...
- moved all url and index(RWI) entries to index package
- better naming to distinguish RWI entries and URL entries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2937 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hermens
8385557672
Small fix for the Cache Monitor when using proxyCacheLayout=hash
...
see: http://www.yacy-forum.de/viewtopic.php?p=27394#27394
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2916 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
48f81acc0e
reverse SVN 2744, it is not needed
...
(this resulted from a small misunderstanding of the newest cache layout)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2745 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
1da9aece12
Repair DNS prefetch during cacheScan
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2744 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
a9c7e3f061
*) Bugfix for NoSuchElementException
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2735 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
e17fea7015
files in htcache are now stored in different hash/tree subdirectories
...
according to storage method
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2730 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
25ae3d3161
generalized definition of hexhash
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2725 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f0d747c723
removed deprecated method
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2723 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
0f10bdde22
more generic cache methods
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2721 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
440c6ee657
Implement alternative htcache layout
...
mostly according to: http://www.yacy-forum.de/viewtopic.php?p=26205#26205
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2718 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
f17ce28b6d
*) plasmaHTCache:
...
- method loadResourceContent defined as deprecated.
Please do not use this function to avoid OutOfMemory Exceptions
when loading large files
- new function getResourceContentStream to get an inputstream of a cache file
- new function getResourceContentLength to get the size of a cached file
*) httpc.java:
- Bugfix: resource content was loaded into memory even if this was not requested
*) Crawler:
- new option to hold loaded resource content in memory
- adding option to use the worker class without the worker pool
(needed by the snippet fetcher)
*) plasmaSnippetCache
- snippet loader does not use a crawl-worker from pool but uses
a newly created instance to avoid blocking by normal crawling
activity.
- now operates on streams instead of byte arrays to avoid OutOfMemory
Exceptions when operating on large files
- snippet loader now forces the crawl-worker to keep the loaded
resource in memory to avoid IO
*) plasmaCondenser: adding new function getWords that can directly operate on input streams
*) Parsers
- keep resource in memory whenever possible (to avoid IO)
- when parsing from a stream, the content length must now be passed to the parser function.
this length value is needed by the parsers to decide whether the parsed resource content is too large
to hold in memory and must be stored to a file
- AbstractParser.java: new function to pass the contentLength of a resource to the parsers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2701 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
df1629b05a
- code cleanup
...
- version 0.471
- moved surftipps to own web page
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2676 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
625c2ce6b1
*) bugfix for snippet fetching problem if content but not http header is available in cache
...
See: http://www.yacy-forum.de/viewtopic.php?p=25748
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2651 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
1dc12d6659
*) Bugfix for shutdown problem caused by cacheScan thread
...
See: http://www.yacy-forum.de/viewtopic.php?p=25729
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2636 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
b251076e64
avoid ConcurrentModificationException
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2563 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
dae763d8e3
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2495 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
ffbf416e76
*) direct access to requestheader of htCache.Entry removed to make it more http independent
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2486 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
393a7d10be
*) setting htCache.Entry fields to private
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2484 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
ab5a9bee66
*) adding some copyright headers
...
*) next step of restructuring for new crawlers
- adding first testversion of ftp crawler class
-- does not create a htCache entry yet
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2483 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
db1eae0227
* simplified initialization of database objects
...
* replaced kelondroTree for NURLs by kelondroFlex
* replaced kelondroTree for EURLs by kelondroFlex
take care, may be very buggy
please finish crawls before updating. crawls will be lost.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2452 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
0b73f2b132
Repair DNS prefetch during cacheScan
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2451 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
23dd972608
fixed memory calculation in performanceMemory web page
...
fixed also maximum cache size computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2429 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
abf22f6e60
removed url normalform computation from htmlFilterContentScraper.
...
This method was implemented in de.anomic.net.URL
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2377 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
3879a0ecd0
replaced java.net.URL usage by use of new class de.anomic.net.URL
...
This shall be seen as an experiment to exclude all cases where
there could be a DNS lookup during URL comparison.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
92f4cb4d73
added option to configure the start-up delay time for kelondro database files.
...
the start-up delay is used to pre-load the database node cache
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2276 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
66964dc015
removed high/med/low from kelondroRecords cache control.
...
this was done because testing showed that cache-delete operations
slowed down record access most, even more than actual IO operations.
Cache-delete operations appeared when entries were shifted from low-priority
positions to high-priority positions. During a fill of x entries to a database,
x/2 delete situations happened, which caused two or more delete operations.
Removing the cache control means that these delete operations are not
necessary any more, but it is more difficult to decide which cache elements
shall be removed in case the cache is full. There is not yet a stable
solution for this case, but the advantage of a faster cache is more important
than the flush problem.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2244 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
4d8f8ba384
added cache-performance analysis for node caches
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2140 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
90d569d70f
refactoring of index management:
...
url storage is part of index management; moved plasmaURL to indexURL
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2122 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
a474669338
start with refactoring of index management
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2110 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
auron_x
55ea4cbfe6
*)reverted patch for memory-display issue
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2095 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
auron_x
53d9ab6db7
*)fixed bug in PerformanceMemory_p.java which caused negative memory-values on big peers
...
see http://www.yacy-forum.de/viewtopic.php?t=2370
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2091 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
29b1b0823c
added monitoring of new object cache to performanceMemory page
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2072 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
30e4fc39a5
HTCache extended
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2015 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
da6a8bafa2
rename currCacheSize -> curCacheSize;
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2010 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
128e4ab199
- in serverSystem: maxPathLength is now a variable, not a method
...
- upon startup the calculated maximum path length is shown
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1932 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
30e3e3a0fd
adapted MAXPATHLENGTH to host system capabilities
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1930 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
8865948e4e
Cleanup;
...
Method replaceRegex added;
Constant MAXPATHLENGTH added;
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1923 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
5f6fdf1786
Bugfix for getCachePath(URL url)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1909 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
759800f543
*) Bugfix for storeHTCache problem
...
- content was not indexed if storeHTCache was off
See: http://www.yacy-forum.de/viewtopic.php?p=18269
See: http://www.yacy-forum.de/viewtopic.php?t=1882
See: http://www.yacy-forum.de/viewtopic.php?t=241
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1800 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
139ba4e0c8
Bugfix for getCachePath(URL url)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1510 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
3419b3bcdd
fix for bug that caused the peer-counter problem.
...
See http://www.yacy-forum.de/viewtopic.php?p=16016#16016
The kelondroDyn now uses a generic fill character.
kelondroDyn-Tables containing peer/word/url-hashes must not use '_'
as fill character.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1498 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
rramthun
6c02f889f7
Cosmetic changes.
...
Corrected version numbering as described in http://www.yacy-websuche.de/wiki/index.php/De:Versionsnummern
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1453 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
861aae678d
*) cleanup cacheAge database when cleaning up the HTCache
...
*) Log directory deletes with level Fine
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1427 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
ecdc1f7547
*) Bugfix for crawling URLs with query parameters
...
See: http://www.yacy-forum.de/viewtopic.php?p=14065
*) Preparation for http://www.yacy-forum.de/viewtopic.php?t=1719
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1405 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
7c22afe3de
*) Bugfix for NullPointerException in deleteOldHTCache
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1326 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
9d8dca750e
BUGFIX for my last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1306 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
5449193167
bugfix for http://www.yacy-forum.de/viewtopic.php?t=1706 (i hope)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1304 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
2a23f5d419
F..., Sorry, no time, later
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1303 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
3a2d13786e
bugfix for http://www.yacy-forum.de/viewtopic.php?t=1706
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1302 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago