orbiter
6dc42a2392
detection of loops in kelondroTree during last/first-Node search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1038 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
5b0911d7ea
added new performance menu for search sequence configuration and monitoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@990 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
4fa942511b
de-serialized read and write access
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@989 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
1ff0ced515
integration of an interface class for abstract access of kelondro indexed structures like kelondroTree and kelondroHashtable
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@987 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
8d827cdb30
tried to fix problems with order of network list by last-seen (which could also improve the network picture)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@980 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
b7e21ec107
*) Adding DB import function which allows importing a foreign yacy DB (from directory PLASMADB)
...
into the DB of another peer.
ATTENTION: not tested very well. Please use this with care and always make a DB backup.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@932 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
959eefbc4f
*) Robots.txt parser/ppt
...
cutting off comments at the line end
*) Adding Threadpool for stackCrawl Thread to speed up robots.txt download
and double url checks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@882 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
4191b21e73
cleaned, finals, Properties
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@858 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
a2fa75e688
*) Asynchronous queuing of crawl job URLs (stackCrawl)
...
various checks like the blacklist check or the robots.txt disallow check are now
done by a separate thread to unburden the indexer thread(s)
TODO: maybe we have to introduce a threadpool here if it turns out that this single
thread is a bottleneck because of the time-consuming robots.txt downloads
*) improved index transfer
The index selection and transmission is done in parallel now to improve index
transfer performance.
TODO: maybe we could speed up performance by using multiple transmission threads in
parallel instead of only a single one.
*) gzip encoded post requests
it is now configurable whether a gzip encoded post request should be sent on
index transfer/distribution
*) storage Peer (very experimental and not optimized yet)
Now it's possible to send the result of the yacy indexer thread to a remote peer
instead of storing the indexed words locally.
This can be done by setting the property "storagePeerHash" in the yacy config file
- Please note that if the index transfer fails, the index is stored locally.
- TODO: currently this index transfer is done by the indexer thread.
To speed up the indexer
a) this transmission should be done in parallel and
b) multiple chunks should be bundled and transferred together
*) general performance improvements
- better memory cleanup after http request processing has finished
- replacing some string concatenations with stringBuffers
- replacing BufferedInputStreams with serverByteBuffer
- replacing vectors with arraylists wherever possible
- replacing hashtables with hashmaps wherever possible
This was done because function calls to vector or hashtable functions
take 3 times longer than calls to functions of arraylists or hashmaps.
TODO: we should take a look at the class serverObject which inherits from hashmap
Do we really need synchronization for this class?
TODO: replace arraylists with linkedLists if random access to the list elements is not needed
*) Robots Parser supports if-modified-since downloads now
If the downloaded robots.txt file is older than 7 days the robots parser tries to
download the robots.txt with the if-modified-since header to avoid unnecessary downloads
if the file was not changed. Additionally the ETag header is used to detect changes.
*) Crawler: better handling of unsupported mimeTypes + FileExtension
*) Bugfix: plasmaWordIndexEntity was not closed correctly in
- query.java
- plasmaswitchboard.java
*) function minimizeUrlDB added to yacy.java
this function tests the current urlHashDB for unused urls
ATTENTION: please don't use this function at the moment because
it causes the wordIndexDB to flush all words into the
word directory!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@853 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f6a0e0f162
small bugfix to readFully
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@851 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
6d5d0ac801
bugfix for startup problems
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@850 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
5dc0d41900
bugfix in kelondroRA (hint by Martin)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@847 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
25a59a51ad
fixed problem created with last svn commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@810 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
72ce36baba
cleanup in kelondroRecords
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@787 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
e380d4e55e
cleanup (no functional change)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@778 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
de0a58d79c
no more sync
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@776 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
495bc8bec6
removed cache-control from low and medium priority caches which reduces memory use and computation overhead
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@774 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
979a3ee3c0
exceptions for better testing of bug http://www.yacy-forum.de/viewtopic.php?p=9852#9852
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@769 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
18d9e1a256
fix for http://www.yacy-forum.de/viewtopic.php?p=10026#10026
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@768 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
ef85fce661
change of memory-consumption constants (had been much too low)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@764 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
6d1de8abfd
finals; cleaned;
...
Properties;
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@756 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
fb52a82008
added new performance page for memory settings
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@751 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
e6b30911c3
small changes to caching
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@747 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
900ab97422
change of memory-allocation blocking value for GC prevention
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@740 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
0ffca99886
added priority-organization to kelondroRecord cache. This should virtually double the cache capacity.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@738 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
2d22626386
automatic switch-off of cache control in kelondroRecords in case the cache is big enough (so that no cache-aging needs to be controlled)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@737 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
cddd9aaa33
fixed SERIOUS bug with kelondroStack; affected all stack processing since 729
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@732 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
19547f1821
changed node manipulation methods in kelondro core to reduce object allocation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@729 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
debb207a74
removed file sync
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@725 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
2148c0cf49
replaced kelondro storage core; far fewer objects in kelondro cache now; less IO from DB
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@724 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
43b42854a0
fix for null-entries and http://www.yacy-forum.de/viewtopic.php?p=8649
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@699 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
bead8a32aa
*) IndexCreate_p.java:
...
Crawler StartURLs will now also be added to the errorURL-DB if an error occurs on this url
*) kelondroStack.java, plasmaSwitchboardQueue.java
Adding method which returns a list of all entries in the queue. This list is used by IndexCreate_p.java
instead of an iterator to display the indexing-list.
Advantages: avoid concurrent modifications of the list while displaying it.
Speedup because now we have to access only one sync function instead of multiple ones
(one for each entry)
*) IndexCreateIndexingQueue_p.java
Using new list() function of plasmaSwitchboardQueue
*) httpdFileHandler.java
If a servlet returns the special value "LOCATION" the httpdFileHandler redirects
the browser to the URL specified by the servlet. This can e.g. be used when a http get request is
used instead of a post request, but a refresh should not be allowed.
*) IndexCreateWWWLocalQueue_p.html
Now it's possible to delete single entries of the local crawler queue
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@626 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
6adf8a4bde
*) Renaming Logger function names to reflect the proper Java Logging API Loglevels
...
- please use logFine instead of logDebug
- please use logFailure instead of logError
See: http://www.yacy-forum.de/viewtopic.php?p=8726#8726
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@614 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
b33094e925
*) Trying to solve "Too many open files bug"
...
*) Temp.Bugfix for "Bug in Index Restore"
See: http://www.yacy-forum.de/viewtopic.php?p=8647#8647
Orbiter: Please take a look
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@602 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
b67f008eb8
*) Trying to solve "Too many open files bug"
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@601 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
5716f8521d
bug fixes for word ordering and dht index selection
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@521 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
c796a38424
doc update
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@515 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
40da910f41
bugfixes and automatic news-cleanup
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@481 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
e84a177c49
many bugfixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@475 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
5672709ef3
several bugfixes for YaCyNews
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@465 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
1022fbeb65
many YaCyNews fixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@461 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
ad90f0ad13
activated RWI distribution to DHT for senior peers (default redundancy 3), necessary now for network growth
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@438 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
3470a72d48
fixed div by zero, set default delays, fixed release number format and display
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@435 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
0f663bcebf
added global ppm computation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@407 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
1d2155675b
changed assortment memory cache flush
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@403 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
19dbed7cc8
code clean-up
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@401 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
311e627363
blocking of blacklisted urls in indexReceive and small changes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@397 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
858cd94299
replaced indexing ram-queue by file-based stack-queue
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@381 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
3addf58046
enhanced snippet-loading with threads
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@322 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
4afcf10158
added kelondroHashtable (not finished yet)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@321 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
1e7f062350
many bugfixes, memory leak fixes, performance enhancements; new kelondroHashtable; activated snippets
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@313 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
68dc2b0c6b
added kelondroArray, the basis for upcoming kelondroHash and some bug fixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@311 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
5728cdf8b5
bugfix
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@308 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
a19541e563
code-enhancements after analysis with AppPerfect
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@307 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
85075269a6
extended fail-safe memory-management. prevents too much allocation, too frequent GC and should help for the 100%CPU-bug
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@303 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
e3c92818db
avoiding OutOfMemoryError routines
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@302 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
3e8ee5a46d
enhanced caching in kelondroRecords and added better synchronization/finalizer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@301 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
3be98f194d
tried to find the socket bug
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@300 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
9e47ba5ad6
*) adding missing calls for function close() to avoid "too many open files" bug
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@282 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
451ca6b818
*) changing reference to logger
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@247 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
e89ded9e41
bugfixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@204 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
650ca3955a
added flush-thread for index cache and added language-name mapping in Language_p
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@203 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
f8f8dd05db
fixed "Too many open files" - bug
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@174 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
4b01ff7548
activated assortments, removed write-queues
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@151 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
e26ac60c3e
modified assortment data structures
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@148 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
287d2e6f10
further enhanced caching (new cache flush methods)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@111 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
ea478f3975
enhanced indexing-caching
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@107 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
a5fec449c8
*) setting threadnames for kelondroMap:writequeue and publishSeed
...
so that a thread dump is more verbose
*) Moving code for transparent proxy support to a separate function
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@98 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
db1da3345d
introduced singleton-database
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@92 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
a9b22647dc
fixed bug in indexDump.stack - generation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@88 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
1d7fed87dc
redesign of index caching - removed indexCache.db
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@86 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
2aa5fe8f50
*) Import statements reorganized
...
Now it's easier to determine which class really uses which other class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@82 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
c7c6aaf06e
many bug-fixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@73 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
995673d795
several bugfixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@71 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
2de90020ed
fixed caching+synchronization+brute-force-denial
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@67 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
47e426ff7e
*) one possible deadlock (because of nested object locks) removed in class kelondroMap
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@61 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
ba16da72b4
fixed not-working kelondroRecords-Cache
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@56 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
7fb645b0ab
enhanced crawling performance, changed memory settings, new performance options
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@51 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
8b31f9e202
enhanced shut-down behaviour & added experimental nio-wrapper for kelondroRA (not active yet)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@44 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
96516fc9d8
fixed bugs (search+kelondroException, dns)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@16 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
e374aca2cd
enhanced exception handling in kelondro
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@14 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
072052f150
fixed bugs (dns, seedDB)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@13 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
248077d3f0
initial load with yacy 0.36
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago