orbiter
02f8013013
auto-delete of corrupted word files during word-migration
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1047 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
00ab4d8723
cleaned, small change, Properties
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1026 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hydrox
56b9f34411
*)removed unused imports
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1015 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
4dcbc26ef1
introduction of search profiles; very experimental
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@976 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
6260942590
changed search process: received indexes are now buffered and written to wordIndex after search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@934 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
10d3627c90
changed word cache flush scheduling and removed possible locks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@910 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
839db8869c
added high/low priority for index adding
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@899 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
959eefbc4f
*) Robots.txt parser/ppt
...
cutting off comments at the line end
*) Adding Threadpool for stackCrawl Thread to speed up robots.txt download
and double url checks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@882 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
a2fa75e688
*) Asynchronous queuing of crawl job URLs (stackCrawl)
...
various checks like the blacklist check or the robots.txt disallow check are now
done by a separate thread to unburden the indexer thread(s)
TODO: maybe we have to introduce a threadpool here if it turns out that this single
thread is a bottleneck because of the time-consuming robots.txt downloads
*) improved index transfer
The index selection and transmission is done in parallel now to improve index
transfer performance.
TODO: maybe we could speed up performance by using multiple transmission threads in
parallel instead of only a single one.
*) gzip encoded post requests
it is now configurable whether a gzip encoded post request should be sent on
index transfer/distribution
*) storage Peer (very experimental and not optimized yet)
Now it's possible to send the result of the yacy indexer thread to a remote peer
instead of storing the indexed words locally.
This could be done by setting the property "storagePeerHash" in the yacy config file
- Please note that if the index transfer fails, the index is stored locally.
- TODO: currently this index transfer is done by the indexer thread.
To speed up the indexer
a) this transmission should be done in parallel and
b) multiple chunks should be bundled and transferred together
*) general performance improvements
- better memory cleanup after http request processing has finished
- replacing some string concatenations with stringBuffers
- replacing BufferedInputStreams with serverByteBuffer
- replacing vectors with arraylists wherever possible
- replacing hashtables with hashmaps wherever possible
This was done because function calls to vector or hashtable functions
take 3 times longer than calls to functions of arraylists or hashmaps.
TODO: we should take a look at the class serverObject, which inherits from hashmap
Do we really need a synchronization for this class?
TODO: replace arraylists with linkedLists if random access to the list elements is not needed
*) Robots Parser supports if-modified-since downloads now
If the downloaded robots.txt file is older than 7 days the robots parser tries to
download the robots.txt with the if-modified-since header to avoid unnecessary downloads
if the file was not changed. Additionally the ETag header is used to detect changes.
*) Crawler: better handling of unsupported mimeTypes + FileExtension
*) Bugfix: plasmaWordIndexEntity was not closed correctly in
- query.java
- plasmaswitchboard.java
*) function minimizeUrlDB added to yacy.java
this function tests the current urlHashDB for unused urls
ATTENTION: please don't use this function at the moment because
it causes the wordIndexDB to flush all words into the
word directory!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@853 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
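The if-modified-since behaviour described in the commit above (a robots.txt older than 7 days is re-requested conditionally, with the ETag as an additional change check) could look roughly like the sketch below. This is only an illustration under assumptions, not YaCy's actual code: the class `RobotsRevalidationSketch` and all of its method names are hypothetical, and only standard `java.net.HttpURLConnection` calls are used.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical sketch; names are illustrative and not taken from the YaCy sources.
final class RobotsRevalidationSketch {

    // Seven days in milliseconds, the age threshold named in the commit message.
    static final long MAX_AGE_MILLIS = 7L * 24 * 60 * 60 * 1000;

    // True if the cached robots.txt is older than seven days.
    static boolean needsRevalidation(long downloadedAtMillis, long nowMillis) {
        return nowMillis - downloadedAtMillis > MAX_AGE_MILLIS;
    }

    // Prepares a conditional GET; no network traffic happens until the caller connects.
    static HttpURLConnection openConditional(URL robotsUrl, long lastModifiedMillis, String etag)
            throws IOException {
        HttpURLConnection con = (HttpURLConnection) robotsUrl.openConnection();
        if (lastModifiedMillis > 0) {
            con.setIfModifiedSince(lastModifiedMillis); // sends the If-Modified-Since header
        }
        if (etag != null) {
            con.setRequestProperty("If-None-Match", etag); // ETag-based validation
        }
        return con;
    }

    // True if the server reported that the file has not changed (HTTP 304).
    static boolean isUnchanged(int responseCode) {
        return responseCode == HttpURLConnection.HTTP_NOT_MODIFIED;
    }
}
```

A caller would first check `needsRevalidation`, then open the conditional request and keep the cached copy whenever `isUnchanged(con.getResponseCode())` holds.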
theli
b5a8992d29
*) Setting some object fields to final
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@796 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
495bc8bec6
removed cache-control from low and medium priority caches which reduces memory use and computation overhead
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@774 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
fb52a82008
added new performance page for memory settings
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@751 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
rramthun
4036ee812a
Updated German language file
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@721 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
43b42854a0
fix for null-entries and http://www.yacy-forum.de/viewtopic.php?p=8649
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@699 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
4fd5b95b1f
*) Renaming Logger function names to reflect the proper Java Logging API Loglevels
...
- please use logFine instead of logDebug
- please use logSevere instead of logFailure and logError
See: http://www.yacy-forum.de/viewtopic.php?p=8726#8726
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@615 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
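The renaming in the commit above follows the level names of the Java Logging API (`java.util.logging`), where `FINE` is a debug-style level and `SEVERE` is the error level. A minimal sketch of such a mapping follows; the wrapper class `LogSketch` and the helper `CollectingHandler` are hypothetical and are not the YaCy logger classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical wrapper; not the YaCy logger class.
final class LogSketch {
    private final Logger logger;

    LogSketch(String name) {
        this.logger = Logger.getLogger(name);
        this.logger.setLevel(Level.ALL); // let FINE records through for the demo
    }

    Logger logger() { return logger; }

    // Takes the place of the old logDebug: maps to Level.FINE.
    void logFine(String message) { logger.log(Level.FINE, message); }

    // Takes the place of the old logFailure and logError: maps to Level.SEVERE.
    void logSevere(String message) { logger.log(Level.SEVERE, message); }
}

// Collects record levels in memory so the emitted levels can be inspected.
final class CollectingHandler extends Handler {
    final List<Level> levels = new ArrayList<>();

    @Override public void publish(LogRecord record) { levels.add(record.getLevel()); }
    @Override public void flush() { }
    @Override public void close() { }
}
```

Attaching a `CollectingHandler` to `logger()` and calling `logFine` and `logSevere` records one `FINE` and one `SEVERE` entry, which is one way to check the mapping.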
theli
6adf8a4bde
*) Renaming Logger function names to reflect the proper Java Logging API Loglevels
...
- please use logFine instead of logDebug
- please use logFailure instead of logError
See: http://www.yacy-forum.de/viewtopic.php?p=8726#8726
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@614 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
b33094e925
*) Trying to solve "Too many open files bug"
...
*) Temp.Bugfix for "Bug in Index Restore"
See: http://www.yacy-forum.de/viewtopic.php?p=8647#8647
Orbiter: Please take a look
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@602 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
ba0a486328
moved printStackTrace() to logging
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@539 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
3094045d34
fix for http://www.yacy-forum.de/viewtopic.php?p=7454#7454
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@536 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
cd10370992
several bugfixes and dht selection / logging improvement
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@531 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
5716f8521d
bug fixes for word ordering and dht index selection
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@521 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
f5259f29e8
word cache behaviour fix and other fixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@519 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
248c24b60a
intermission-feature usage in case of local and remote search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@510 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
2d8557cb10
minor changes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@487 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
b73557ed2d
better assortment monitoring and enhanced profile menu
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@416 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
1f36bf4dae
enhanced assortment capacity; added extended WORDS migration
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@412 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
9c72b4cdec
replaced index dump stack by a dump array and limited url number in assortment ram (prevents too much RAM occupation)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@406 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
159f795f65
bugfix (null pointer exception in assortments)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@404 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
1d2155675b
changed assortment memory cache flush
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@403 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
2f0d7ea8d3
removed htcache statuses (superfluous now)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@396 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
858cd94299
replaced indexing ram-queue by file-based stack-queue
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@381 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
d6c85228a6
enhanced snippet computation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@319 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
1e7f062350
many bugfixes, memory leak fixes, performance enhancements; new kelondroHashtable; activated snippets
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@313 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
68dc2b0c6b
added kelondroArray, the basis for upcoming kelondroHash and some bug fixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@311 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
e3c92818db
avoiding OutOfMemoryError routines
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@302 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
a25b5b4986
fixed possible memory leak in htmlScraper: be aware that now links can get lost; further work necessary
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@288 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
fbbea813c5
*) changing references to logger
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@248 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
3771b10b89
implemented automated migration indexCache 0.37 -> indexAssortmentCluster
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@205 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
650ca3955a
added flush-thread for index cache and added language-name mapping in Language_p
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@203 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
3d8a2ff937
enhanced parallelization of local/global/remote crawling
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@197 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
a05d738ea4
enhanced caching, removed bug causing outOfMemory
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@195 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
21110dcd5e
fixed bugs with open files and caching
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@175 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
5f90daa265
implemented localization environment
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@171 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
fdd606c8c8
fixed bugs
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@168 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
0c35171c85
assortment fine-tuning
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@163 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
76dc892017
refined assortment
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@159 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
4b01ff7548
activated assortments, removed write-queues
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@151 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
e26ac60c3e
modified assortment data structures
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@148 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
79be6f003d
enhanced Assortment class
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@141 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
5c6147a54c
introduced assortment structure (generalization of singletons)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@139 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago