orbiter
2c4a672fe2
bugfixes and performance hacks for table index
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7957 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
dad5b586a4
added a concurrent warm-up of Table data structures. That should speed up the start-up process but may also cause higher CPU load at that time.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7956 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
734059d33e
performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7955 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
23e81b28b2
synchronization enhancements
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7954 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
dd4635e323
patches
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7953 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
85a5487d6d
YaCy can now use the solr index to compute text snippets. This makes search result preparation MUCH faster because no document fetching and parsing is necessary any more.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7943 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
0819e1d397
protection against OOM cases in image parser. See also bugs.yacy.net/view.php?id=54
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7942 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
2cba860693
- fix for wrong entries in NOLOAD indexing queue (which caused urls to be indexed based only on their url, without being loaded)
...
- patch for better urls to solr admin interface
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7938 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
2842ce30d6
added synchronization in ReferenceContainer and logging for shrinking
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7937 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
cec3836e73
added reference limitation to IndexControlRWIs_p.html servlet
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7936 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
ecb4986b38
refactored stuff from last commit to ReferenceContainer
...
see: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3353&p=23163#p23163
the limiting of references is disabled by default
to enable it, set index.maxReferences in yacy.conf to a value of e.g. 100000
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7935 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
f7c4abfdd7
limit references per blob & term to the 100.000 youngest
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7934 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
28f5b79deb
added a fast mass-deletion method
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7933 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
a70dbce41c
added another file tool class to yacy-cora
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7932 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
49e5ca579f
added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (this is the default now), then all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This will increase the search index size dramatically but will also enable us to create a much faster image, audio and video search. If the flag is switched on, the index entries are also stored to a solr index, if that is enabled as well.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7931 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
e02bfbde56
fix for solr url
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7930 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
580beb12a5
reverting SVN 7863; the synchronization was needed, and without it the same hosts are looked up repeatedly via DNS
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7928 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
44d6416e2d
ensure termination of shrink()
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7927 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
52230a6864
replaced catching of Exception with Throwable, which also catches Errors
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7926 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
877eaf6bcb
switched off logging of org.apache.http which was suddenly switched on by default (??)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7925 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
e1a3d609aa
moved merger object from Segment to IndexCell to enable a correct shutdown sequence. This solves a bug where yacy cannot be shut down during an index merge that appears during the shutdown phase.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7924 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
610b01e1c3
- added an 'add every media object linked in a html document as a new document' option to the html parser. With this, every image, app, video or audio file that is linked in a html file is added as a document. This means that parsing a single html document may insert a number of documents into the search index.
...
- some refactoring for mime type discovery
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7919 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
3da21c4266
protection against starting of a (second) yacy peer while another one is already running on the same port
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7917 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
b5252ef91f
added new word recommendation library in DictionaryLoader_p.html
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7913 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
1c007188ad
bugfixes in html parser
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7912 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
231074bf0a
fixed a parsing bug by reverting SVN 7766
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7910 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
low012
30a8a2f76b
*) replacing one ugly hack with an extended ugly hack ;-)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7908 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
low012
95379ce0b1
*) should fix some problems with RSS Importer (see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3253 )
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7907 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
low012
24e76a7b69
*) Replaced occurrences of "Wikimedia" with "MediaWiki" where applicable. (Thanks to the folks of 0x20.be for pointing this out.)
...
*) Added description of where to place MediaWiki dump for import.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7905 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
d40a177c05
Generation Memory Strategy fine tuning
...
add some log-output in termlist_p
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7904 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
839f407fe4
Generation Memory Strategy fine tuning:
...
- some more optimism on requests of unknown values
- avoid a premature value of 0 byte available
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7903 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
a5541751a8
- added memory computation to termlist_p.xml
...
- added option to delete terms in termlist_p.xml
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7901 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
45e497a9bd
fix for term iteration
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7900 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
5dd2efc9a2
- bugfixes in html parser
...
- new fields in solr
- extended file viewer to debug parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7897 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
2c595a6a47
added new methods to count the number of objects in RWIs. lots of refactoring was necessary to introduce new Rating class and to unify naming of methods
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7896 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
75df87832c
refactoring/better naming of methods and classes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7895 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
5f8a5ca32d
- not doing merge-jobs while short on Memory
...
- using configuration-values of crawling-max-filesize also for snippetfetching and loading files into Index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7893 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
965fabfb87
enhanced sorting speed (affects all DB operations)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7892 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
41a8ee4569
added iterable implementation in KeyList
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7891 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
22d69a6368
refactoring in cora: added sorting package
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7890 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
51cf697acd
refactoring: moved all score-related classes to new ranking package
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7889 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
a0d5e7b6e6
added new score comparator
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7888 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
4fec99115b
Implementation of strategies for controlling memory resources.
...
You can toggle between previous (standard) and new (generation) strategy at PerformanceMemory_p.html.
The generation memory strategy is implemented with the objective of running more robustly,
at the cost of stopping some tasks (e.g. dht) early while running low on memory.
This new strategy respects the generational way a heap is organized on the most widely used jvms.
These changes have been running fine on my 3 peers for weeks now, but as I'm human, I may fail.
Please be careful when using the generation memory strategy and report errors by naming
OS, jvm and java_args.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7886 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
63a375b801
do not look at external dtd, because this makes the reader stay forever(?) on faulty dtd-locations
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7885 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
2c58af6874
- added a short memory status simulation mode
...
- added a button in PerformanceMemory_p.html to set the simulated short memory status
- bugfix: added a missing lowercase in KeyList
- better concurrency in loader dispatcher
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7883 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
c64faf41e2
addon to svn 7880
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7882 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
7b7a196243
ignore cookies in httpclient by default
...
disable cookiestore, because the default one caused segfaults on my peers
this does not harm the use of cookies via YaCy as proxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7881 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
411ed159f8
do some extra sleep while running low on memory
...
(1 sec. per outofmemoryCycle)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7879 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
9ab0ba41e2
using GzipDecompressingEntity from httpclient instead of our own
...
(was just fixed there in httpclient-4.1.2 and does a proper job)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7877 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
07f5954570
try better handling of corrupt blobs
...
@developer: please revert if I'm wrong
see http://forum.yacy-websuche.de/viewtopic.php?f=8&t=3334
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7872 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago