orbiter
cf4fd525ee
added directDocByURL attribute in crawl profile
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7985 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
813f297a95
another performance hack: re-use of known host addresses for isLocal property; avoids look-up in local hash
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7983 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
035ebfbf3b
- performance hacks (should affect the crawl balancer and reduce CPU load during crawl stack re-fill)
...
- this may have also (good) performance side effects on other parts of YaCy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7982 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
b250e6466d
implemented crawl restrictions for IP pattern and country lists
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7980 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
57d5529a01
performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7977 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
47a8c69745
added a new feature to MultiProtocolURIs to get the locale for each url:
...
This is done using a new library InetAddressLocator.jar which is NOT added by default to YaCy because it is very old and with that library we will never get a debian package. However, some people want that functionality and it can be made available if the library is taken from http://javainetlocator.sourceforge.net/ and placed into the /lib directory where it will be found using reflection.
The new feature will be used to extend the crawler steering.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7975 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
2c3161b4ac
refactoring:
...
RankingProcess -> RWIProcess
ResultFetcher -> SnippetProcess
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7974 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
d2ea250d99
refactoring:
...
- moved many classes from de.anomic to net.yacy
- made more sub-packages for search classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
low012
277b454a62
*) added comments
...
*) minor refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7971 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
6b22865dbc
- removed some warnings
...
- removed a dead update location
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7970 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
0c6d95e57b
- more tolerance against failure of table opening
...
- more connections for solrj
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7968 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
6b02b696b0
- add number of search results to end of rss and json output to reflect latest status of retrieval
...
- distinguish search access with different verify state in access of search cache
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7965 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
ce2a76d603
performance hack for search process
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7961 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
aaf7a0feaa
yet another cache strategy
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7959 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
8a428d3e77
ensure termination of pdf parser to avoid deadlocking of other processes during search result preparation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7958 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
2c4a672fe2
bugfixes and performance hacks for table index
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7957 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
dad5b586a4
added a concurrent warm-up of Table data structures. That should speed up the start-up process but may also cause higher CPU load at that time.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7956 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
734059d33e
performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7955 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
23e81b28b2
synchronization enhancements
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7954 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
dd4635e323
patches
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7953 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
85a5487d6d
YaCy can now use the solr index to compute text snippets. This makes search result preparation MUCH faster because no document fetching and parsing is necessary any more.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7943 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
0819e1d397
protection against OOM cases in image parser. See also bugs.yacy.net/view.php?id=54
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7942 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
2cba860693
- fix for wrong entries in NOLOAD indexing queue (these caused urls to be indexed based only on their url without being loaded)
...
- patch for better urls to solr admin interface
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7938 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
2842ce30d6
added synchronization in ReferenceContainer and logging for shrinking
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7937 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
cec3836e73
added reference limitation to IndexControlRWIs_p.html servlet
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7936 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
ecb4986b38
refactored stuff from last commit to ReferenceContainer
...
see: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3353&p=23163#p23163
the limiting of references is disabled per default
to enable this set yacy.conf - index.maxReferences to a value of e.g. 100000
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7935 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
f7c4abfdd7
limit references per blob & term to the 100,000 youngest
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7934 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
28f5b79deb
added a fast mass-deletion method
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7933 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
a70dbce41c
added another file tool class to yacy-cora
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7932 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
49e5ca579f
added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (this is the default now), then all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This will increase the search index size dramatically but will also enable us to create a much faster image, audio and video search. If the flag is switched on, the index entries are also stored to a solr index, if this is also enabled.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7931 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
e02bfbde56
fix for solr url
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7930 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
580beb12a5
reverting SVN 7863; the synchronization was needed and no synchronization causes repeated DNS lookup for the same hosts
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7928 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
44d6416e2d
ensure termination of shrink()
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7927 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
52230a6864
replaced catching of Exception with Throwable, which also catches Errors
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7926 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
877eaf6bcb
switched off logging of org.apache.http which was suddenly switched on by default (??)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7925 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
e1a3d609aa
moved merger object from Segment to IndexCell to enable a correct shutdown sequence. This solves a bug where yacy cannot be shut down during an index merge that appears during the shutdown phase.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7924 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
610b01e1c3
- added an 'add every media object linked in a html document as a new document' feature to the html parser. As a result, every image, app, video or audio file that is linked in a html file is added as a document. In effect, parsing a single html document may cause a number of documents to be inserted into the search index.
...
- some refactoring for mime type discovery
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7919 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
3da21c4266
protection against starting of a (second) yacy peer while another one is already running on the same port
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7917 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
b5252ef91f
added new word recommendation library in DictionaryLoader_p.html
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7913 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
1c007188ad
bugfixes in html parser
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7912 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
231074bf0a
fixed a parsing bug by reverting SVN 7766
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7910 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
low012
30a8a2f76b
*) replacing one ugly hack with an extended ugly hack ;-)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7908 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
low012
95379ce0b1
*) should fix some problems with RSS Importer (see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3253 )
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7907 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
low012
24e76a7b69
*) Replaced occurrences of "Wikimedia" with "MediaWiki" where applicable. (Thanks to the folks of 0x20.be for pointing this out.)
...
*) Added description of where to place MediaWiki dump for import.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7905 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
d40a177c05
Generation Memory Strategy fine tuning
...
add some log-output in termlist_p
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7904 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
839f407fe4
Generation Memory Strategy fine tuning:
...
- some more optimism on requests of unknown values
- avoid a premature value of 0 byte available
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7903 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
a5541751a8
- added memory computation to termlist_p.xml
...
- added option to delete terms in termlist_p.xml
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7901 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
45e497a9bd
fix for term iteration
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7900 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
5dd2efc9a2
- bugfixes in html parser
...
- new fields in solr
- extended file viewer to debug parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7897 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
2c595a6a47
added new methods to count the number of objects in RWIs. lots of refactoring was necessary to introduce new Rating class and to unify naming of methods
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7896 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
75df87832c
refactoring/better naming of methods and classes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7895 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
5f8a5ca32d
- not doing merge-jobs while short on Memory
...
- using configuration-values of crawling-max-filesize also for snippetfetching and loading files into Index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7893 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
965fabfb87
enhanced sorting speed (affects all DB operations)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7892 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
41a8ee4569
added iterable implementation in KeyList
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7891 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
22d69a6368
refactoring in cora: added sorting package
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7890 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
51cf697acd
refactoring: moved all score-related classes to new ranking package
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7889 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
a0d5e7b6e6
added new score comparator
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7888 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
4fec99115b
Implementation of strategies for controlling memory resources.
...
You can toggle between previous (standard) and new (generation) strategy at PerformanceMemory_p.html.
The generation memory strategy is implemented with the objective of running more robustly,
but at the cost of stopping some tasks (e.g. dht) early while running low on memory.
This new strategy respects the generational way a heap is organized on the most widely used JVMs.
These changes have run fine on my 3 peers for weeks now, but as I'm human, I may fail.
Please be careful when using the generation memory strategy and report errors by naming
OS, jvm and java_args.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7886 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
63a375b801
do not look at external dtd, because this makes the reader hang forever(?) on faulty dtd-locations
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7885 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
2c58af6874
- added a short memory status simulation mode
...
- added a button in PerformanceMemory_p.html to set the simulated short memory status
- bugfix: added a missing lowercase in KeyList
- better concurrency in loader dispatcher
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7883 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
c64faf41e2
addon to svn 7880
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7882 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
7b7a196243
ignore cookies in httpclient per default
...
disable cookiestore, because the default one caused segfaults on my peers
this does not harm use of cookies via YaCy as proxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7881 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
411ed159f8
do some extra sleep while running low on memory
...
(1 sec. per outofmemoryCycle)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7879 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
9ab0ba41e2
using GzipDecompressingEntity from httpclient instead of our own
...
(was just fixed there in httpclient-4.1.2 and does a proper job)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7877 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
07f5954570
try better handling of corrupt blobs
...
@developer: please revert if I'm wrong
see http://forum.yacy-websuche.de/viewtopic.php?f=8&t=3334
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7872 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
f970670a7c
- bugfix in ServerScannerList
...
- speed up of generation of scanner list avoiding forced dns lookup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7871 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
8e03b8ee8b
better integration of server list in interactive search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7870 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
0a3ab7da1b
do not sort the same array concurrently
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7868 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
eb14111200
encapsulate potentially expensive objects in TextSnippet so that they can be garbage-collected as soon as possible
...
this reduces chance of OOMs at massive search & snippet-fetching
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7865 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
0d33cf352b
removed synchronization in DNS resolve (solves a problem when loading snippets; in the past concurrent dns requests also caused deadlocks, but that was many years ago and we will give it another try)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7863 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
44d74f8f89
performance hacks for seed generation (because thread dumps showed multiple occurrences at these code points)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7861 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
5cd07d7f84
early freeing resources on deleting index reference if search-verification fails (aka Switchboard.cleanupJob)
...
doing same thingy on other methods of touched files as well
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7860 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
a311596881
finishing up my commits (7855-7858) which could be helpful for
...
not declaring inside loops (helps GC of some VMs)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7859 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
9170a434ed
throwing an exception again in FileUtils.copy(reader, writer)
...
OOMs could occur here and should not be ignored
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7858 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
ce248cc8dd
fewer byte-arrays of response-content, fewer byte-array <-> stream conversions
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7856 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
59b767eebd
stop loading via http at a defined maximum number of bytes, even if the size is unknown before loading
...
using max-file-size of type int for parsing documents
(since content is used as byte-arrays, 'integer' should be maximum)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7855 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
916d79111e
Runtime.maxMemory() DOES change @ runtime:
...
I wondered about getting Total-ram > Max-ram and MemoryControl.available() < 0
MemoryControl.available() < 0 causes errors where its value is used to dimension buffers, for example.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7852 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
299af4943c
added another memory protection hack
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7849 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
1f300217f8
more protection for the cleanup thread
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7848 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
d13103a0a7
changed the way the index cache is flushed: do not flush when a put is made, because that could cause many put calls to synchronize for a long time while a dump or a merge is performed. Instead, a watchdog thread does the dump, so puts can no longer block, which is good when a put happens during search result preparation.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7847 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
b06faab9d3
do not allocate a StringBuilder object in case that there is not enough memory for that
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7846 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
6a6f27eaf3
do not sort arrays again if arrays are already sorted
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7845 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
3d043ce9d6
- refactoring
...
- do not start worker threads in Array class if concurrency is not used
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7844 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
48b78e9ff4
disabling concurrency in new sort since that is not yet working correctly
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7843 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
62ac73a108
fixed bugs and deadlocks in core database indexing structures:
...
- added new Array class that contains an abstraction of the Java Arrays class which replaces the home-brew quicksort algorithm.
- the new class is about four times slower than the old one, but it works correctly (the old one had errors)
- fixed a synchronization problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7842 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
1912d0cccc
changed handling of RowSet element retrieval: until today all elements had been copied from the underlying byte[] arrays into a new Entry object that again had a copy of a portion of that byte[] in its own byte[]. There was an option to just refer to the underlying byte[] with a pointer but that was almost never used. This commit now changes an interface to the Row class where it is now necessary to tell if a copy is always required. Fortunately the copy is only needed in very rare cases. That means that this change should cause much less memory allocation; it is expected that this happens especially during search situations.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7840 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
bb8e3f8523
code cleanup
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7839 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
11dc653de3
added a visualization of peer pings to the performance graphic
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7837 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
3a191cdf14
because newbies are scared by the memory consumption shown in the performance graph and argue about high memory consumption due to poor knowledge of java garbage collection techniques, the memory display has been removed from the performance graph shown on the Status.html page. The memory graph can still be seen on the Performance page, where it is unchanged.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7836 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
52d799e7c8
fix for solr auth
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7833 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
9eb8e9acd9
no error message about missing browser in headless environments
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7832 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
d3c89b90ce
temporarily adding the old httpclient-3.1 again because the solrj classes need it. Should be removed as soon as solrj supports httpclient-4
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7831 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
bd99969758
fixed bad query
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7830 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
768c59740c
- replaced solrj 3.1 with solrj 3.3
...
- updated also slf4j
- added authentication for solrj
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7829 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
c7b95e8c81
*) Invalid crawl profiles (containing invalid mustmatch/mustnotmatch filters) will be moved from active crawls to invalid crawls (new file: DATA/INDEX/freeworld/QUEUES/crawlProfilesInvalid.heap). This file cannot be edited yet, but it should be easy to extend the CrawlProfileEditor accordingly.
...
*) Corrupt crawlProfilesPassive.heap would cause crawlProfilesActive.heap to be deleted. Don't know if this ever happened, but it will not happen anymore.
*) Cleaned up a little bit.
*) Added some comments.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7827 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
6d2e252bcf
fix for:
...
java.lang.NullPointerException
at net.yacy.kelondro.index.RowCollection.<init>(RowCollection.java:97)
at net.yacy.kelondro.index.RowSet.<init>(RowSet.java:48)
at net.yacy.kelondro.rwi.ReferenceContainer.<init>(ReferenceContainer.java:58)
at net.yacy.kelondro.rwi.ReferenceIterator.next(ReferenceIterator.java:69)
at net.yacy.kelondro.rwi.ReferenceIterator.next(ReferenceIterator.java:43)
at net.yacy.kelondro.blob.ArrayStack.merge(ArrayStack.java:1023)
at net.yacy.kelondro.blob.ArrayStack.mergeWorker(ArrayStack.java:922)
at net.yacy.kelondro.blob.ArrayStack.mergeMount(ArrayStack.java:869)
at net.yacy.kelondro.rwi.IODispatcher$MergeJob.merge(IODispatcher.java:267)
at net.yacy.kelondro.rwi.IODispatcher$MergeJob.access$300(IODispatcher.java:239)
at net.yacy.kelondro.rwi.IODispatcher.run(IODispatcher.java:180)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7822 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
2d4bb139d3
- added counting of links with noindex tag for solr index
...
- bugfixes for solr index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7820 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
892caccdca
added default configuration in ConfigurationSet in case of new values
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7814 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
bda3eec0ff
added parsing of canonical link element to html parser
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7812 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
b6f09a475d
- added an index profile editor in the /indexFederated_p.html servlet for solr indexes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7811 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago