orbiter
9416f5c26f
more speed test cases: kelondro provides map functions that are more than 20% faster than the standard Java classes and use less than half of the memory of the Java classes:
Just start IndexTest (here with 1000000 test objects):
Performance test: comparing HashMap, TreeMap and kelondroRow
generated 1000000 test data entries
STANDARD JAVA CLASS MAPS
sorted map
time for TreeMap<byte[]> generation: 2110
time for TreeMap<byte[]> test: 2516, 0 bugs
memory for TreeMap<byte[]>: 29 MB
unsorted map
time for HashMap<String> generation: 1157
time for HashMap<String> test: 1516, 0 bugs
memory for HashMap<String>: 61 MB
KELONDRO-ENHANCED MAPS
sorted map
time for kelondroMap<byte[]> generation: 1781
time for kelondroMap<byte[]> test: 2452, 0 bugs
memory for kelondroMap<byte[]>: 15 MB
unsorted map
time for HashMap<ByteArray> generation: 828
time for HashMap<ByteArray> test: 953, 0 bugs
memory for HashMap<ByteArray>: 9 MB
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5847 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
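The log above compares HashMap<String> with HashMap<ByteArray> but does not show the ByteArray class. A Java array inherits identity-based equals() and hashCode() from Object, so a raw byte[] cannot serve as a content-addressed HashMap key and some wrapper is needed. The following is a minimal sketch of such a wrapper. ByteArrayKey and ByteArrayKeyDemo are hypothetical names, not the kelondro classes, and the sketch makes no claim about kelondro's memory layout.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper: arrays inherit identity-based equals()/hashCode()
// from Object, so a raw byte[] cannot be looked up by content in a HashMap.
final class ByteArrayKey {
    private final byte[] bytes; // not copied: callers must not mutate it afterwards
    private final int hash;     // computed once, since keys are hashed repeatedly

    ByteArrayKey(byte[] bytes) {
        this.bytes = bytes;
        this.hash = Arrays.hashCode(bytes);
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof ByteArrayKey
                && Arrays.equals(bytes, ((ByteArrayKey) other).bytes);
    }

    @Override
    public int hashCode() {
        return hash;
    }
}

class ByteArrayKeyDemo {
    // Maps each key to its position in the input array.
    static Map<ByteArrayKey, Integer> index(byte[][] keys) {
        Map<ByteArrayKey, Integer> map = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            map.put(new ByteArrayKey(keys[i]), i);
        }
        return map;
    }
}
```

A lookup with a freshly allocated array that has the same content finds the stored entry. That is exactly what a bare HashMap<byte[], Integer> would fail to do.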
orbiter
b53790abb1
more performance hacks: 10% more speed for Base64.compare(), which is used very often in YaCy code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5846 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
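The commit does not say how Base64.compare() was sped up. One common technique for ordering hashes by a base64 alphabet, rather than by raw ASCII value, is a precomputed rank table, sketched below. The class name AlphabetOrder and the alphabet string are assumptions for illustration and may not match YaCy's actual Base64 order.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: orders byte[] hashes by each byte's position in a
// base64 alphabet instead of by raw byte value. The alphabet is an assumption.
final class AlphabetOrder implements Comparator<byte[]> {
    static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

    // Rank table: one array lookup per byte instead of scanning the alphabet
    // with String.indexOf(). Bytes outside the alphabet all rank as -1.
    private final byte[] rank = new byte[256];

    AlphabetOrder() {
        Arrays.fill(rank, (byte) -1);
        for (int i = 0; i < ALPHABET.length(); i++) {
            rank[ALPHABET.charAt(i)] = (byte) i;
        }
    }

    @Override
    public int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int ra = rank[a[i] & 0xff];
            int rb = rank[b[i] & 0xff];
            if (ra != rb) return ra - rb; // the first differing position decides
        }
        return a.length - b.length;       // a shorter prefix sorts first
    }
}
```

In this order, digits sort after all letters ('z' before '0'), whereas plain ASCII order puts '0' before 'z'.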
orbiter
8ffb9889e1
some fixes and performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5845 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
dfb96ecb72
more fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5844 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
1b8d346b4c
fixes in connection with the transition to byte[] hashes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5843 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
0b0a46d35a
* fix transferRWI as suggested by celle (thanks!)
see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2000#p14023
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5842 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
996572de95
quickfix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5841 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
380ed2dac0
performance and debugging additions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5840 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
635b0a9da7
code-split
allow CGI indexing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5839 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
fa3adbbfc6
added domain checks to the surrogate reader and the RWI transfer receiver to prevent spamming using surrogates
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5837 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
76af84d732
* add custom comparator to ScoreCluster for byte[]
* fixes http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2010
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5836 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
ab0030d7a7
allow DHT-out by default for peers that process remote crawls
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5834 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
d1116c049f
*) added new method "contains()" to Blacklist interface
*) implemented contains() in class AbstractBlacklist
*) used new method in Blacklist_p to prevent double entries in blacklists
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5832 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
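The contains() pattern described above can be sketched as follows: the interface declares the check, an abstract base class implements it once, and add() uses it to skip double entries. SimpleBlacklist, AbstractSimpleBlacklist and InMemoryBlacklist are hypothetical, simplified names. The real YaCy Blacklist interface and its method signatures are not shown in the commit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified interface; names and signatures are illustrative.
interface SimpleBlacklist {
    boolean contains(String listType, String host, String path);
    void add(String listType, String host, String path);
}

abstract class AbstractSimpleBlacklist implements SimpleBlacklist {
    // listType -> entries in insertion order; a List keeps duplicates,
    // so callers must check contains() before adding.
    protected final Map<String, List<String>> lists = new HashMap<>();

    @Override
    public boolean contains(String listType, String host, String path) {
        List<String> entries = lists.get(listType);
        return entries != null && entries.contains(host + "/" + path);
    }

    int size(String listType) {
        List<String> entries = lists.get(listType);
        return entries == null ? 0 : entries.size();
    }
}

class InMemoryBlacklist extends AbstractSimpleBlacklist {
    @Override
    public void add(String listType, String host, String path) {
        // Skip double entries, in the spirit of the contains() check in Blacklist_p.
        if (!contains(listType, host, path)) {
            lists.computeIfAbsent(listType, k -> new ArrayList<>()).add(host + "/" + path);
        }
    }
}
```

Putting contains() in the abstract class means every blacklist implementation gets the same duplicate check without reimplementing it.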
f1ori
08445e42f0
* don't throw an exception in case of a bad charset in the HTTP header
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5831 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
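The commit does not show the fix itself. In Java, Charset.forName() throws IllegalCharsetNameException for a malformed name and UnsupportedCharsetException for an unknown one, so a bad header value can abort a request. Below is a minimal sketch of the tolerant behaviour described above, using a caller-supplied fallback. The name CharsetUtil and the fallback policy are assumptions.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

class CharsetUtil {
    // Resolves a charset name taken from an HTTP header. Returns the fallback
    // instead of throwing when the name is missing, malformed, or unsupported.
    static Charset charsetOrDefault(String name, Charset fallback) {
        if (name == null || name.trim().isEmpty()) return fallback;
        try {
            return Charset.forName(name.trim());
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return fallback; // bad charset in header: degrade instead of failing
        }
    }
}
```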
f1ori
2f860a2564
* convert byte[] hashes to string for log output
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5830 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
d93a2a6552
* ignore whitespace so that signatures can be copied and pasted more easily
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5828 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
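Ignoring whitespace before decoding can be sketched as follows. The sketch assumes the signature is pasted as standard base64 text. Both that encoding and the name SignatureText are assumptions, not taken from the commit.

```java
import java.util.Base64;

class SignatureText {
    // Removes all whitespace (spaces, tabs, line breaks picked up while
    // copying and pasting) before decoding a base64-encoded signature.
    static byte[] decodeSignature(String pasted) {
        String compact = pasted.replaceAll("\\s+", "");
        return Base64.getDecoder().decode(compact);
    }
}
```

Without the replaceAll() step, Base64.getDecoder() rejects the pasted text with an IllegalArgumentException as soon as it contains a space or line break.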
orbiter
fbcbcc5bdb
export of YaCy document objects as Dublin Core records in XML
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5826 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
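A Dublin Core record in XML uses elements from the namespace http://purl.org/dc/elements/1.1/, conventionally prefixed dc:. The sketch below writes one such record from a field map. The wrapping <record> element and the class name DublinCoreExport are hypothetical, and the actual YaCy export format may differ.

```java
import java.util.Map;

class DublinCoreExport {
    // Escapes the five XML special characters; '&' must be replaced first.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    // Writes one record; keys are Dublin Core element names such as
    // "title", "creator", "date" or "identifier".
    static String toXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("<record xmlns:dc=\"http://purl.org/dc/elements/1.1/\">\n");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("  <dc:").append(e.getKey()).append('>')
              .append(escape(e.getValue()))
              .append("</dc:").append(e.getKey()).append(">\n");
        }
        sb.append("</record>\n");
        return sb.toString();
    }
}
```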
orbiter
d7cbf4cdd4
more performance hacks: less overhead in word hash computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5825 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
29e96c1a60
bugfixes and performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5824 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
4e97a31009
corrections in Dublin Core syntax
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5823 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
44daec7936
* introduce signatures to autoupdate
as long as no public keys are set for the update locations,
no signatures are checked
* wiki-article follows...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5822 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
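The rule stated above (no public key configured, no check) can be sketched with the standard java.security.Signature API. UpdateVerifier is a hypothetical name. The SHA256withRSA algorithm is also an assumption, because the commit does not say which algorithm autoupdate uses.

```java
import java.security.GeneralSecurityException;
import java.security.PublicKey;
import java.security.Signature;

class UpdateVerifier {
    // With no public key configured for an update location, the release is
    // accepted unchecked; otherwise the signature must verify.
    // "SHA256withRSA" is an assumption for this sketch.
    static boolean accept(byte[] release, byte[] signature, PublicKey key)
            throws GeneralSecurityException {
        if (key == null) return true;        // no key set: signatures are not checked
        if (signature == null) return false; // key set but release unsigned: reject
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(key);
        verifier.update(release);
        return verifier.verify(signature);
    }
}
```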
orbiter
538e375901
replaced the old caching method for computed word hashes with a better one. Word hash computation has become the new performance bottleneck (now that the IO bottleneck has been removed by the IndexCell data structure), so better caching of word hashes was necessary.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5821 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
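One way to cache computed word hashes is a bounded LRU map, sketched below with LinkedHashMap in access order. The commit does not describe the new caching method, so this is an illustration under assumptions. The hash function (the first 12 URL-safe base64 characters of an MD5 digest of the lower-cased word), the cache size and the class name WordHashCache are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class WordHashCache {
    private final Map<String, byte[]> cache;

    WordHashCache(final int maxEntries) {
        // accessOrder=true makes LinkedHashMap an LRU structure;
        // removeEldestEntry evicts the least recently used word when full.
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Hypothetical hash, not necessarily YaCy's exact word hash.
    static byte[] computeHash(String word) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(word.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8));
            String b64 = Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
            return b64.substring(0, 12).getBytes(StandardCharsets.US_ASCII);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every JVM must provide MD5
        }
    }

    // Not thread-safe; shared use would need external synchronization.
    byte[] hash(String word) {
        return cache.computeIfAbsent(word, WordHashCache::computeHash);
    }

    int size() {
        return cache.size();
    }
}
```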
orbiter
9e853e1977
partly reverting SVN 5818: identical comparator required for join operator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5820 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
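A join needs an identical comparator because a merge join walks two sorted lists in lockstep. It decides which cursor to advance by comparing the current elements. If the comparator differs from the one that sorted the lists, matches are skipped silently. Below is a generic sketch; SortedJoin is a hypothetical name, not YaCy's join operator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SortedJoin {
    // Intersects two lists that are both sorted by `order`. The merge is only
    // correct if `order` is the same ordering that was used to sort them.
    static <T> List<T> join(List<T> a, List<T> b, Comparator<? super T> order) {
        List<T> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int c = order.compare(a.get(i), b.get(j));
            if (c == 0) {          // present in both lists
                result.add(a.get(i));
                i++;
                j++;
            } else if (c < 0) {    // a is behind: advance a
                i++;
            } else {               // b is behind: advance b
                j++;
            }
        }
        return result;
    }
}
```

With the natural order, the ascending lists ["a", "c", "e", "g"] and ["b", "c", "g", "h"] join to ["c", "g"]. With the reverse comparator, the same inputs yield an empty result, although "c" and "g" occur in both.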
orbiter
e16c25ddf7
(peak-) performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5819 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
63cd152969
fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5818 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
7dfe7e7cc6
fixed some problems with surrogate reader. This is now ready for testing.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5817 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
3a1364ed5c
removed example lines from SurrogateReader sources; added additional example file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5816 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9050a3c4c5
alpha version of surrogate reading and indexing.
see the example file for an explanation.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5815 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b15b059c0d
fix for latest commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5813 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c8624903c6
full redesign of index access data model:
terms (words) are no longer retrieved by their word hash string, but by a byte[] containing the word hash.
This has strong advantages when RWIs are sorted in the ReferenceContainer cache and compared using the standard java.util.TreeMap, which previously needed getBytes() and new String() conversions.
Many thousands of such conversions are now avoided every second, which doubles the indexing speed.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5812 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
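Using byte[] keys directly can be sketched with a TreeMap that is given an explicit comparator, since byte[] has no natural ordering. This sketch uses Arrays.compareUnsigned (Java 9+), which is plain unsigned byte order, as a stand-in. YaCy orders hashes by its own Base64 order, which differs. ByteKeyIndex is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

class ByteKeyIndex {
    // byte[] has no natural ordering, so the TreeMap gets an explicit comparator.
    // Keys are compared by content, with no getBytes()/new String() conversion.
    private final TreeMap<byte[], List<String>> index =
            new TreeMap<byte[], List<String>>(Arrays::compareUnsigned);

    void add(byte[] wordHash, String urlHash) {
        index.computeIfAbsent(wordHash, k -> new ArrayList<>()).add(urlHash);
    }

    List<String> get(byte[] wordHash) {
        return index.getOrDefault(wordHash, List.of());
    }

    byte[] firstKey() {
        return index.firstKey();
    }
}
```

Because the comparator looks at content, a lookup with a newly allocated array finds entries stored under a different array instance with the same bytes.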
f1ori
dd6b5005ff
* fix missing charset handling in getpageinfo_p
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5811 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
bd5f4c78d8
- added default profile for surrogate indexing
- integrated surrogate indexing into indexing queue process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5810 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
ad78e3a59f
- less lines in rssTerminal
- crawl more documents: if remote crawling is enabled, a remote crawl list is also loaded while a local crawl is running, provided that the indexer is idle
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5809 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
bc80dc913a
added new surrogate reader (surrogates are parsed documents in batches)
this will open a new way to insert indexes into YaCy (instead of crawling)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5808 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
12d81e98eb
- fixed bad search results when searching for empty string
- simplified result handling and page composition in case nothing was searched for
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5807 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
8a24350036
- fix for join method with new generalized RWI data structure (caused by latest commit)
- added more functions to mediawiki parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5806 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
e58320a507
added more info to the log for debugging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5805 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
89ec3acb3e
- full abstraction of the index content type: the kelondro full-text index may now also contain indexes of content other than text, e.g. navigation indexes or reverse-linking indexes.
- during index joins all word positions are maintained: better ranking for word distance possible; exact phrase match can be implemented soundly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5804 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
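When word positions are kept through the join, exact phrase matching and word-distance ranking reduce to merging sorted position lists. Below is a minimal sketch. It assumes each term's positions within one document are given as an ascending int[]. PhraseMatch is a hypothetical name.

```java
class PhraseMatch {
    // True if term B occurs directly after term A somewhere in the document
    // (an exact two-word phrase match). Linear merge over both position lists.
    static boolean adjacent(int[] a, int[] b) {
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            int want = a[i] + 1; // position B must occupy to follow this A
            if (b[j] == want) return true;
            if (b[j] < want) j++; else i++;
        }
        return false;
    }

    // Smallest distance between any occurrence of A and any occurrence of B,
    // usable as a word-distance ranking signal.
    static int minDistance(int[] a, int[] b) {
        int best = Integer.MAX_VALUE;
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            best = Math.min(best, Math.abs(a[i] - b[j]));
            if (a[i] < b[j]) i++; else j++; // advance the smaller position
        }
        return best;
    }
}
```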
borg-0300
7a48090fcf
- fix for "uk" language
- svn attributes added
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5803 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
dc2af61bc9
allow up to 50 results from remote peers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5802 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c0e8ed5461
fixed problem with not http client
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5801 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
8862a2fed0
oops
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5799 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
de68948bc5
better handling of free memory computation and emergency cache flush for the index cell
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5798 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
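Free memory in the JVM is commonly estimated from three Runtime figures. The heap currently in use is totalMemory() - freeMemory(), and the headroom up to the -Xmx limit is maxMemory() minus that. Below is a minimal sketch. The flush threshold and the class name MemoryGuard are assumptions, not YaCy's policy.

```java
class MemoryGuard {
    // Memory the JVM could still hand out: unused heap plus the room it may
    // still grow into. freeMemory() alone underestimates this while the heap
    // has not yet grown to maxMemory(). maxMemory() may be Long.MAX_VALUE
    // when the JVM has no inherent limit.
    static long available() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    // Hypothetical policy: flush the in-memory index cache when less than
    // minFreeBytes remain.
    static boolean shouldFlush(long minFreeBytes) {
        return available() < minFreeBytes;
    }
}
```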
f1ori
fcb77c3140
* added .im (Isle of Man) to TLD-list
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5794 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b81c7467d8
protection against too many files in RICELL in case of massive emergency dumps caused by low memory
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5791 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d4d87d90c4
- extended experimental wikipedia dump parser
- removed old, possibly unused code from the wiki parser that conflicted with current Wikipedia wiki markup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5790 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c3aff2521e
fix for NPE
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5789 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
57c00dd8c9
fix for bad filtering of common http error
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5788 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
14361f1ca4
added log message for index generation in HeapReader
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5787 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c08f9b36a4
refactoring of wiki parser.
This was done to prepare the wiki parser as a parser for Wikipedia dumps, which will be used for performance tests (to avoid crawling)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5785 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago