orbiter
b53790abb1
more performance hacks: 10% more speed for Base64.compare() which is really often used in YaCy code
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5846 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
8ffb9889e1
some fixes and performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5845 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
dfb96ecb72
more fixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5844 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
1b8d346b4c
fixes in connection with transition to byte[] hashes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5843 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
0b0a46d35a
* fix transferRWI as suggested by celle (thanks!)
...
see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2000#p14023
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5842 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
996572de95
quickfix
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5841 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
380ed2dac0
performance and debugging additions
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5840 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
635b0a9da7
code-split
...
allow cgi indexing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5839 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
fa3adbbfc6
added domain checks to surrogate reader and RWI transfer receiver to prevent spamming using surrogates
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5837 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
76af84d732
* add custom comparator to ScoreCluster for byte[]
...
* fixes http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2010
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5836 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
ab0030d7a7
allow dht-out for remote-crawl processing peers on default settings
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5834 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
d1116c049f
*) added new method "contains()" to Blacklist interface
...
*) implemented contains() in class AbstractBlacklist
*) used new method in Blacklist_p to prevent double entries in blacklists
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5832 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
08445e42f0
* don't throw exception, in case of bad charset in http-header
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5831 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
2f860a2564
* convert byte[] hashes to string for log output
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5830 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
d93a2a6552
* ignore whitespaces so you can copy&paste signatures better
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5828 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
fbcbcc5bdb
export of yacy document objects as dublin core record in xml
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5826 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d7cbf4cdd4
more performance hacks: less overhead in word hash computation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5825 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
29e96c1a60
bugfixes and performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5824 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
4e97a31009
corrections in dublin core syntax
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5823 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
44daec7936
* introduce signatures to autoupdate
...
as long as there aren't publickeys for the updatelocations set,
no signatures are checked
* wiki-article follows...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5822 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
538e375901
replaced old caching method for computed word hashes with a better method. The word hash computation is a new performance bottleneck (after the IO bottleneck was removed with the IndexCell data structure) and a better caching for word hashes was necessary.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5821 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9e853e1977
partly reverting SVN 5818: identical comparator required for join operator
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5820 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
e16c25ddf7
(peak-) performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5819 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
63cd152969
fixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5818 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
7dfe7e7cc6
fixed some problems with surrogate reader. This is now ready for testing.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5817 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
3a1364ed5c
removed example lines from SurrogateReader sources; added additional example file
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5816 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9050a3c4c5
alpha version of surrogate reading and indexing.
...
see the example file for an explanation.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5815 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b15b059c0d
fix for latest commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5813 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c8624903c6
full redesign of index access data model:
...
terms (words) are not any more retrieved by their word hash string, but by a byte[] containing the word hash.
this has strong advantages when RWIs are sorted in the ReferenceContainer Cache and compared with the sun.java TreeMap method, which needed getBytes() and new String() transformations before.
Many thousands of such conversions are now omitted every second, which increases the indexing speed by a factor of two.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5812 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
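The switch from String keys to byte[] keys described in commit c8624903c6 can be illustrated with a small sketch. It assumes a plain unsigned lexicographic byte order; the ordering YaCy actually uses for word hashes is not reproduced here, and the class and method names (`ByteKeyIndex`, `BYTE_ORDER`, `newIndex`) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.TreeMap;

public class ByteKeyIndex {
    // Hypothetical comparator: unsigned lexicographic order over byte arrays.
    // This is an assumption for illustration, not YaCy's own hash ordering.
    static final Comparator<byte[]> BYTE_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    };

    // byte[] uses identity-based equals(), so a sorted map keyed by byte[]
    // needs an explicit comparator to treat equal contents as the same key.
    static TreeMap<byte[], Integer> newIndex() {
        return new TreeMap<>(BYTE_ORDER);
    }

    public static void main(String[] args) {
        TreeMap<byte[], Integer> index = newIndex();
        byte[] key = "FXg39Q".getBytes(StandardCharsets.US_ASCII);
        index.put(key, 1);
        // A different array instance with the same content finds the entry,
        // without any getBytes()/new String() round trip inside the map.
        System.out.println(index.containsKey(key.clone()));
    }
}
```

With keys kept as byte[] throughout, the comparator works directly on the raw bytes, which is the kind of conversion the commit message says was removed.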
f1ori
dd6b5005ff
* fix missing charset handling in getpageinfo_p
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5811 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
bd5f4c78d8
- added default profile for surrogate indexing
...
- integrated surrogate indexing into indexing queue process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5810 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
ad78e3a59f
- less lines in rssTerminal
...
- crawl more documents: if remote crawling is enabled, a remote crawl list is also loaded if a local crawl is running in case that the indexer is idle
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5809 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
bc80dc913a
added new surrogate reader (surrogates are parsed documents in batches)
...
this will open a new way to insert indexes to YaCy (instead of crawling)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5808 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
12d81e98eb
- fixed bad search results when searching for empty string
...
- simplified result handling and page composition in case that nothing was searched
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5807 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
8a24350036
- fix for join method with new generalized RWI data structure (caused by latest commit)
...
- added more functions to mediawiki parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5806 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
e58320a507
added more info in log for debugging
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5805 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
89ec3acb3e
- full abstraction of index content type: the kelondro full text index may now also contain indexes about other content than text, i.e. navigation indexes or reverse linking indexes.
...
- during index joins all word positions are maintained: better ranking for word distance possible; exact phrase match can be implemented soundly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5804 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
7a48090fcf
- fix for "uk" language
...
- svn attributes added
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5803 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
dc2af61bc9
allow up to 50 results from remote peers
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5802 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c0e8ed5461
fixed problem with not http client
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5801 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
8862a2fed0
ups
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5799 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
de68948bc5
better handling of free memory computation and emergency cache flush for index cell
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5798 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
fcb77c3140
* added .im (Isle of Man) to TLD-list
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5794 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b81c7467d8
protection against too many files in RICELL in case of massive emergency dumps caused by low memory
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5791 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d4d87d90c4
- extended experimental wikipedia dump parser
...
- removed historic, possibly unused code from wiki parser that was in conflict with actual wikipedia wiki code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5790 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c3aff2521e
fix for NPE
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5789 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
57c00dd8c9
fix for bad filtering of common http error
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5788 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
14361f1ca4
added log message for index generation in HeapReader
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5787 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c08f9b36a4
refactoring of wiki parser.
...
This was done to prepare the wiki parser as parser for wikipedia dumps, which will be used for performance test (to omit crawling)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5785 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
44e01afa5b
- refactoring
...
- a little bit more abstraction
- new interfaces for index abstraction
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5783 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
82fb60a720
increased memory limit for emergency cache flush
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5782 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
9180617dd9
*) Classes to handle import of lists (especially blacklists) from XML files, not used yet, but will be used soon.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5780 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
596e6215dc
fix in case of white space in path name
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5779 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b887f4a116
keep more free mem
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5778 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c2359f20dd
refactoring: better abstraction of reference and metadata prototypes.
...
This is a preparation to introduce other index tables as used now only for reverse text indexes. Next application of the reverse index is a citation index.
Moved to version 0.74
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5777 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
ab656687d7
more strict BLOB initialization .. may also help to save some ram
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5776 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
5b138ada16
fixes to web structure reference collection and url construction
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5775 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
a29a11e526
added evaluation of incoming links in webstructure api
...
the api hash changed, new XML schema.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5774 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
f6691411b5
- migration of files from SplitTable (which are used for the URL-DB) to a different file name format.
...
- the file generation logic is slightly different: files may now have only a maximum size of one gigabyte and a maximum age of one month.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5773 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
shostakovich
1f37cc6107
Robots.txt is now reused after one day. See forum-topic:
...
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1669&p=13565#p13565
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5772 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
f21a8c9e9c
a different naming scheme for BLOBArray files. This may be necessary if blobs are written more often than once in a second.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5771 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
7ba078daa1
- added fast site-operator
...
- refactoring merge into BLOBArray
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5770 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b4126432bc
hardening of index dump write process
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5769 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9bfb2641db
- removed deprecated threads
...
- added automatic http client reset. this was necessary because excessive intranet crawling caused deadlocks. this hack solved the problem.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5768 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
293290c317
fix for bad assert in last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5767 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
bd409fb7ba
added web structure analysis for a special domain that can be requested from the api.
...
Example:
http://localhost:8080/api/webstructure.xml?about=www.yacy.net
returns a xml with the following content:
<?xml version="1.0"?>
<webstructure>
  <domains reference="reverse" count="1" maxref="300">
    <domain host="www.yacy.net" id="FXg39Q" date="20090401">
      <citation host="java.sun.com" id="o-R3yY" count="1" />
      <citation host="yacy-suche.de" id="-KCLaB" count="1" />
      <citation host="suma-ev.de" id="VRAHIA" count="1" />
      <citation host="www.kit.edu" id="EMaLDQ" count="1" />
      <citation host="yacy.net" id="Fh1hyQ" count="1" />
      <citation host="www.fzk.de" id="V2Kl-A" count="1" />
      <citation host="en.wikipedia.org" id="rwtdfR" count="3" />
      <citation host="vimeo.com" id="MmdQDY" count="3" />
      <citation host="liebel.fzk.de" id="sX4ozA" count="6" />
    </domain>
  </domains>
</webstructure>
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5766 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
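A client could read the webstructure response shown in commit bd409fb7ba roughly as follows. This is a sketch using the JDK's DOM parser; only the element and attribute names (`citation`, `host`, `count`) come from the example above, while the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class WebStructureParser {
    // Hypothetical helper: maps each citing host to its citation count,
    // preserving the order in which the hosts appear in the document.
    static Map<String, Integer> citationCounts(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList citations = doc.getElementsByTagName("citation");
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < citations.getLength(); i++) {
            Element citation = (Element) citations.item(i);
            counts.put(citation.getAttribute("host"),
                       Integer.parseInt(citation.getAttribute("count")));
        }
        return counts;
    }
}
```

Fetching the XML from a running peer is left out on purpose; the sketch only covers parsing a response that has already been loaded into a string.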
orbiter
b6c2167143
- patch for bad web structure dumps
...
- added automatic slow-down of accesses to specific domains when access to a web page fails
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5765 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
0139988c04
- added writing of temporary file names and renaming to final file name when index dump/merge are done. Interrupted merges can be cleaned up.
...
- added clean-up of unfinished merges and unused idx/gap files
- enhanced merge file selection method
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5764 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
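The write-to-temporary-name-then-rename pattern from commit 0139988c04 can be sketched as below. The sketch uses `java.nio.file`, which is a substitution for illustration and not the API of the 2009 code; the `.tmp` suffix and all class and method names are assumptions.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SafeDump {
    // Hypothetical suffix marking files whose write has not finished yet.
    static final String TMP_SUFFIX = ".tmp";

    // Write under a temporary name first, then rename to the final name.
    // A crash during the write leaves only a *.tmp file behind, never a
    // half-written file under the final name.
    static void writeThenRename(Path target, byte[] content) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + TMP_SUFFIX);
        Files.write(tmp, content);
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Delete leftovers of interrupted writes; returns how many were removed.
    static int cleanUp(Path dir) throws IOException {
        int removed = 0;
        try (DirectoryStream<Path> leftovers = Files.newDirectoryStream(dir, "*" + TMP_SUFFIX)) {
            for (Path leftover : leftovers) {
                Files.delete(leftover);
                removed++;
            }
        }
        return removed;
    }
}
```

Calling `cleanUp` once at start-up, before any new dump or merge begins, is one way to get the "interrupted merges can be cleaned up" behaviour the commit describes.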
orbiter
3621aa96ab
- added a memory protection for the IndexCell migration
...
- fix for bad cell file selection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5763 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
568e8f1741
fix in unmountBLOB
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5762 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9da69d6b68
- better selection of files to be merged
...
- fix for getChannel().close(), which works on windows but not on macs and linux
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5761 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d39a5b42ca
more care about open file handles. Now files also close on windows and can be deleted afterwards.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5760 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
029495e64d
fixed bug introduced in SVN 5756 in EcoTable.put()
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5759 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
587838bd09
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5758 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d2e2420a68
- added another file selection method for index cell merge
...
- more hacks to check that files are closed properly and filehandles do not exist after files are closed.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5757 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
96eaecda3e
- added migration class to go from index collections to the index cell data structure.
...
- added better control over file deletion, because this sometimes fails, especially on windows
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5756 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
0f0b4aec75
better index cell merge logic
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5754 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
832fef670f
migration of urls-files into subdirectory METADATA
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5753 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
fa07234d4e
fix for clear method: now deletes files
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5752 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lulabad
df87e4dbf6
missing count of sent Index and URLs
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5747 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
c450e3746b
svn attributes added
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5736 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
37f892b988
added new concurrent merger class for IndexCell RWI data
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5735 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
8c494afcfe
svn attributes added
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5734 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
67aaffc0a2
- added Latency control to the crawler:
...
because of the strongly enhanced indexing speed when using the new IndexCell RWI data structures (> 2000PPM on my notebook), it is now necessary to control the crawling speed depending on the response time of the target server (which is also YaCy in case of some intranet indexing use cases).
The latency factor in crawl delay times is derived from the time that a target host takes to answer http requests. For internet domains, the crawl delay is a minimum of twice the response time; in intranet cases the delay time is now half of the response time.
- added API to monitor the latency times of the crawler:
a new api at /api/latency_p.xml returns the current response times of domains, the time when the domain was accessed by the crawler the last time and many more attributes.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5733 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
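The latency rule from commit 67aaffc0a2 (twice the response time for internet hosts, half of it for intranet hosts) can be written down as a small sketch. The class, the method names and the notion of a separately configured base delay are assumptions; the commit message does not say how the latency-derived value is combined with other delay settings.

```java
public class CrawlDelay {
    // Latency-derived part of the delay, following the rule in the commit
    // message: 2x the response time for internet hosts, 0.5x for intranet.
    static long latencyDelayMillis(long responseTimeMillis, boolean intranet) {
        return intranet ? responseTimeMillis / 2 : responseTimeMillis * 2;
    }

    // Assumption: the latency-derived value acts as a lower bound on top of
    // a configured base delay (hypothetical parameter).
    static long effectiveDelayMillis(long baseDelayMillis, long responseTimeMillis, boolean intranet) {
        return Math.max(baseDelayMillis, latencyDelayMillis(responseTimeMillis, intranet));
    }

    public static void main(String[] args) {
        // A host answering in 300 ms: internet vs. intranet case.
        System.out.println(effectiveDelayMillis(500, 300, false));
        System.out.println(effectiveDelayMillis(0, 300, true));
    }
}
```

A slow target therefore pushes the waiting time up automatically, while a fast intranet host can be crawled with a short pause.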
orbiter
0926310461
another performance hack
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5731 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
ebe5d69d14
performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5730 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
61f9dbf0cc
- fixed a display problem in watch crawler
...
- another small enhancement in balancer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5729 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b3f75e48fa
- enhanced balancer: auto-solving of waiting-deadlocks
...
- removed deprecated cache-init size value
- more debug lines for IndexCell cache dump merge
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5728 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9a90ea05e0
added a merge operation for IndexCell data structures
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5727 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d99ff745aa
fix for http://forum.yacy-websuche.de/viewtopic.php?p=13378#p13378
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5726 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
0c3ab291c4
fix for http://forum.yacy-websuche.de/viewtopic.php?p=13354#p13354
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5725 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
a9cea419ef
Integration of the new index data structure IndexCell
...
This is the start of a testing phase for IndexCell data structure which will replace
the collections and caching strategy. IndexCell creation and maintenance is fast, has
no caching overhead, very low IO load and is the basis for the next data structure,
index segments.
IndexCell files are stored at DATA/<network>/TEXT/RICELL
With this commit still the old data structures are used, until a flag in yacy.conf is set.
To switch to the new data structure, set
useCell = true
in yacy.conf. Then you will have no access any more to TEXT/RICACHE and TEXT/RICOLLECTION
This code is still bleeding-edge development. Please do not use the new data structure for
production now. Future versions may have changed data types, or other storage locations.
The next main release will have a migration feature for old data structures.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5724 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
fd0976c0a7
refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5723 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
83792d9233
more refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5722 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
ce79239322
"typo"
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5721 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
cdbdc731c5
small updates: unescape, isCGI
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5720 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
474aac65af
more refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5719 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
209f25f5f5
refactoring to integrate indexCell data structures
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5718 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
359a238acf
faster isCGI()
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5717 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
f75628e53b
some corrections
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5716 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b7138e5fcb
even more efficient comparator calls (less System.arraycopy for primary keys)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5715 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
65784eb656
- more efficient comparator calls
...
- fix for http://forum.yacy-websuche.de/viewtopic.php?p=13331#p13331
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5714 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
44874cb550
added a deleteOnExit for blob file deletion in case that a deletion is not successful.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5713 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
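The fallback described in commit 44874cb550 can be sketched with the standard `java.io.File` API: try to delete right away, and register the file for deletion at JVM exit if that fails. The class and method names are hypothetical.

```java
import java.io.File;

public class FileDeleter {
    // Returns true if the file is gone now, false if deletion was only
    // scheduled for JVM shutdown (for example because a handle is still
    // open, which can block deletion on some platforms).
    static boolean deleteOrDefer(File file) {
        if (!file.exists()) return true;
        if (file.delete()) return true;
        file.deleteOnExit();
        return false;
    }
}
```

Note that `deleteOnExit` only takes effect on a normal shutdown of the virtual machine, so a file may still survive a crash.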
orbiter
66f78d67e0
bad idea. Concurrency in index management will be done differently
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5712 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
7dff1cba62
removed option to use different primary keys in kelondro tables
...
this option was never used and there is also no use to set other columns but the first as the primary key. as a result, access methods to the key do not need to compute key positions, and they work faster.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5711 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
7f67238f8b
refactoring of plasmaWordIndex: less methods in the class, separated the index to CachedIndexCollection
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5710 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
14a1c33823
refactoring of wordIndex class
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5709 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d49238a637
more performance hacks: better default values for scaling, less memory usage
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5708 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
39644dc14e
performance hacks to compare methods in database core
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5707 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
e2e7949feb
replaced old PPM computation with a better one that simply sums up events that had been stored in the profiling table.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5706 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
f6d989aa04
added new class RowSetArray which arranges RowSet objects like Elements in a hashtable, but still provides the functionality of sorted enumeration. The new class is now integrated into the ObjectIndexCache, which is the core class to provide index functions to all database files. The new index access is about twice as fast as before. This has strong speed enhancement effects on all parts of YaCy.
...
The speed of the kelondro indexing class ObjectIndexCache can be compared with Java's standard TreeMap with the main method in IntegerHandleIndex. The result is that the kelondro indexing needs only 1/5 of the memory that TreeMap uses! In exchange, the kelondro classes are slower than TreeMap, about four (!) times slower. However, this is not so bad because the better use of the memory is a strong advantage and makes it possible that YaCy can maintain such a large number of documents (> 50 million) in one peer.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5705 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
0a2fabeef3
static TMPDIR
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5704 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
9f7e62e900
refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5703 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
f35dc11dc4
allow crawl start from pages with script tags
...
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1910
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5702 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
6958eff196
removed unnecessary exceptions, extended testing in IntegerHandleIndex
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5701 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
13c666adef
performance hack to ObjectIndex put() method:
...
Java standard classes provide a Map interface whose put() method returns the object that was replaced by the argument of the put call. The kelondro ObjectIndex defined its put method in the same way, so it also returned the previous value of the Entry object before the put call. However, this value was not used by the calling code in most cases. This change implements a put method that does not return the previous value, to reflect the common use; omitting the return of previous values will bring some performance benefit. The functionality to get the previous value is still maintained and provided by a new 'replace' method.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5700 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
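The interface split from commit 13c666adef, a `put` that returns nothing and a `replace` that returns the previous value, might look like the sketch below. It is backed by a `HashMap` purely for brevity, so it shows the shape of the two methods and does not demonstrate the speed-up itself; all names and types are assumptions rather than the kelondro ObjectIndex signatures.

```java
import java.util.HashMap;
import java.util.Map;

public class SimpleIndex {
    private final Map<String, Long> entries = new HashMap<>();

    // Common case: store the entry, the caller does not need the old value.
    public void put(String key, long value) {
        entries.put(key, value);
    }

    // Rare case: store the entry and hand back the previous value,
    // or null if the key was not present before.
    public Long replace(String key, long value) {
        return entries.put(key, value);
    }

    public Long get(String key) {
        return entries.get(key);
    }
}
```

Callers that ignored the return value keep using `put`, and only the few that need the old entry switch to `replace`.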
orbiter
1f1be1518c
added stub for another performance hack: concurrent indexes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5699 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
3e4c28e188
enhanced count feature for kelondroRowSet. This is about twice as fast as before. Should speed up the collection analysis (half time!)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5698 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
84e37387a2
fix for last commit and more testing stub
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5697 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
ca006c506d
stub for performance enhancements for RowSet (no functional change yet)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5696 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d988204875
better shutdown of tools
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5695 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
100247bdda
added also an export and delete-feature to the URLAnalysis. This completes the clean-up feature for URLs. To do a complete clean-up of the url database, start the following:
...
java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -incollection DATA/INDEX/freeworld/TEXT/RICOLLECTION used.dump
java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -diffurlcol DATA/INDEX/freeworld/TEXT used.dump diffurlcol.dump
java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -export DATA/INDEX/freeworld/TEXT xml urls.xml diffurlcol.dump
java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -delete DATA/INDEX/freeworld/TEXT diffurlcol.dump
The export-feature is optional, the purpose of that function is to provide a back-up function for URLs to be deleted. The export function can also be used to create html files with embedded links and simple text-files. Simply replace the 'xml' word with 'html' or 'text'. The last argument in the call, the diffurlcol.dump value, can also be omitted. In that case the complete URL database is exported. This is an alternative to the Web-Interface based export function.
The delete-feature is the only destructive method of the four presented here. Please use it with care. It is better to make a back-up of the url database files before starting the deletion.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5694 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
hermens
8c60d6d117
In DHT selection delete only those references that were actually selected
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5693 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
60078cf322
added the next tool for URL analysis: check for references that occur in the URL-DB but not in the RICOLLECTIONS
...
To use this, you must first run the -incollection command (see SVN 5687); you need a
used.dump file that has been produced by that process.
You can then use that file to compare its URL hashes with the URLs in the URL-DB. To do that, execute
java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -diffurlcol DATA/INDEX/freeworld/TEXT used.dump diffurlcol.dump
or use different names for the dump files or more memory.
As a result, you get the file diffurlcol.dump which contains all the url hashes that occur in the URL database, but not in the collections.
The file has the format
{hash-12}*
that is, 12-byte hashes are listed one after another without any separator.
The next step could be to process this file and delete all these URLs with the computed hashes, or to export them before deletion.
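As an illustration of the {hash-12}* format described above, here is a minimal Java sketch that splits the contents of such a dump into individual hashes. The class name HashDumpReader and its methods are hypothetical and not part of YaCy; the sketch also assumes that each hash consists of ASCII characters and that the file fits into memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of YaCy: splits a {hash-12}* dump into hashes.
final class HashDumpReader {

    // Length of one record, as stated in the commit message.
    static final int HASH_LENGTH = 12;

    // Cuts the raw dump into consecutive 12-byte records.
    // Assumption: the hash bytes are ASCII characters.
    static List<String> parse(byte[] dump) {
        if (dump.length % HASH_LENGTH != 0) {
            throw new IllegalArgumentException(
                "dump length " + dump.length + " is not a multiple of " + HASH_LENGTH);
        }
        List<String> hashes = new ArrayList<>(dump.length / HASH_LENGTH);
        for (int offset = 0; offset < dump.length; offset += HASH_LENGTH) {
            hashes.add(new String(dump, offset, HASH_LENGTH, StandardCharsets.US_ASCII));
        }
        return hashes;
    }

    // Usage: java HashDumpReader diffurlcol.dump
    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HashDumpReader <dump file>");
            return;
        }
        // Reads the whole file at once; a streaming reader would suit very large dumps better.
        List<String> hashes = parse(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(hashes.size() + " hashes read");
    }
}
```

Rejecting a dump whose length is not a multiple of 12 is a design choice of this sketch: since the format has no separators, a truncated file cannot be resynchronized.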
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5692 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b1ddc4a83f
do not merge collections if ram == false
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5691 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
dbdd10da84
better logging and startup behaviour for referenceHash computation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5690 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d64836c34f
added statistical analysis of URL reference
...
use that with the following command on a linux shell:
java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -incollection DATA/INDEX/freeworld/TEXT/RICOLLECTION used.dump
for freeworld indexes.
For more details please see discussion below:
http://forum.yacy-websuche.de/viewtopic.php?p=13204#p13204
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5687 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
3b28daab40
code-beautification (to be consistent with external documentation paper)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5686 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
485c9406e5
fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1915&hilit=&p=13249#p13249
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5684 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
858f800a07
more logging in httpd to detect shutdown cause. See also:
...
http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1914&hilit=&p=13246#p13246
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5683 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b80db04667
- refactoring of IntegerHandleIndex and LongHandleIndex (better method names)
...
- fix for problem in httpdFileHandler: missing close of open files if the template cache was disabled
- more memory for DHT selection required
- stub for URL reference hash statistics in index collections
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5682 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
8ee946bf1d
show upnp status
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5679 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
16f5c6a85e
fixed merge method initialization in ReferenceContainer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5676 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d7a493b4f5
added experimental timeline api
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5672 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
efcd95dc37
simplification of (internal) query process / refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5671 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
f1b712c29a
small corrections to image loading methods in result presentation
...
especially loading of favicons in search results. This is a fix that
affects only searches in intranet/repository configurations.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5670 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d4b56d5819
added more asserts to BLOBHeap.flushBuffer() to fix the problem described in
...
http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1679&hilit=&p=13109#p13109
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5666 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
c545fcb9fa
* add class to handle keys and signatures
...
* fix bug in serverCharBuffer
* add build-target to sign tar.gz (run ant dist sign)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5665 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
aa44d9bad9
more refactoring of kelondro.text / deleted de.anomic.index
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5664 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
6ffc6e3389
more refactoring of indexer and kelondro classes;
...
- integrating the indexer into kelondro as package 'text'
- renaming of classes in kelondro.index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5663 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
404bc21da9
simplification of (internal) query process / refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5662 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
76ef5f0f14
refactoring of index package: better names for the classes (to be continued)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5661 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
2df57b1fd1
refactoring of index collection class
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5660 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
39a177649b
* added upnp listener for devices that do not respond to discovery but advertise themselves
...
* moved package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5659 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d1d9fbae5c
enabling URLAnalysis to operate on multiple input files; just use a wildcard when calling the class from the command line
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5658 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c728879ab8
fixes to yacyURL - more exceptions in case that urls are strange
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5657 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
7542336ae5
performance enhancement to yacyURL: omit the second processing of resolveBackpath. This method is already applied during initialization of the object and was called a second time when the URL was exported.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5656 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
7ea53fe47b
added another url list transformation option:
...
- check the list and drop lines that do not contain valid URLs
- normalize the urls
- remove duplicates
- sort the list
- split the list into smaller chunks
This is all done in one process which can be called with a new -sort option
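The steps listed above can be sketched in Java roughly as follows. This is not the URLAnalysis code: the class UrlListCleaner and its methods are hypothetical, and java.net.URI stands in for yacyURL, so the validity check and the normalization rules here are assumptions that will differ from YaCy's.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of a validate / normalize / dedupe / sort / split pass.
final class UrlListCleaner {

    // Returns the cleaned URLs in sorted order, split into chunks of at most chunkSize.
    static List<List<String>> clean(List<String> lines, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        // A TreeSet removes duplicates and keeps the entries sorted in one step.
        TreeSet<String> unique = new TreeSet<>();
        for (String line : lines) {
            String normalized = normalize(line.trim());
            if (normalized != null) {
                unique.add(normalized);
            }
        }
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String url : unique) {
            current.add(url);
            if (current.size() == chunkSize) {
                chunks.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }

    // Returns the normalized URL, or null if the line is not an absolute URL with a host.
    // Assumption: URI.normalize() (removal of "." and ".." path segments) is enough here.
    static String normalize(String line) {
        try {
            URI uri = new URI(line).normalize();
            if (uri.getScheme() == null || uri.getHost() == null) {
                return null;
            }
            return uri.toString();
        } catch (URISyntaxException e) {
            return null;
        }
    }
}
```

Normalizing before inserting into the set matters: two spellings of the same URL only collapse into one entry once both have been brought into the same form.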
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5655 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
e521e81148
bugfix in yacyURL (for latest performance hack)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5654 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
54625360f7
performance update
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5653 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago