orbiter
89ec3acb3e
- full abstraction of the index content type: the kelondro full-text index may now also contain indexes of content other than text, e.g. navigation indexes or reverse link indexes.
- during index joins, all word positions are maintained: this allows better ranking by word distance, and exact phrase matching can be implemented soundly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5804 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
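Keeping word positions through an index join is what makes exact phrase matching and word-distance ranking possible. The Java sketch below is an illustration only, not YaCy's join code: it assumes each word's positions inside one document are available as a sorted `int[]`, and the class `PhraseMatch` and its methods are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper: phrase matching and word distance from per-document position lists. */
final class PhraseMatch {

    /**
     * True if the words occur as an exact phrase: some position p of word 0
     * is followed by word i at position p + i for every later word i.
     * Each inner array must be sorted ascending (required by binarySearch).
     */
    static boolean matches(int[][] positionsPerWord) {
        if (positionsPerWord.length == 0) return false;
        for (int start : positionsPerWord[0]) {
            boolean ok = true;
            for (int i = 1; i < positionsPerWord.length && ok; i++) {
                ok = Arrays.binarySearch(positionsPerWord[i], start + i) >= 0;
            }
            if (ok) return true;
        }
        return false;
    }

    /** Smallest gap between any occurrence of two words (two-pointer walk over sorted arrays). */
    static int minDistance(int[] a, int[] b) {
        int i = 0, j = 0, best = Integer.MAX_VALUE;
        while (i < a.length && j < b.length) {
            best = Math.min(best, Math.abs(a[i] - b[j]));
            if (a[i] < b[j]) i++; else j++;
        }
        return best; // Integer.MAX_VALUE if either word is absent
    }
}
```

Without positions, a join can only report that both words occur in a document; with them, `minDistance` can feed a ranking score and `matches` can decide a quoted query exactly.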
borg-0300
7a48090fcf
- fix for "uk" language
- svn attributes added
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5803 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
dc2af61bc9
allow up to 50 results from remote peers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5802 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c0e8ed5461
fixed problem with not http client
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5801 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
6504b21cea
*) fix for http://forum.yacy-websuche.de/viewtopic.php?t=1976
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5800 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
8862a2fed0
oops
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5799 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
de68948bc5
better handling of free memory computation and emergency cache flush for index cell
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5798 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
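A free-memory check of the kind that can trigger an emergency cache flush can be sketched with the standard `Runtime` methods. This is a minimal, hypothetical example: the class `MemoryGuard` and the threshold parameter are assumptions, and the log does not say which limit YaCy actually uses.

```java
/** Hypothetical guard that decides when a cache should be flushed to disk. */
final class MemoryGuard {

    /**
     * Bytes the JVM can still allocate: the heap limit minus the heap currently in use.
     * totalMemory() - freeMemory() is the used part of the heap reserved so far.
     */
    static long available() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    /** True if fewer than minFreeBytes remain; minFreeBytes is an assumed tuning parameter. */
    static boolean needsEmergencyFlush(long minFreeBytes) {
        return available() < minFreeBytes;
    }
}
```

Using `freeMemory()` alone would understate the headroom, because the heap can still grow up to `maxMemory()`.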
orbiter
601d63ef48
removed comment tag (no use at this point)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5797 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
c2d85b039e
*) added language statistics files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5796 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
0c8fd811dc
*) first and very limited version of XML import; it does not yet use the benefits provided by XML
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5795 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
fcb77c3140
* added .im (Isle of Man) to TLD-list
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5794 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
8ce5bb4f31
added shell scripts that list host addresses
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5793 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
51ea865569
small fix for localsearch shell script
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5792 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b81c7467d8
protection against too many files in RICELL in case of massive emergency dumps caused by low memory
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5791 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d4d87d90c4
- extended experimental wikipedia dump parser
- removed historic, possibly unused code from the wiki parser that conflicted with the current Wikipedia wiki code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5790 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c3aff2521e
fix for NPE
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5789 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
57c00dd8c9
fix for bad filtering of a common HTTP error
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5788 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
14361f1ca4
added log message for index generation in HeapReader
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5787 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
43bcd192cd
oops
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5786 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c08f9b36a4
refactoring of wiki parser.
This was done to prepare the wiki parser for use as a parser for Wikipedia dumps, which will be used for performance tests (so that no crawling is needed).
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5785 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
apfelmaennchen
faeff21012
- fix for display of automatic ReCrawls in surftips
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5784 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
44e01afa5b
- refactoring
- a little bit more abstraction
- new interfaces for index abstraction
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5783 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
82fb60a720
increased memory limit for emergency cache flush
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5782 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
4905a17f6a
moved xerces.jar from libx to lib
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5781 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
9180617dd9
*) Classes to handle import of lists (especially blacklists) from XML files, not used yet, but will be used soon.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5780 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
596e6215dc
fix in case of white space in path name
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5779 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b887f4a116
keep more free mem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5778 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c2359f20dd
refactoring: better abstraction of reference and metadata prototypes.
This prepares the introduction of index tables other than the reverse text indexes, which are the only ones used so far. The next application of the reverse index is a citation index.
Moved to version 0.74
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5777 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
ab656687d7
stricter BLOB initialization; this may also help to save some RAM
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5776 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
5b138ada16
fixes to web structure reference collection and url construction
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5775 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
a29a11e526
added evaluation of incoming links in webstructure api
the API has changed; new XML schema.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5774 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
f6691411b5
- migration of files from SplitTable (which are used for the URL-DB) to a different file name format.
- the file generation logic is slightly different: files now have a maximum size of one gigabyte and a maximum age of one month.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5773 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
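The new rollover rule (at most one gigabyte per file, at most one month old) fits in a single predicate. This is a sketch under stated assumptions: "one gigabyte" is taken as 2^30 bytes, "one month" as 30 days, and `TableRollover` is a hypothetical name, not a SplitTable method.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical rollover check: start a new table file when the current one is too large or too old. */
final class TableRollover {
    static final long MAX_BYTES = 1L << 30;              // one gigabyte, assumed to mean 2^30 bytes
    static final Duration MAX_AGE = Duration.ofDays(30); // one month, assumed to mean 30 days

    static boolean needsNewFile(long currentSizeBytes, Instant createdAt, Instant now) {
        boolean tooLarge = currentSizeBytes >= MAX_BYTES;
        boolean tooOld = Duration.between(createdAt, now).compareTo(MAX_AGE) > 0;
        return tooLarge || tooOld;
    }
}
```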
shostakovich
1f37cc6107
Robots.txt is now reused after one day. See forum-topic:
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1669&p=13565#p13565
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5772 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
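Read as "a fetched robots.txt is cached and fetched again once it is older than one day", the rule can be sketched as a small time-stamped cache. That interpretation, the class `RobotsCache` and its methods are assumptions for illustration, not YaCy's robots.txt handling.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-host robots.txt cache with a one-day freshness limit. */
final class RobotsCache {
    static final Duration MAX_AGE = Duration.ofDays(1);

    private static final class Entry {
        final String content;
        final Instant fetchedAt;

        Entry(String content, Instant fetchedAt) {
            this.content = content;
            this.fetchedAt = fetchedAt;
        }
    }

    private final Map<String, Entry> byHost = new ConcurrentHashMap<>();

    void put(String host, String robotsTxt, Instant fetchedAt) {
        byHost.put(host, new Entry(robotsTxt, fetchedAt));
    }

    /** Returns the cached robots.txt if it is younger than one day, otherwise null (fetch again). */
    String getIfFresh(String host, Instant now) {
        Entry e = byHost.get(host);
        if (e == null) return null;
        return Duration.between(e.fetchedAt, now).compareTo(MAX_AGE) < 0 ? e.content : null;
    }
}
```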
orbiter
f21a8c9e9c
a different naming scheme for BLOBArray files. This may be necessary if blobs are written more than once per second.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5771 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
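One way to keep file names unique when several blobs are written within the same second is a millisecond timestamp plus a counter. The pattern, the `.blob` suffix and the class `BlobNamer` are hypothetical; the entry does not describe the actual scheme.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical generator for unique, lexicographically sortable blob file names. */
final class BlobNamer {
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS").withZone(ZoneOffset.UTC);

    private final String prefix;
    private long lastMillis = -1;
    private int counter = 0;

    BlobNamer(String prefix) {
        this.prefix = prefix;
    }

    /** Millisecond timestamp plus a counter that disambiguates names within the same millisecond. */
    synchronized String next(long nowMillis) {
        if (nowMillis == lastMillis) {
            counter++;
        } else {
            lastMillis = nowMillis;
            counter = 0;
        }
        return String.format("%s.%s.%03d.blob", prefix, STAMP.format(Instant.ofEpochMilli(nowMillis)), counter);
    }
}
```

Because the timestamp is zero-padded and written most significant field first, sorting the names as strings also sorts them by creation time (as long as the counter stays below 1000 per millisecond).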
orbiter
7ba078daa1
- added fast site-operator
- refactoring merge into BLOBArray
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5770 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b4126432bc
hardening of index dump write process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5769 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9bfb2641db
- removed deprecated threads
- added automatic HTTP client reset. This was necessary because excessive intranet crawling caused deadlocks; this hack solved the problem.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5768 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
293290c317
fix for bad assert in last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5767 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
bd409fb7ba
added web structure analysis for a specific domain that can be requested from the API.
Example:
http://localhost:8080/api/webstructure.xml?about=www.yacy.net
returns an XML document with the following content:
<?xml version="1.0"?>
<webstructure>
  <domains reference="reverse" count="1" maxref="300">
    <domain host="www.yacy.net" id="FXg39Q" date="20090401">
      <citation host="java.sun.com" id="o-R3yY" count="1" />
      <citation host="yacy-suche.de" id="-KCLaB" count="1" />
      <citation host="suma-ev.de" id="VRAHIA" count="1" />
      <citation host="www.kit.edu" id="EMaLDQ" count="1" />
      <citation host="yacy.net" id="Fh1hyQ" count="1" />
      <citation host="www.fzk.de" id="V2Kl-A" count="1" />
      <citation host="en.wikipedia.org" id="rwtdfR" count="3" />
      <citation host="vimeo.com" id="MmdQDY" count="3" />
      <citation host="liebel.fzk.de" id="sX4ozA" count="6" />
    </domain>
  </domains>
</webstructure>
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5766 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
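A client for the `webstructure.xml` example above can read the citation counts with the JDK's DOM parser. This sketch assumes the response shape shown in that example (a later entry in this log notes a new XML schema, so element and attribute names may differ); `WebStructureReader` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical client-side reader for a webstructure.xml response shaped like the example. */
final class WebStructureReader {

    /** Maps the host of each citation element to its count, summing repeated hosts. */
    static Map<String, Integer> citations(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("citation");
            Map<String, Integer> result = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element c = (Element) nodes.item(i);
                result.merge(c.getAttribute("host"), Integer.parseInt(c.getAttribute("count")), Integer::sum);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse webstructure XML", e);
        }
    }
}
```

For the example response, the map would contain nine hosts in document order, with `liebel.fzk.de` mapped to 6.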
orbiter
b6c2167143
- patch for bad web structure dumps
- added automatic slow-down of access to specific domains when access to a web page fails
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5765 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
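The automatic slow-down after failed page accesses could take the form of a per-host exponential back-off. The base delay, the cap and the doubling policy are assumptions, and `DomainThrottle` is a hypothetical class, not the crawler's actual implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-domain back-off: each failure doubles the delay (capped), a success resets it. */
final class DomainThrottle {
    static final long BASE_DELAY_MS = 500;    // assumed default delay between accesses
    static final long MAX_DELAY_MS = 60_000;  // assumed upper bound

    private final Map<String, Long> delayByHost = new ConcurrentHashMap<>();

    /** Minimum delay before the next access to this host. */
    long delayFor(String host) {
        return delayByHost.getOrDefault(host, BASE_DELAY_MS);
    }

    void onFailure(String host) {
        // first failure: 2 * base; later failures: double the stored delay, never above the cap
        delayByHost.merge(host, Math.min(BASE_DELAY_MS * 2, MAX_DELAY_MS),
                (old, ignored) -> Math.min(old * 2, MAX_DELAY_MS));
    }

    void onSuccess(String host) {
        delayByHost.remove(host);
    }
}
```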
orbiter
0139988c04
- added writing to temporary file names and renaming to the final file name when an index dump/merge is done. Interrupted merges can be cleaned up.
- added clean-up of unfinished merges and unused idx/gap files
- enhanced merge file selection method
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5764 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
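The write-to-a-temporary-name-then-rename pattern, plus clean-up of leftovers from interrupted runs, can be sketched with `java.nio.file`. The `.prt` suffix and the class `SafeDump` are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: write under a temporary name, rename when complete. */
final class SafeDump {
    static final String TMP_SUFFIX = ".prt"; // assumed marker for unfinished files

    /** The target name only appears once the data is completely written. */
    static void writeAtomically(Path target, byte[] data) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + TMP_SUFFIX);
        Files.write(tmp, data);
        // ATOMIC_MOVE throws AtomicMoveNotSupportedException where no atomic rename exists;
        // whether an existing target is replaced is implementation specific
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Deletes temporary files left behind by an interrupted dump or merge; returns how many. */
    static int cleanUp(Path dir) throws IOException {
        List<Path> unfinished = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, "*" + TMP_SUFFIX)) {
            for (Path p : ds) unfinished.add(p);
        }
        for (Path p : unfinished) Files.delete(p);
        return unfinished.size();
    }
}
```

Any file that still carries the temporary suffix at start-up must come from an interrupted run, so it can be deleted without inspecting its contents.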
orbiter
3621aa96ab
- added a memory protection for the IndexCell migration
- fix for bad cell file selection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5763 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
568e8f1741
fix in unmountBLOB
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5762 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9da69d6b68
- better selection of files to be merged
- fix for getChannel().close(), which works on Windows but not on Mac OS and Linux
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5761 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d39a5b42ca
more care about open file handles: files are now also closed on Windows and can be deleted afterwards.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5760 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
029495e64d
fixed bug introduced in SVN 5756 in EcoTable.put()
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5759 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
587838bd09
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5758 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d2e2420a68
- added another file selection method for index cell merge
- more hacks to check that files are closed properly and that no file handles remain after files are closed.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5757 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
96eaecda3e
- added migration class to go from index collections to the index cell data structure.
- added better control over file deletion, because it sometimes fails, especially on Windows
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5756 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
apfelmaennchen
9ab009b16b
fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1890#p13476
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5755 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago