theli
f37e2041e8
*) adding soap function to import yacy bookmarks from xml or html (transferred via soap attachments)
...
*) soapHandler: code cleanup for service deployment
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2915 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
f1ed55a5fc
bugfix for last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2913 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
8fdefd5c68
generalization of payload definition of index storage
...
this is one step forward to the migration to a new collection data format
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2912 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
29a1f132ec
*) some strings replaced by constants
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2910 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
4a3ec63e34
*) new soap service to manage yacy bookmarks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2906 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
(no author)
9b3fd2b9e5
*) removing doctype definition to avoid problems with xml parser
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2905 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
(no author)
c64d5018b4
*) Bugfix. Problem in XML Parser
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2903 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
5e57e0814d
*) new soap function to display log
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2902 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
ad248d61ca
*) more verbose exception
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2901 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hydrox
7e8669b15c
*) added possibility to "recycle" a DHTChunk that failed to transfer.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2898 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
low012
4feaa91890
*) Added additional MIME-Type.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2895 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
low012
89af433879
*) Deleted parts of WebCat that were not needed for parsing SWFs.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2893 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
46a712e195
- more asserts
...
- simplified indexURLEntry
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2891 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
low012
8c9bc7e341
*) extracting urls works now
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2890 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
fc2936d500
bugfix for internal index entry generation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2889 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
low012
493391e42d
*) new flash parser, still experimental
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2888 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
215c4e65f1
code cleanup
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2887 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
bd4f43cd66
- fixed a null pointer exception bug
...
- switched off more write caches
- re-enabled index-abstracts search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2885 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
auron_x
194d42b6a7
*) changed PPM-calculation to be more accurate
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2884 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
fe8afaf426
switched off usage of write cache for important databases
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2883 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
985fd807cc
bugfixing in collection methods
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2882 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
c7bea4addb
*) soap api
...
- adding function to get and set message forwarding
- adding new testclass
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2878 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
ee4d4e8567
*) Soap-handler: bugfix. wrong content-length was sent when using content-encoding
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2877 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
d3431433b0
more anonymization in logging
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2876 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
e6044e5198
bugfix for
...
http://www.yacy-forum.de/viewtopic.php?p=27207#27207
and
http://www.yacy-forum.de/viewtopic.php?p=27219#27219
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2875 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
4d19d94348
*) bugfix for nullpointerexception
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2874 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
532c23b5c7
*) soap handler
...
- better errorhandling
- adding support for outgoing transfer- and content-encoding
- avoid holding outgoing messages in memory before sending them
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2872 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
78b7f6f7fd
bugfix for index remove bug,
...
appeared after search where snippet-loading triggered word removal
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2869 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
(no author)
0e79f2fd7e
name of the file to translate appears ahead of its translation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2868 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
ebd2d629d8
added missing file for last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2866 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
147d88cf23
re-design of database caching
...
this should reduce IO a lot, because write caches are now activated for all databases
- added new caching class that combines a read- and write-cache.
- removed old read and write cache classes
- removed superfluous RAM index (can be replaced by kelonodroRowSet)
- adapted all current classes that used the old caching methods
- more asserts, more bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2865 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
4e363108e1
- removed bad debug code that caused a large and unnecessary delay during global search
...
- fixed problem that global search results disappear after a search
- removed some stopwords
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2861 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
f21ede312e
bugfixes for internals of database organization
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2860 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
eb4bfb0e9d
fixed problem with cache.profile()
...
see also: http://www.yacy-forum.de/viewtopic.php?p=27109#27109
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2859 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
2a9d868f6d
- removed object cache from kelondroTree
...
- generalized object caching and added new object caching class
- added object caching wherever kelondroTree was used
- added object caching also to usage of kelondroFlex
- added object buffering (a write cache) to NURLs
- added many assert statements; fixed bugs here and there
- added missing close methods to latest added classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2858 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
7299dc30e3
*) new soap service to manage the yacy file-share
...
- upload / download files (as soap attachment)
- create directory
- receive directory listing
- delete files / directories
- change file comment
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2857 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
777e39cea0
*) new template to display the dir-listing in xml format.
...
This can e.g. be done by using the url http://localhost:8080/share/?format=xml
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2856 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
9e8942a064
*) adding method to implement blacklist from file
...
- file transfer is done via soap attachments (see BlaclistSerivceTest for details)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2855 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
4d1f933ea1
*) avoid reading of content body into memory
...
*) Bugfix for soap attachment support
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2854 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
88cfdecd38
*) Bugfix: calling close must not close the wrapped input stream, otherwise
...
keep-alive connections would terminate
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2853 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
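The close() behaviour described in 88cfdecd38 can be illustrated with a small wrapper. This is only a sketch: the class name `NonClosingInputStream` is hypothetical and is not taken from the YaCy sources. It marks itself as closed but leaves the wrapped stream open, so a keep-alive connection can still be read for the next request.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper (not the YaCy class): close() only closes the wrapper,
// the underlying connection stream stays usable for keep-alive.
final class NonClosingInputStream extends FilterInputStream {
    private boolean closed = false;

    NonClosingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        // After close() the wrapper reports end-of-stream without touching the source.
        return closed ? -1 : super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return closed ? -1 : super.read(b, off, len);
    }

    @Override
    public void close() {
        // Deliberately do NOT call in.close() here.
        closed = true;
    }
}
```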
theli
d38ef0493d
*) be more tolerant against missing ports in url
...
"http://yacy.net:/ " is now interpreted as "http://yacy.net/ "
See: http://www.yacy-forum.de/viewtopic.php?p=27102
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2852 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
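The tolerance for an empty port described in d38ef0493d could look roughly like the following. This is a sketch under assumptions: the class `EmptyPortNormalizer` and its regular expression are illustrative and not the code from the commit, and URLs with user info or IPv6 literals are simply left unchanged.

```java
import java.util.regex.Pattern;

// Hypothetical helper: drops a ':' that follows the host but carries no port number.
final class EmptyPortNormalizer {
    // scheme://host followed by ':' and then '/', '?', '#' or end of string
    private static final Pattern EMPTY_PORT =
        Pattern.compile("^([A-Za-z][A-Za-z0-9+.-]*://[^/:?#]+):(?=[/?#]|$)");

    static String normalize(String url) {
        // Keep scheme and host (group 1), remove only the dangling colon.
        return EMPTY_PORT.matcher(url).replaceFirst("$1");
    }
}
```

With this sketch, `EmptyPortNormalizer.normalize("http://yacy.net:/")` yields `http://yacy.net/`, while a URL with a real port such as `http://yacy.net:8080/` is returned unchanged.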
theli
cfe54fedc7
*) Bugfix for resolveBackpath problem with trailing /..
...
*) Junit testclass for resolveBackpath testing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2850 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
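For readers unfamiliar with the problem in cfe54fedc7, the following sketch shows one way to resolve `..` segments so that a trailing `/..` is handled as well. It is not the YaCy resolveBackpath implementation; the class `BackpathSketch` is hypothetical, it assumes an absolute path, and it also collapses empty segments such as `//`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Hypothetical sketch of back-path resolution for absolute URL paths.
final class BackpathSketch {
    static String resolve(String path) {
        Deque<String> stack = new ArrayDeque<>();
        // A trailing "/", "/.." or "/." means the result names a directory.
        boolean dirResult = path.endsWith("/") || path.endsWith("/..") || path.endsWith("/.");
        for (String segment : path.split("/", -1)) {
            if (segment.isEmpty() || segment.equals(".")) {
                continue; // skip empty and current-directory segments
            }
            if (segment.equals("..")) {
                if (!stack.isEmpty()) {
                    stack.removeLast(); // step up, but never above the root
                }
            } else {
                stack.addLast(segment);
            }
        }
        StringBuilder result = new StringBuilder("/");
        Iterator<String> it = stack.iterator();
        while (it.hasNext()) {
            result.append(it.next());
            if (it.hasNext() || dirResult) {
                result.append('/');
            }
        }
        return result.toString();
    }
}
```

In this sketch `resolve("/a/b/..")` gives `/a/` and `resolve("/a/b/../c.html")` gives `/a/c.html`.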
orbiter
dc056fabf3
small bugfix
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2847 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
278d8c3c7e
- more asserts
...
- bugfix for reading of previously deleted nodes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2845 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
5a6488256d
catch the "username too short" exception
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2844 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
2d3f1a53fd
handling of Missing byte-order mark exception
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2842 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
ac13fa763a
*) bugfix for blacklist remove (blacklist was not informed about remove)
...
*) adding new soap service class for blacklist management
*) new junit class to test soap blacklist service
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2841 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
8a5c2d0a19
fix for supertemplates, too.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2839 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
c35793fb46
fix for last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2838 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
3e0516446b
*) new soap function to get the current queue status
...
*) new junit testclass to test soap statusService
*) refactoring of admin service (usage of constants instead of strings)
*) libraries upgraded to newer version + adding missing dependency
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2836 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
a831c83025
create servletProperties, with the servlet specific functions from serverObjects
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2835 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
83a0efc65a
better assert statements and fixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2833 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
karlchenofhell
d13b381f83
- added mint-green skin
...
- removed test-urls because of problems with text-encoding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2832 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
2025e885d6
a fix for problems with remove situations in kelondroFlexSplitTable
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2831 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
b12da510f3
*) adding optional libraries needed for soap attachments
...
(jikes won't compile without them)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2827 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
9eecc9a888
*) libs added to classpath
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2824 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
a1acc9c389
*) new function to configure distributed crawling
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2823 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
0996e550e7
*) deploy soap peer admin service
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2822 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
3ffc5b8793
fixed problem with serverCharBuffer.append(char)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2821 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
8b56887676
removed unused code
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2820 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
06854988da
- full integration of new LURL database in INDEX
...
- added migration method for urlHash.db into INDEX
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2819 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
(no author)
02c66c04f2
*) Missing file from last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2818 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
octoate
e4a3574b77
StringBuffer now resets every time the parser is called
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2817 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
ef912811f1
*) adding new soap service for peer administration
...
- configure dht transfer properties
- configure remote proxy
- configure peer name / peer port
- configure admin username + pwd
- get peer version information
- set/get peer configuration settings
- shutdown peer
*) new function to get the opensearch description via soap call
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2816 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
karlchenofhell
ce237aefad
- assortment-sizes table from PerformanceQueues_p.html is not shown if not used
...
- escape query- and fragment-part of an url as well
- new resolveBackpath for urls: http://www.yacy-forum.de/viewtopic.php?t=2679#24867
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2815 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
68204ff729
*) Suppressing for bad client requests.
...
See: http://www.yacy-forum.de/viewtopic.php?p=26918
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2814 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
c1dff41f99
*) adding possibility to deploy custom SOAP services
...
See: http://www.yacy-forum.de/viewtopic.php?p=26748#26748
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2813 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
df49724f28
*) better error handling for seed upload - test download - problems
...
See: http://www.yacy-forum.de/viewtopic.php?p=26814#26814
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2812 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
a5b9b514c1
*) retry crawling without content-encoding if the content-encoding header was not correct
...
See: http://www.yacy-forum.de/viewtopic.php?p=26917#26917
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2811 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
52466067d8
*) Bugfix for ArrayIndexOutOfBoundsExceptions which occur because SimpleDateFormat is not thread-safe
...
See: http://www.yacy-forum.de/viewtopic.php?t=2995
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2810 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
b357a13e9a
*) adding synchronization block because SimpleDateFormat is not thread-safe
...
See: http://www.yacy-forum.de/viewtopic.php?p=26906#26906
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2809 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
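The two SimpleDateFormat commits above (52466067d8 and b357a13e9a) address a well-known pitfall: a shared `SimpleDateFormat` instance keeps mutable internal state and must not be used by several threads at once. A minimal sketch of the synchronization approach follows; the class name `SafeDateFormatter` and the pattern string are assumptions chosen for illustration, not the ones used in YaCy.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper: one shared formatter, guarded by a synchronized block.
final class SafeDateFormatter {
    private static final SimpleDateFormat FORMAT =
        new SimpleDateFormat("yyyyMMddHHmmss", Locale.US);

    static {
        FORMAT.setTimeZone(TimeZone.getTimeZone("GMT"));
    }

    static String format(Date date) {
        // Only one thread at a time may touch the shared formatter.
        synchronized (FORMAT) {
            return FORMAT.format(date);
        }
    }
}
```

An alternative design is one formatter per thread (for example via `ThreadLocal`), which avoids lock contention at the cost of more instances.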
theli
92f774edd1
*) Better charset encoding detection
...
*) New testclass for charset encoding detection tests
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2808 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
b79e06615d
- added new LURL.Entry class for next database migration
...
- refactoring of affected classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2802 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
octoate
cc24dde5e0
First version of a MS Excel parser based on Apache POI
...
(event based parsing)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2801 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
karlchenofhell
4c63129136
- stupid mistake...
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2798 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
karlchenofhell
b14a500b88
- removed debug output from PerformanceMemory_p
...
- added URL escaping (tested, nevertheless watch out for possibly broken URLs)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2797 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
karlchenofhell
ebf0da2a45
- now the fix http://www.yacy-forum.de/viewtopic.php?t=2974 works
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2796 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
09337c9751
*) Bugfix wrong chars in soap search result document
...
See: http://www.yacy-forum.de/viewtopic.php?t=2906
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2795 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
3d152bfe43
*) Logging message added
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2794 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
karlchenofhell
b5e40e2fa2
- fix for http://www.yacy-forum.de/viewtopic.php?t=2974 (no cache-sizes for new db)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2792 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
96f45e9b15
*) Bugfix wrong chars in soap search result document
...
See: http://www.yacy-forum.de/viewtopic.php?t=2906
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2791 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
da2ac6fa23
*) adding new ant target to allow generation of client stub classes for yacy soap api
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2789 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
a9cc6df21b
*) adding wsdl files to generate client stub classes with ant
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2788 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
77a59a115d
refactoring of indexing methods
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2787 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
14490f0a83
added missing flush statement
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2786 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
688cbfb776
- bugfixing for flextable bug
...
- bugfixing for collection index bug
- several other bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2785 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
a29b4d4fb5
extended Supertemplates for Headerincludes.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2780 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
a7e11ada50
*) suppressing stacktrace for "server has closed connection"
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2779 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
5b114249ce
*) Bugfix for ViewLog problem with multiline logging messages
...
See: http://www.yacy-forum.de/viewtopic.php?t=2972
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2774 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
de5e233766
*) Bugfix for GuiHandler sorting problem
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2773 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
fd94aa4bef
*) Bugfix for IndexOutOfBound in GuiHandler
...
*) Bugfix for reversed order displaying of messages
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2772 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
29a1318ef9
bugfixes for wrong database access that do not consider deleted entries
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2767 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
cbb1e710b9
*) removing old class
...
- was replaced by plasma/urlPattern/defaultURLPattern
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2765 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
c6d46f7ebd
null pointer bugfix
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2761 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
decb09df6d
*) Trying to be more tolerant against wrong charset names
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2760 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
e9afe39cbb
*) Trying to be more tolerant against wrong charset names
...
See: http://www.yacy-forum.de/viewtopic.php?p=26662
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2759 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
7526c831a8
*) Suppressing stacktrace
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2758 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
50f2578c55
- some bugfixing and code cleanup
...
- assortments can now be left out completely if they do not exist
before startup and collection index is selected.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2757 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
bdf4c7c51e
added missing files for last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2756 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
a5dd0d41af
- refactoring of plasmaCrawlLURL.Entry to prepare new Entry format
...
- added test migration method to migrate the old LURL to a new LURL
the new LURL will be split into different tables for each month
this solves several problems:
- the biggest table in YaCy is split into different parts and can
also be managed in filesystems that are limited to 2GB
- the oldest entries can easily be identified, used for re-crawl and
deleted
- The complete database can be limited to a specific size (as wanted many times)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2755 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
130cc76927
loop detection and termination in deletedHandles method
...
see also: http://www.yacy-forum.de/viewtopic.php?p=26655#26655
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2754 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
octoate
1c4076da8a
First version of the MS Powerpoint parser based on Apache POI
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2753 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
5b75d64d7d
*) bugfix for last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2750 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
71ed104bc7
*) adding additional rpm mimetype (used by packman)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2749 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
borg-0300
76d959122b
new constants, finals, Stringbuffer, cleanup
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2748 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
6396f5971e
bugfixes and migration attempt toward new kelondroFlex db
...
- more synchronization
- bugfix for remove in collections
- bugfix in kelondroFlex (wrong exception condition!)
- options to use RAM, FLEX and TREE tables for Crawl URL stacker
- default for Crawl URL stacker is now FLEX (!)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2746 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hermens
48f81acc0e
reverse SVN 2744, it is not needed
...
(this resulted from a small misunderstanding of the newest cache layout)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2745 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hermens
1da9aece12
Repair DNS prefetch during cacheScan
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2744 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
918b59dc5e
- bugfix for snippet profile (no delete button)
...
- bugfix for search process (avoided null pointer exception in case other peer does not respond)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2742 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
2bb529cedb
added peer tags for peers in robinson mode
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2741 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
afbb547f3d
extended options for abstracts generation in remote search interface
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2739 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
22649408ad
*) Better errorhandling for charset encoding problem during content parsing
...
See: http://www.yacy-forum.de/viewtopic.php?t=2952
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2737 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
a9c7e3f061
*) Bugfix for NoSuchElementException
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2735 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
f25f61d9d3
documentation of compile problem. See
...
http://www.yacy-forum.de/viewtopic.php?p=26407#26407
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2734 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
c8f3a7d363
added snippet-url re-indexing
...
- snippets will generate an entry in responseHeader.db
- there is now another default profile for snippet loading
- pages from snippet-loading will be indexed, indexing depth = 0
- better organization of default profiles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2733 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
low012
2cfd4633ac
*) even better handling of searchwords in snippets, words can consist of letters and numbers now
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2732 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
b062847797
fix for
...
http://www.yacy-forum.de/viewtopic.php?p=26439#26439
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2731 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
e17fea7015
files in htcache are now stored in different hash/tree subdirectories
...
according to storage method
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2730 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
661f005214
fix for seed upload build script
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2729 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
low012
2d3b7251a4
*) better handling of searchwords in snippets (see http://www.yacy-forum.de/viewtopic.php?t=2891 for details)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2728 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
ddf8f220f6
fix for build fail
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2727 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
25ae3d3161
generalized definition of hexhash
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2725 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
86047f439d
removed very bad bug that prevented production of any remote search result
...
:-(((
Please update!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2724 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
f0d747c723
removed deprecated method
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2723 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
5ff77612ac
bugfix for old WORDS storage method
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2722 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
0f10bdde22
more generic cache methods
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2721 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
72482b1426
fixed scraper
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2720 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hermens
6557112d8f
small fix for plasmaURLPool.getURL() needed for new alternative htcache layout
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2719 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hermens
440c6ee657
Implement alternative htcache layout
...
mostly according to: http://www.yacy-forum.de/viewtopic.php?p=26205#26205
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2718 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
226f2c5b2c
first version of the Servlet Debugger
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2717 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
adf1f74ab2
bugfix for java 1.5 compile problem with serverCharBuffer.append(char)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2716 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
fd61209797
lines inside tags without punctuation are extended by a single dot.
...
This enables the condenser to distinguish the lines in a better way.
The result is a better preparation of snippets.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2715 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
1d0c0edda3
first version of posts/get from the del.icio.us api
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2713 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
1969522dc1
removed lowercase of snippets (and other things):
...
- added new sentence parser to condenser
- sentence parsing can now handle charsets
to do: charsets must be handed over to new sentence parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2712 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
43614f1b36
bugfix in collection index. the index for collections was not created correctly
...
The bugfix includes a migration function which starts automatically
after startup of yacy.
This applies only to you, if you are using the new collection index.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2711 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
1dfab1abe3
more control for seed receive
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2709 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
1c0e65f55f
*) Bugfix for problems with charset detection
...
See: http://www.yacy-forum.de/viewtopic.php?p=26196
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2708 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
db294687ea
enhanced logging
...
- more logging output
- fix in log line preparation
- added filter to log page
- some small bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2707 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
a9a0f51303
*) suppressing InterruptedException errormessage
...
See: http://www.yacy-forum.de/viewtopic.php?t=2915
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2705 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
ce7ee74316
*) better errorhandling in filehandler (try catch block now starts before argument parsing)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2704 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
1d4fb680ce
*) CrawlWorker.java: only keep content in memory if size is less than or equal to 5MB
...
TODO: make this limit configurable
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2703 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
1586d57187
*) odtParser: better handling of large files
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2702 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
f17ce28b6d
*) plasmaHTCache:
...
- method loadResourceContent defined as deprecated.
Please do not use this function to avoid OutOfMemory Exceptions
when loading large files
- new function getResourceContentStream to get an inputstream of a cache file
- new function getResourceContentLength to get the size of a cached file
*) httpc.java:
- Bugfix: resource content was loaded into memory even if this was not requested
*) Crawler:
- new option to hold loaded resource content in memory
- adding option to use the worker class without the worker pool
(needed by the snippet fetcher)
*) plasmaSnippetCache
- snippet loader does not use a crawl-worker from pool but uses
a newly created instance to avoid blocking by normal crawling
activity.
- now operates on streams instead of byte arrays to avoid OutOfMemory
Exceptions when operating on large files
- snippet loader now forces the crawl-worker to keep the loaded
resource in memory to avoid IO
*) plasmaCondenser: adding new function getWords that can directly operate on input streams
*) Parsers
- keep resource in memory whenever possible (to avoid IO)
- when parsing from stream the content length must be passed to the parser function now.
this length value is needed by the parsers to decide if the parsed resource content is too large
to hold it in memory and must be stored to file
- AbstractParser.java: new function to pass the contentLength of a resource to the parsers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2701 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
630a955674
read snippets from cache in case they are not provided in RAM
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2700 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
bcf2b800b4
applied UTF-8 encoding parameter to yacy-internal protocol communication
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2694 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
c40fca08a2
fixed bad handling of string separation
...
you can now use a new encoding attribute to create strings from byte arrays
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2693 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
5a40ea7866
refactoring of wget string list generation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2692 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
dbc2e039bb
added time-out option parameter to call hierarchy
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2691 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
d4c239e4be
- fixed problem in collection index with deletion of single url references
...
- added automatic deletion of not-found snippets after search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2689 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
00746ca232
identified and fixed search performance problem caused by
...
snippet loading. Some accesses to header-db were performed twice or even
more often in some cases. Snippet resource loading fixed.
Furthermore the snippet loading during remote search within the
remote peer has been disabled, but can be switched on remotely by
new flag 'includesnippet=true'
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2688 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
b033a80750
better control of failure in node seek of kelondroTree
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2686 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
310f1c41cd
added option to see ranking scores in surftipps
...
and some cleanups
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2684 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
a2e3095044
*) Bugfix. Add missing plasmaParserDocument.close() calls
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2680 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
cd5f349666
*) Better handling of large files during parsing
...
Extracted text of files that are larger than 5MB is stored in a temp file instead of keeping it in memory
*) plasmaParserDocument.java: getText now returns an InputStream instead of a byte array
*) plasmaParserDocument.java: new function getTextBytes returns the parsed content as byte array
Attention: the caller of this function has to ensure that enough memory is available to do this
to avoid OutOfMemory Exceptions
*) httpd.java: better error handling if the soaphandler is not installed
*) pdfParser.java:
- better handling of documents with exotic charsets
- better handling of large documents
- better error logging of encrypted documents
*) rtfParser.java: Bugfix for UTF-8 support
*) tarParser.java: better handling of large documents
*) zipParser.java: better handling of large documents
*) plasmaCrawlEURL.java: new errorcode for encrypted documents
*) plasmaParserDocument.java: the extracted text can now be passed
to this object as byte array or temp file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2679 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
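The large-file handling described in the commit above can be illustrated with a minimal sketch. This is not the YaCy code: the class name `ExtractedTextSketch`, the method `store`, and the idea of passing the limit as a parameter are assumptions made for illustration. Only the 5MB threshold and the "byte array or temp file, read back as a stream" behavior come from the commit message.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical holder for extracted text: in memory when small, in a temp file when large. */
public class ExtractedTextSketch {
    /** 5MB, the limit named in the commit message. */
    public static final long MEMORY_LIMIT = 5L * 1024 * 1024;

    private final byte[] bytes; // non-null when the text is held in memory
    private final File file;    // non-null when the text was written to a temp file

    private ExtractedTextSketch(byte[] bytes, File file) {
        this.bytes = bytes;
        this.file = file;
    }

    /**
     * Copies the stream either into memory or into a temp file.
     * The decision relies on the declared contentLength (assumption: a
     * negative value means "unknown" and is treated as large).
     */
    public static ExtractedTextSketch store(InputStream in, long contentLength, long limit)
            throws IOException {
        boolean inMemory = contentLength >= 0 && contentLength <= limit;
        File tmp = inMemory ? null : File.createTempFile("extracted", ".txt");
        ByteArrayOutputStream mem = inMemory ? new ByteArrayOutputStream() : null;
        OutputStream out = inMemory ? mem : new FileOutputStream(tmp);
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } finally {
            out.close();
        }
        return inMemory ? new ExtractedTextSketch(mem.toByteArray(), null)
                        : new ExtractedTextSketch(null, tmp);
    }

    public boolean isInMemory() {
        return bytes != null;
    }

    /** Returns the text as a stream, regardless of where it is stored. */
    public InputStream getText() throws IOException {
        return bytes != null ? new ByteArrayInputStream(bytes) : new FileInputStream(file);
    }

    /** Removes the temp file, if one was created. */
    public void close() {
        if (file != null) {
            file.delete();
        }
    }
}
```

Callers that read through `getText()` never need to know which storage was chosen, which is presumably why the commit changed `getText` to return a stream.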
theli
8b2ceddb91
*) Displaying severe and warning logging messages in different colors on ViewLog_p.html
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2678 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
low012
f8ac694e51
*) fixed a bug where search words in snippets were not displayed in bold in front of a punctuation mark (see http://www.yacy-forum.de/viewtopic.php?p=25998 )
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2677 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
df1629b05a
- code cleanup
...
- version 0.471
- moved surftipps to own web page
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2676 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
c665f6cddb
*) handling of quotes in charset string
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2674 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
b73efd5565
*) missing changes needed because of last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2673 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
140ddba93f
*) adding soap functions to pause and resume the crawler
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2668 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
2463e5624a
'quick' release 0.47
...
- documentation update
- necessary bugfixes (missing css for new peers)
- reduced effect of search result redundancy filter
- removed some debug output, but not all
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2665 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
49fbb688df
*) SOAP: old urlInfo renamed to urlInfoByHash, new urlInfo Function added.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2662 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
8f143d516b
*) make snippet fetcher accessible via soap api
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2661 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
97615af406
*) Restructuring of YaCy SOAP services
...
- general functions moved to abstract service class
- service class split into SearchService, CrawlService, StatusService
*) Bugfix for SOAP search services
- Attention: some xml tags were renamed
See: http://www.yacy-forum.de/viewtopic.php?p=25877
*) New SOAP service function urlInfo to view the parsed content of a URL
See: http://www.yacy-forum.de/viewtopic.php?p=25869
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2660 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
241b881560
*) Redesign of YaCy SOAP handler
...
- should be more fail-safe now
- better handling of compressed request bodies
- better handling of persistent connections
- better handling of AxisFaults
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2659 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
009a33170b
*) Content-Location header added
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2658 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
1aa07a52cd
*) Bugfix for UnsupportedEncodingException if the media type contains multiple parameters
...
See: http://www.yacy-forum.de/viewtopic.php?p=25832#25826
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2654 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
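The kind of failure fixed in the commit above (a media type carrying several parameters, or a quoted charset value as in the earlier "handling of quotes in charset string" commit) can be illustrated with a small sketch. It is not the YaCy implementation: the class `CharsetParamSketch` and the method `charsetOf` are hypothetical names, and the fallback is passed in by the caller rather than hard-coded.

```java
import java.nio.charset.Charset;
import java.util.Locale;

/** Hypothetical helper that extracts the charset parameter from a Content-Type value. */
public class CharsetParamSketch {

    /** Returns the charset named in the header value, or the fallback if none is usable. */
    public static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // strip one pair of surrounding quotes, if present
            boolean doubleQuoted = value.startsWith("\"") && value.endsWith("\"");
            boolean singleQuoted = value.startsWith("'") && value.endsWith("'");
            if (value.length() >= 2 && (doubleQuoted || singleQuoted)) {
                value = value.substring(1, value.length() - 1).trim();
            }
            try {
                if (Charset.isSupported(value)) {
                    return value;
                }
            } catch (IllegalArgumentException e) {
                // illegal charset name: ignore it and fall through to the fallback
            }
        }
        return fallback;
    }
}
```

Checking the name with `Charset.isSupported` before use is one way to avoid an `UnsupportedEncodingException` later, when the name is handed to a reader or to `new String(bytes, charset)`.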
theli
625c2ce6b1
*) bugfix for snippet fetching problem if content but not http header is available in cache
...
See: http://www.yacy-forum.de/viewtopic.php?p=25748
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2651 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
813a8a8179
*) migration of mimeTypeParser to jmimemagic 0.1
...
- better mimetype detection for rss feeds
- better mimetype detection for odt documents (less memory consuming)
- two new detector classes implementing MagicDetector interface of jmimemagic
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2650 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hermens
3f5a4153a0
Make Peers more receptive to transferred indexes
...
- Set MaxWordCount for dhtInCache to indexDistribution.dhtReceiptLimit
so that the inCache gets flushed when the limit is passed
- Modify flushCacheSome to flush enough words to get below MaxWordCount immediately
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2649 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
57415b6889
*) Bugfix for surftipps UTF-8 problem
...
See: http://www.yacy-forum.de/viewtopic.php?t=2864
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2647 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
b0a4fcce8c
fix from theli
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2642 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
b6c7b91582
*) Parser now throws a ParserException instead of returning null on parsing errors (e.g. needed by snippet fetcher)
...
*) better logging of parser failures
*) simplified usage of plasmaparser through switchboard
*) restructuring of crawler
- crawler now returns an error message if it is used in sync mode (e.g. by snippet fetcher)
*) snippet-fetcher: more verbose error messages
*) serverByteBuffer.java: adding new function append(String,encoding)
*) serverFileUtils.java: adding functions to copy only a given number of bytes between streams
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2641 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
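Copying only a given number of bytes between streams, as mentioned for serverFileUtils.java in the commit above, is commonly done along the following lines. This is a generic sketch; the class name `BoundedCopySketch` and the method signature are assumptions and do not reflect the actual serverFileUtils API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: copy at most 'count' bytes from one stream to another. */
public class BoundedCopySketch {

    /** Returns the number of bytes actually copied (less than count if the source ends early). */
    public static long copy(InputStream in, OutputStream out, long count) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        while (total < count) {
            // never request more than what is still allowed
            int want = (int) Math.min(buf.length, count - total);
            int n = in.read(buf, 0, want);
            if (n == -1) {
                break; // source exhausted before the limit was reached
            }
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }
}
```

Because each read is capped at the remaining allowance, the source stream is left positioned exactly after the copied bytes, so whatever follows can still be read by the caller.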
orbiter
e03427871e
enhanced surftipps:
...
- added switch to show or hide surftipps
- more news contribute to surftipps
- added voting system for surftipps
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2638 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
1dc12d6659
*) Bugfix for shutdown problem caused by cacheScan thread
...
See: http://www.yacy-forum.de/viewtopic.php?p=25729
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2636 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
borg-0300
42173462f5
rename cutUrlText to shortenURLString;
...
other little things;
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2635 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
borg-0300
af1d89e381
check url == null added;
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2634 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
cc667b0aa5
*) htmlFilterContentScraper.java: adding support for link tag
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2633 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
26dfbb7499
*) Bugfix for UTF-8: url names are now stored properly in stackcrawl, crawler, indexing queue and should be displayed correctly on the gui
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2630 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
cf6acff2c2
*) Bugfix. htmlFilterInputStream document analysis did not work properly for documents smaller than the
...
default InputStream Buffer size.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2629 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
borg-0300
f18304ddd3
unused/not needed imports removed;
...
properties added;
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2628 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
ec031eb993
first version of surftipps
...
see http://localhost:8080/index.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2627 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
borg-0300
b174fbd0ca
"import ...*" removed;
...
properties added;
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2626 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
807756150e
patch for strange bug reported by email
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2625 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
5c6251bced
*) some improvements for extended html document charset support
...
- new class htmlFilterInputStream.java which allows to pre-analyze the html header to extract
the charset meta data. This is only enabled for the crawler at the moment. Integration into
proxy needs more testing.
- adding event listener interfaces to the htmlscraper to allow other classes to get informed
about detected tags (used by the htmlFilterInputStream.java)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2624 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
33f0f703c0
*) reinserting type cast again
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2623 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
8c11a543dc
fixed line ending coding
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2622 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
b690597275
*) adding casts to avoid compatibility problems between java 1.4 and java 1.5 writer class usage
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2621 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
5afb0cbce8
*) setting default charset (for unknown documents) to iso-8859-1
...
*)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2620 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
f453c14b5d
removed unreachable catch blocks and unused imports
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2619 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
ad7f600f25
*) Bugfix. re-enabling inheritance of serverCharBuffer from writer class
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2618 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
97d2a08ef1
*) restructuring needed to support parsing of documents using various charsets
...
- serverFileUtils.java:
-- adding methods to copy from stream to writer and readers to writers
-- moving httpc writeX methods into serverFileUtils class
- serverCharBuffer.java: removing inheritance from Writer class
- replacing htmlFilterOutputStream by htmlFilterWriter class which handles
content as char stream
- htmlFilterContentTransformer.java: deactivating getText mode
(still needs to be migrated to use char streams instead of byte streams)
- changes in several classes to use htmlFilterWriter instead of htmlFilterOutputStream
- changes in Scraper and Transformer classes to operate on chars instead of bytes
- httpdProxyHandler.java: bugfix. clientTimeout setting was missing in config file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2617 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
fc594e8eda
*) adding httpContentLengthInputStream.java class to allow reading of http response bodies
...
until EOF even if a persistent connection is used
*) httpdByteCountInputStream.java: adding skip method
*) httpHeader.java: adding getCharacterEncoding function
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2616 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
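The idea behind a content-length-limited input stream, as introduced in the commit above, is that a reader sees EOF once the declared body length has been consumed, even though the persistent connection stays open. The sketch below shows that idea only; `LengthLimitedInputStreamSketch` is a hypothetical name and nothing here is taken from httpContentLengthInputStream.java itself. Leaving the underlying stream open on `close()` is an assumption about what a persistent connection needs.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper that reports EOF after a fixed number of bytes. */
public class LengthLimitedInputStreamSketch extends FilterInputStream {
    private long remaining; // bytes of the body that have not been read yet

    public LengthLimitedInputStreamSketch(InputStream in, long contentLength) {
        super(in);
        this.remaining = contentLength;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1; // body fully consumed: simulate EOF
        }
        int b = super.read();
        if (b != -1) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (remaining <= 0) {
            return -1;
        }
        // never read past the end of the body into the next response
        int n = super.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(super.available(), remaining);
    }

    @Override
    public void close() {
        // intentionally does not close the underlying stream (assumption:
        // the connection is persistent and will be reused)
    }
}
```

With such a wrapper, code written as "read until EOF" works unchanged on a keep-alive connection, because the wrapper rather than the socket decides where the body ends.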
low012
cd636eb00e
*) Fix for the fix...
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2615 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
low012
f9a5b55a9e
*) Fixed bug described in http://www.yacy-forum.de/viewtopic.php?p=25448#25448
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2614 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
3aac5b26da
- added automatic tag generation when a web page from the search results is added
...
- added new image 'B' in front of search results for bookmark generation
- added news generation when a public bookmark is added
- the '+' in front of search results has new meaning: positive rating for that result
- added news generation when a '+' is hit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2613 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
low012
8a30c5343d
*) Fixed bug where exclamation marks could get lost between [=...=] and <pre>...</pre>
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2612 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
low012
d8f4b17e31
*) Hopefully fixed bug described in http://www.yacy-forum.de/viewtopic.php?t=2825 .
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2611 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
0e84a969d6
*) Bugfix for serverCharBuffer read from file operation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2607 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
90ef19d778
*) first version of a serverCharBuffer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2606 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
d374ef2bbe
bugfix for tryRemoveURLs
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2605 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
f644a1c3a7
better evaluation of index abstracts
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2604 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
1b48473bc5
bugfix to utf8 recognition
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2603 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
90f7241b59
serverByteBuffer.trim() can now recognize utf-8 characters
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2602 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
2fd610b556
http://www.yacy-forum.de/viewtopic.php?p=25611#25611
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2601 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
e34d9b3fec
*) charset aware headlines (after the serverByteBuffer.trim problem is solved)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2599 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
8115ac47b5
*) charset aware metadata parsing
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2598 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
3ac30bdf22
*) some todo markers added for additional charset support
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2597 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
06fa891152
*) htmlFilterContentScraper.java: using proper charset for document title
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2595 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
74c3e7cf29
*) storing document charset into plasmaParserDocument object (is needed later by the condenser)
...
*) htmlFilterContentScraper.java: using proper charset for document title
*) serverByteBuffer.java: adding new toString which allows to specify the charset for byte encoding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2593 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
c5d3020941
*) better errorhandling for last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2592 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
d0a5a53789
*) changes needed for multi-language support
...
- parsers may need to know the charset of the byte stream
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2591 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
d82875c72b
removed removal of 'funny symbols' that may have caused utf-8 problems
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2589 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
26ab1fa885
fixed null pointer exception
...
See http://www.yacy-forum.de/viewtopic.php?p=25598#25598
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2588 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
b0e8ff6eda
*) some TODO markers for UTF-8 problem
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2586 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
41e27b85b7
fix for crawler condition
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2583 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
0ee7e45413
bugfix for merge method (caused by bad refactoring)
...
see http://www.yacy-forum.de/viewtopic.php?p=25529#25529
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2581 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
5c2f30eaca
adjustments to dhtInCache write
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2579 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
9ecf7f0da2
*) some TODO markers for UTF-8 problem
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2578 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
e2f8339827
*) some bugfixes for UTF-8 related problems
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2577 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
c89d8142bb
replaced old 'kCache' by a full-controlled cache
...
there are now two full-controlled caches for incoming indexes:
- dhtIn
- dhtOut
during indexing, all indexes that shall not be transported to remote peers
because they belong to the own peer are stored to dhtIn. It is furthermore
ensured that received indexes are not again transmitted to other peers
directly. They may, however be transmitted later if the network grows.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2574 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
6e2907135a
bugfixes for remote search server part
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2573 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
cf9884e22b
first attempt to implement a secondary search
...
this is a set of search processes that shall enrich search results
with specialized requests to realize a combination of search results
from different peers.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2571 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
2a06ce5538
*) next bugfix for UTF-8
...
- Sending UTF-8 messages to other peers did not work
- httpd.java: minor corrections for UTF-8
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2570 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
bdc51591ae
*) UTF-8 Bug solved (hopefully)
...
See: http://www.yacy-forum.de/viewtopic.php?p=25522
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2569 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
ef751b9d33
*) removing all string operations from the template engine
...
- engine should fully operate on bytes now
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2567 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
7ef80c1026
more debugging
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2566 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
b251076e64
avoid ConcurrentModificationException
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2563 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
75b198bc02
- updated references to indexContainer
...
- more bugfixes and debugging for indexAbstract processing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2555 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
0bed3b9ac3
removed superfluous interface
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2554 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
b7e7808ea6
wordmigration now works also for new index database
...
if the new database is switched on, no 'too big' messages appear,
all the WORDS files can be completely migrated
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2553 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
a0ddf2ec11
*) AbstractCrawlWorker.java: delete already downloaded data on crawling error
...
*) plasmaSwitchboard.java: log unexpected errors while parsing/indexing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2552 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
4f9e42d5ed
more changes towards better join-search
...
- fixed problems with index-abstract generation
- added analysis output for index abstract receive
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2551 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
a7281a9b4d
fix for last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2545 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
82a6054275
- fixed bug with new indexAbstract generation
...
- added partial evaluation of indexAbstracts during remote searches
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2544 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
fded1f4a5d
*) better handling of maximum file size limit in crawler
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2543 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
416b4e5c6b
oops
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2542 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
309accb983
memory control for ymage generation:
...
the ymageMatrix initializer throws a RuntimeException if there is not
enough memory available to generate a new ymage of wanted size
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2541 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
74d1dea30b
changes towards better join-search
...
- added generation of a compressed index within remote peers during global search
- added selection of specific urls within remote peers during secondary global search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2539 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
ae4e8ce03e
- cut for 'probably last html-interface version': version number update
...
- small enhancement to ranking
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2536 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
64bed59ee8
enhancements to ranking
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2535 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
63893003be
*) Adding settings page for the crawler for specifying a file size limit and the timeout to use.
...
*) adding first version of maximum filesize check for the crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2534 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
auron_x
06b1365066
*) fixed existing protection against divbyzero and removed the new one
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2530 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
94d7ced900
fix for last ranking commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2529 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
cc97a3e9c6
fixed possible bug with indexOutOfBoundsException
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2528 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
03835c2ee8
enhanced search result computation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2527 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
809960ddc6
avoid division by zero
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2526 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
ac3419b65f
better debugging for indexOutOfBoundException bug
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2525 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
75b03a4580
fix for new ArrayIndexOutOfBoundException
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2524 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
a8bc768206
enhancements to ranking evaluation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2523 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
auron_x
a82e926c5d
*) fix for wrong totalPPM-calculation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2522 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
33898ae7e9
*) ResourceInfoFactory.java: Bugfix for classNotFoundException
...
See: http://www.yacy-forum.de/viewtopic.php?t=2797
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2521 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
406e170e25
*) more verbose error message
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2519 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
b298474e22
*) Bugfix needed because of changed plasmaCrawlLURL.load behavior
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2518 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
c2e6cc8c6b
small part of Bosts patch
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2517 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
96c6e4e322
- enhancements to detailed search page
...
- enhancements to search ranking computation process
- removed bugs in postranking
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2516 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
9340dbb501
fixed all possible problems with nullpointer exception for LURLs
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2513 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
a5ed86105b
*) bugfix for handling of ResourceInfo object in proxy
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2512 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hermens
ff4362b02d
some more fixes for new plasmaCrawlLURL.load behavior
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2511 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hermens
7aeadbe7cc
another NullPointerException in http.ResourceInfo
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2510 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
141f9e5bb4
fix for new plasmaCrawlLURL.load behavior
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2509 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
1e7fd48afd
added size method to ftpc
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2508 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hermens
087f7511f8
prevent NullPointerException in http.ResourceInfo
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2507 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
a2525072f2
bugfix for kelondroRow - property generation
...
this bug affected ranking parameters :-(
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2506 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hydrox
59a5511dbb
*) added missing static Strings as requested by theli
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2505 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
6578564c9a
*) Ignore more hop by hop http headers
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2504 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
b44514242a
*) crawler/ftp/CrawlWorker.java: better errorhandling
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2503 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
7d7f30139c
*) crawler/ftp/CrawlWorker.java: delete old cache file
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2502 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
4ae0f122f8
*) ResourceInfo.java: License header added
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2501 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
043edfa4d8
*) ftp/ResourceInfo.java ResourceInfo object for ftp resources added
...
*) ftp/CrawlWorker.java: better error handling for ftp crawler
*) plasmaCrawlEURL.java: some error codes added
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2499 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
4866868c0e
added write cache for LURLs
...
This was necessary to speed up the index receive process during global search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2498 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
8a0e35618b
enhancements to search result preparation
...
- added detailed count on remote search results
- enhanced search sequence during remote searches (doing local search in sequence)
- strict adherence to timeout limits
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2497 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
5c1bb53d2a
Missing description for last commit
...
*) next step of restructuring for new crawlers
> HTCaching should now work protocol independent
-- introduction of new ResourceInfo objects containing protocol-specific metadata
of a resource.
-- the ResourceInfo objects now implement old functions like shallIndexCacheForXXX,
shallStoreCacheForXXX in a protocol dependent manner
> Indexing should also work protocol independent now
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2496 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
dae763d8e3
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2495 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
4825bfaaf3
*) Bugfix for PrintWriter Problem
...
See: http://www.yacy-forum.de/viewtopic.php?t=2792
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2494 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
d4c5e2af01
html-dirlist can now also be generated from existing connections
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2493 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
7930839594
*) URL.java: userinfo was not taken over when generating a new url from a base url and a rel. path
...
*) CrawlWorker.java: using new dirhtml function of ftpc
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2492 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
17ba468165
added html dirlisting generation in ftpc.java:
...
ftpc.dirhtml() generates a StringBuffer with a complete web page
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2491 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
7a35b8e237
*) direct access to responseheaders of sbQueue.Entry removed to make it more http independent
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2487 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
ffbf416e76
*) direct access to requestheader of htCache.Entry removed to make it more http independent
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2486 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
3870d615e3
*) setting htCache.Entry fields to private
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2485 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
393a7d10be
*) setting htCache.Entry fields to private
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2484 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
ab5a9bee66
*) adding some copyright headers
...
*) next step of restructuring for new crawlers
- adding first testversion of ftp crawler class
-- does not create a htCache entry yet
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2483 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
5847492537
*) next step of restructuring for new crawlers
...
- IndexCreate_p.java: correcting problems with ftp urls
- URL.java does not cutout the userinfo anymore
(needed to transport authentication info in ftp urls, e.g. ftp://username:pwd@ftp.irgendwas.de)
- plasmaCrawlLoader.java:
-- hack to re enable https urls
-- adding function getSupportedProtocols
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2482 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
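Commit 5847492537 keeps the userinfo part of a URL so that ftp credentials can travel inside the URL. A minimal sketch of reading them back, using the standard `java.net.URI` class rather than YaCy's own URL class; `FtpCredentials` and the anonymous fallback are assumptions made for illustration:

```java
import java.net.URI;

// Hypothetical holder for credentials taken from an ftp URL.
final class FtpCredentials {
    final String user;
    final String password;

    private FtpCredentials(String user, String password) {
        this.user = user;
        this.password = password;
    }

    // Splits the "user:password" userinfo part; falls back to anonymous (assumption).
    static FtpCredentials fromUrl(String url) {
        String userInfo = URI.create(url).getUserInfo(); // null if the URL has no userinfo
        if (userInfo == null) {
            return new FtpCredentials("anonymous", "");
        }
        int colon = userInfo.indexOf(':');
        if (colon < 0) {
            return new FtpCredentials(userInfo, "");
        }
        return new FtpCredentials(userInfo.substring(0, colon), userInfo.substring(colon + 1));
    }
}
```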
orbiter
6cce47e217
test of ftp-urls in URL class
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2481 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
fce9e7741b
*) next step of restructuring for new crawlers
...
- renaming of http specific crawler settings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2480 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
e3f0136606
*) next step of restructuring for new crawlers
...
- adding function isSupportedProcotol to plasmaCrawlLoader.java
- disabling robots.txt check for protocols other than http(s)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2479 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
9ded4e8d5a
*) Bugfix for name resolution in proxy mode
...
See: http://www.yacy-forum.de/viewtopic.php?p=25241
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2478 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
1c8300fcec
*) Bugfix for name resolution in proxy mode
...
See: http://www.yacy-forum.de/viewtopic.php?p=25241
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2477 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
4e2a950ac9
*) next step of restructuring for new crawlers
...
- avoid using the http crawler class directly. Using the interface class instead
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2476 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
09b106eb04
*) next step of restructuring for new crawlers
...
- adding interface class (plasma/crawler/plasmaCrawlWorker.java) for protocol specific crawl-worker threads
- moving reusable code into abstract crawl-worker class AbstractCrawlWorker.java
- the load method of the worker threads should not be called directly anymore (e.g. by the snippet fetcher)
to crawl a page and wait for the result use function plasmaCrawlLoader.loadSync([...])
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2474 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
eb9b138986
*) next step of restructuring for new crawlers
...
- conversion of the crawler pool into a keyed object pool
- crawlers are now loaded based on the url protocol (of course works only for http now)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2473 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
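The idea of a pool keyed by URL protocol, as described in commit eb9b138986, might look roughly like the sketch below. It is a simplified stand-in, not the pool implementation used in the code base; `KeyedWorkerPool` and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical keyed pool: one queue of idle workers per protocol ("http", "ftp", ...).
final class KeyedWorkerPool<W> {
    private final Map<String, Supplier<W>> factories = new HashMap<>();
    private final Map<String, Deque<W>> idle = new HashMap<>();

    // Registers a factory that creates workers for the given protocol.
    synchronized void register(String protocol, Supplier<W> factory) {
        factories.put(protocol, factory);
        idle.put(protocol, new ArrayDeque<>());
    }

    // Hands out an idle worker for the protocol, or creates a new one.
    synchronized W borrow(String protocol) {
        Supplier<W> factory = factories.get(protocol);
        if (factory == null) {
            throw new IllegalArgumentException("unsupported protocol: " + protocol);
        }
        W worker = idle.get(protocol).pollFirst();
        return worker != null ? worker : factory.get();
    }

    // Returns a worker to the idle queue of its protocol.
    synchronized void giveBack(String protocol, W worker) {
        Deque<W> queue = idle.get(protocol);
        if (queue != null) {
            queue.addFirst(worker);
        }
    }
}
```

With only an http factory registered, a request for any other protocol is rejected, which matches the note that this works only for http at this stage.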
theli
1395aae742
*) starting restructuring which is needed to add crawlers for additional protocols
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2472 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
b4acbdaa97
*) better handling of server shutdown
...
See: e.g. http://www.yacy-forum.de/viewtopic.php?p=25234
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2470 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
f3ac4dbbb9
*) better handling of server shutdown
...
See: e.g. http://www.yacy-forum.de/viewtopic.php?t=2584
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2468 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
959b779aba
*) avoid performance loss if log level is greater than 'fine'
...
See: http://www.yacy-forum.de/viewtopic.php?p=25180
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2467 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
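One common way to avoid the cost described in commit 959b779aba is to check the log level before building the message. The sketch below uses `java.util.logging` directly; whether the commit does exactly this is an assumption, and `GuardedLogging` with its call counter exists only to make the effect observable.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical demo class: builds fine-level messages only when they will be logged.
final class GuardedLogging {
    // Counts how often the expensive formatting ran (demo instrumentation only).
    static int expensiveCalls = 0;

    private GuardedLogging() {}

    // Stands in for costly message construction.
    static String describe(Object payload) {
        expensiveCalls++;
        return String.valueOf(payload);
    }

    static void logFine(Logger log, Object payload) {
        // Without this guard the string would be built even if FINE is disabled.
        if (log.isLoggable(Level.FINE)) {
            log.fine("processing " + describe(payload));
        }
    }
}
```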
auron_x
57dda1a92c
*) another fix for the wrong version display; now consistently uses double instead of float
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2464 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
auron_x
479b74e1dd
*) fix for a mistake in the new PPM calculation which caused decimal digits to be written to seedinfo
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2463 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
auron_x
348258a557
*) changed PPM-calculation to be much more accurate
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2461 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
18b6876860
new cache flush configuration settings
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2460 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hermens
f0278b4092
Bugfix for / by zero when the AssortmentCluster is empty
...
See: http://www.yacy-forum.de/viewtopic.php?t=2746
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2459 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
14e0bb0dcf
allow more references per word for new db
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2458 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
985dcbde7f
changed some parameters that may cause better memory usage and more indexing speed
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2457 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
b7f4a1521b
added options to switch on or off the kelondroFlexTable for NURL, EURL and PreNURL
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2456 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
c26da4893b
reverted NURL storage to kelondroTree; kelondroFlexTable still has problems with deleted entries
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2454 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
db1eae0227
* simplified initialization of database objects
...
* replaced kelondroTree for NURLs by kelondroFlex
* replaced kelondroTree for EURLs by kelondroFlex
take care, may be very buggy
please finish crawls before updating. crawls will be lost.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2452 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hermens
0b73f2b132
Repair DNS prefetch during cacheScan
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2451 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
27a159b401
* documentation update
...
* removed doc from release
* release information in doc/News.html
* release 0.46
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2442 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
f80f776b89
*) Trying to solve NullPointerException problem in function addURLtoErrorDB
...
See: http://www.yacy-forum.de/viewtopic.php?t=2705
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2441 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
d78b824e85
fixed problem with default path after first start-up
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2440 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hydrox
1c99b5a484
*) fixed logging for urldbcleanup
...
*) changed exception handling in urldbcleanup so that it shows NullPointerException correctly
*) added more Blacklisting to urlcleaner
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2436 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
135e019883
removed one superfluous line from last commit
...
(hasnot is included in remove)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2435 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
1591a55963
added object cache miss-cache use for remove method
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2434 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
8f3f4ab0eb
enhanced synchronisation in plasmaWordIndex
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2433 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f933f00f09
another patch to URL protocol handling for 'news', 'nntp' etc:
...
reject it! (the java.net.URL class rejects them too)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2432 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
4c6e00d80a
more bugfixes for URL class, see:
...
http://www.yacy-forum.de/viewtopic.php?p=24844#24844
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2431 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
23dd972608
fixed memory calculation in performanceMemory web page
...
fixed also maximum cache size computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2429 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
b7dc251948
fixed bugs in url class:
...
- correct backpath ('..') handling
- correct absolute path handling
- included https
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2428 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
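The backpath ('..') handling fixed in commit b7dc251948 can be illustrated with a small stack-based normalizer. This is a simplified sketch, not the code of the URL class and not a complete implementation of the dot-segment rules in RFC 3986; `PathNormalizer` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper: resolves "." and ".." segments in an absolute URL path.
final class PathNormalizer {
    private PathNormalizer() {}

    static String resolveBackpaths(String path) {
        Deque<String> stack = new ArrayDeque<>();
        for (String segment : path.split("/")) {
            if (segment.isEmpty() || segment.equals(".")) {
                continue; // skip empty and current-directory segments
            }
            if (segment.equals("..")) {
                if (!stack.isEmpty()) {
                    stack.removeLast(); // step up, but never above the root
                }
            } else {
                stack.addLast(segment);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (String s : stack) {
            sb.append('/').append(s);
        }
        if (sb.length() == 0) {
            return "/";
        }
        // Keep a trailing slash if the input had one (simplification).
        return path.endsWith("/") ? sb.append('/').toString() : sb.toString();
    }
}
```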
orbiter
1ce3c22761
better memory control:
...
- added memory monitor for preNURL-db in performanceMemory
- changed default memory assignments
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2427 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
39b4c26bdc
more memory control:
...
- catching of OutOfMemoryError in server threads
- automatic adaptation of word cache size after a Short Mem Cycle
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2426 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
3e9d509c39
some small fixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2425 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
276225d79e
fix for URL class
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2423 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
eb633c0a4f
server threads must now supply a method that can be called in case
...
of short memory. This has been realized for the indexing thread.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2421 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f5720cb2fa
removed most synchronization in wordIndex (for testing)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2420 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
0187c60010
because of a bug in the JRE 1.4.2 there was no memory protection
...
see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4686462
This commit works around the bug with a memory-computation patch.
All uses of Runtime.maxMemory have been replaced by serverMemory.max.
The bug is no longer present in Java 1.5.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2419 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
auron_x
4eca0f8830
*) fixed PPM calculation for multiple indexer-threads
...
*) fixed totalPPM calculation and added total PPM to Network.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2418 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
cfb51fdef1
less synchronization in plasmaWordIndex
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2416 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
d6a928c2da
quickfix for http://www.yacy-forum.de/viewtopic.php?t=2705
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2415 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
6ad471ef96
* applied many compiler warning recommendations
...
* cleaned up code
* added unit test code
* migrated ranking RCI computation to kelondroFlex and kelondroCollectionIndex
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2414 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
cf1186597b
utf fix from theli
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2412 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hydrox
9da3aa74d3
silly me, fix for the fix as advised by theli
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2408 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hydrox
bb3d9a5582
*) e.getMessage().indexOf() can only be used if there is actually an ExceptionMessage.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2407 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
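The point of commit bb3d9a5582 is that `getMessage()` may return null, so calling `indexOf()` on it directly can itself throw. A minimal null-safe check, with `ExceptionMessages` as a hypothetical helper name:

```java
// Hypothetical helper: null-safe test for a fragment in an exception message.
final class ExceptionMessages {
    private ExceptionMessages() {}

    // Returns true only if the exception has a message and it contains the fragment.
    static boolean messageContains(Throwable e, String fragment) {
        String message = e.getMessage(); // null for exceptions created without a message
        return message != null && message.indexOf(fragment) >= 0;
    }
}
```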
hydrox
7a54010a9c
*) Iterators can't be cast to IndexContainer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2406 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
5e0b6f8f83
*) sorting peer name list on Blacklist_p.html
...
*) restructuring of sharedBlacklist_p.java
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2405 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
cd5f7e137c
fixed problem with NURL-generation upon first startup
...
(a new kelondroFlexTable was generated, which should not happen)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2402 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
8418af141a
added several consistency checks and small changes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2400 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
9d13aeca13
*) removing class. does not work so far
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2399 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
95a84ae469
*) adding missing classes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2398 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
eee44be602
*) adding an interface for customized blacklist classes
...
- now it's possible to use a customized blacklist engine
instead of the default one
- this can be done by configuring the property BlackLists.class
See: http://www.yacy-forum.de/viewtopic.php?t=2108
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2397 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
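Selecting a blacklist engine through the `BlackLists.class` property, as commit eee44be602 describes, could be done with reflection along these lines. Only the property name comes from the commit message; the interface `UrlBlacklist`, the default engine `AllowAllBlacklist`, and the silent fallback are assumptions made for this sketch.

```java
import java.util.Properties;

// Hypothetical interface that a customized blacklist engine would implement.
interface UrlBlacklist {
    boolean isListed(String host);
}

// Hypothetical default engine that lists nothing.
final class AllowAllBlacklist implements UrlBlacklist {
    public boolean isListed(String host) {
        return false;
    }
}

final class BlacklistLoader {
    private BlacklistLoader() {}

    // Instantiates the class named by the BlackLists.class property, or the default.
    static UrlBlacklist load(Properties config) {
        String className = config.getProperty("BlackLists.class", AllowAllBlacklist.class.getName());
        try {
            return (UrlBlacklist) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            // Assumption: fall back to the default engine if the class cannot be used.
            return new AllowAllBlacklist();
        }
    }
}
```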
orbiter
6d2f15971a
a very strange error leaves the kelondroRecords structure
...
corrupted. The cause is that the deleted-records-chain has wrong entries,
and one of the pointers in that chain points to a place beyond the end of the file.
This causes an IndexOutOfBoundsException within an IO operation.
I currently don't know why the deleted-records-chain gets
corrupted, but the error can be caught. If this now happens with the
assortment database, the database is deleted.
See also:
http://www.yacy-forum.de/viewtopic.php?p=24586#24586
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2396 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
d2e8e76218
*) now it's possible to configure the yacy blacklist separately for dht, search, proxy, crawler
...
See: http://www.yacy-forum.de/viewtopic.php?t=2541
http://www.yacy-forum.de/viewtopic.php?p=24516
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2389 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
9ae9062bd3
* disabled new kelondroFlex table for NURLs
...
* added new RAM index Class
* fixed possible synchronization problem in kelondroRecords
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2388 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
689bbcf9cd
replaced kelondroTree db for NURLs by new kelondroFlexTable
...
The new database is only created if the old is deleted or does not exist
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2387 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
7fbba41962
synchronization fixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2386 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
328f9859a5
more synchronization in plasmaWordIndex
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2385 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f43c90fa98
fixed handling of null referer in crawlOrder
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2384 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
130e6d4719
generalized index object for eurl, nurl and lurl to prepare move
...
of these tables to new kelondroFlexTable Object
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2382 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
acdf24877f
more synchronization against outOfMemoryError in wordIndex
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2381 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
95160d7f2c
fixed size computation of index elements from the collection index
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2380 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
26116cabde
added missing rowdef assignment
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2379 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
cfbacbbf08
reverted change in robotsParser
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2378 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
abf22f6e60
removed url normalform computation from htmlFilterContentScraper.
...
This method was implemented in de.anomic.net.URL
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2377 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
740d49751d
* strict type and size check in kelondroRow handling
...
* adapted all code to use the declaration form of kelondroRow
* fixed a bug in kelondroRow which caused wrong parsing of encoding type
* the bug caused bad database behaviour in the new indexCollection data structure.
Because of this bug, all existing test databases are now void; a new database is created.
* the kelondroFlexTable and indexCollection data structures now store a declaration of the row definition
into a properties file along the database files.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2375 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
314021453f
* more logging
...
* option in yacy.init to set useCollectionIndex usage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2374 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
a52f36787f
better templatedebugging
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2371 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
3480d36417
added some debug code
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2369 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
61b151b083
* added another auto-fix for collection index inconsistency check
...
* fixed words size computation for collection index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2368 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
0bbbd129ef
small fix for exception message
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2367 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
718fbc2dae
enhancements in kelondroCollectionIndex:
...
* synchronized array and index objects
* auto-fix function for slightly corrupted index entries
* generalized internal access methods
also extended kelondroIndex interface to support ordering access
which is used in kelondroCollectionIndex for string comparisons
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2366 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f58283def2
better control of index flush
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2364 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
4be21a3cab
ups
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2363 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
80b6c90d54
enhancements to prevent blocking during dht transfer receive
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2362 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
9f298083cd
*) adding more urls to the error url
...
- old error strings were replaced with their corresponding constants
See: http://www.yacy-forum.de/viewtopic.php?t=2638
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2360 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
d56f06401e
- Cache known URLs during indexReceive to avoid getting blocked during loadedURL.exists() whenever possible
...
- Small logging updates
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2359 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
c09f734d06
*) offer router configuration on ConfigBasic.html
...
- checkbox to allow router configuration is shown if
- a) the UPnP forwarder is installed
- b) a UPnP enabled router was found
- c) no other forwarder was configured
See: http://www.yacy-forum.de/viewtopic.php?p=24264
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2358 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
dcbb4d0a6b
Display the size of HashBlacklistedCache on PerformanceMemory page.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2357 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
d799622da1
better flush limit for index collections
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2354 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
d468d665c9
some changes that may help to prevent deadlocks that cause an OutOfMemoryError
...
as described in
http://www.yacy-forum.de/viewtopic.php?p=24359
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2353 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
d54767f634
*) last step of removing embedded html from dir class
...
- migration finished
*) dir list now sorts the dirlist entries.
- directories are listed before files
- files are sorted alphabetically, case insensitive
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2351 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
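The ordering described in commit d54767f634 (directories before files, names compared case-insensitively) maps naturally onto a chained comparator. The sketch below is illustrative only: `DirListing` and `Entry` are hypothetical, and applying the same case-insensitive order to directories is an assumption, since the commit message only states it for files.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of a directory listing and its sort order.
final class DirListing {
    static final class Entry {
        final String name;
        final boolean directory;

        Entry(String name, boolean directory) {
            this.name = name;
            this.directory = directory;
        }
    }

    // false sorts before true, so "!directory" puts directories first.
    static final Comparator<Entry> ORDER =
            Comparator.comparing((Entry e) -> !e.directory)
                    .thenComparing((Entry e) -> e.name, String.CASE_INSENSITIVE_ORDER);

    private DirListing() {}

    // Returns the entry names in listing order without modifying the input.
    static List<String> sortedNames(List<Entry> entries) {
        List<Entry> copy = new ArrayList<>(entries);
        copy.sort(ORDER);
        List<String> names = new ArrayList<>();
        for (Entry e : copy) {
            names.add(e.name);
        }
        return names;
    }
}
```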
orbiter
279b1d969d
Integrated new indexing data structure 'collections' into the main class
...
for indexing, the plasmaWordIndex.
The new data structure is ready-to-use, but currently disabled.
It can be activated by setting the static
plasmaWordIndex.useCollectionIndex
to true. This should be done for testing purposes.
The new index is stored to
DATA/INDEX/PUBLIC/TEXT
In the future, the directory PLASMA shall be used only for the crawler.
Attention: during testing the data structure in INDEX may change,
and indexes created with the new data structure may become unusable.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2348 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
4ff742e42d
implemented indexCollectionRI
...
this is the new database structure that is supposed to replace the
plasmaAssortmentCluster AND the plasmaWordIndexFileCluster
The new structure is not yet active and needs to be integrated into
plasmaWordIndex. This has some migration constraints that are not yet
completely solved.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2347 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
01f95eccd3
re-write of kelondroCollectionIndex. This is the data structure that
...
shall replace the current assortment files.
* used the kelondroFlexTable to hold the index of collections
* used kelondroRow definitions to declare all data structures
* fixed several bugs that appeared in kelondroRowSet and kelondroRowCollection during testing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2344 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
ebc2233092
* implemented (finished) class indexRowSetContainer
...
* replaced indexTreeMapContainer by indexRowSetContainer
* deleted indexTreeMapContainer and abstract class
This is another step to the new database structure
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2343 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
9183d21f25
renamed new index class to old name
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2342 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
c4e922885a
replaced indexURLEntry by new class that uses a kelondroRow.Entry object
...
to store the index entry. This is another step to move to the new database structure.
A side effect of this change is that index storage uses much less RAM,
which affects the index RAM cache.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2341 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
0b7112f8b2
fix for missing topLevelClone in indexRAMCacheRI.wordContainerIterator
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2340 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
e357599f92
* fixed problem with indexContainer iteration from RAM:
...
indexContainers from RAM must be cloned explicitly to prevent
side-effects on stored indexContainer objects in Cache
* changed behaviour of urlReference deletion from indexContainers:
deletion does not use retrieval of all Elements from the assortments
* added textual configuration of kelondroRow and kelondroColumn definition
* update of kelondroRow usage in yacyNews
* modified kelondroAttrSeq to use modified kelondroColumn parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2339 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
57fe5cc671
*) code cleanup
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2338 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
4e9f02c8ec
integration of Michael's string-extraction.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2337 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
8b77afd72c
some fixes to new container merger
...
and some code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2336 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
830167596a
bugfix for
...
http://www.yacy-forum.de/viewtopic.php?p=24127#24127
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2333 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
839806a775
*) serverPortForwardingUpnp.java: code cleanup, license header added
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2332 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
03230cd887
*) removing old port forwarding classes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2330 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
6e676224d0
*) adding support for upnp
...
A new port forwarding method for upnp was added.
If this method is enabled, yacy automatically determines an UPnP
capable internet gateway and configures the gateway port forwarding
settings properly.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2328 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
417ed5102e
redesign of database iterators:
...
an iteration of key elements in kelondroTree databases is no longer supported.
this is now replaced by an iteration of kelondroRow.Entry objects from the database
Iteration of keys from the database was mostly followed by retrieval of the row
from the database, which caused unnecessary database load.
The index selection was also redesigned to use the new row iteration methods.
This affects many functions; most importantly, the DHT selection routine is now much faster.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2327 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
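The saving described in commit 417ed5102e comes from not looking up each row again after iterating its key. The effect can be shown with an in-memory analogy; `CountingTable` is a hypothetical stand-in built on `TreeMap`, not the kelondroTree API, and the lookup counter exists only for demonstration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical table that counts how many single-row lookups were issued.
final class CountingTable {
    private final TreeMap<String, String> rows = new TreeMap<>();
    int lookups = 0;

    void put(String key, String row) {
        rows.put(key, row);
    }

    // One lookup per call, comparable to a separate database read.
    String get(String key) {
        lookups++;
        return rows.get(key);
    }

    Iterable<String> keys() {
        return rows.keySet();
    }

    Iterable<Map.Entry<String, String>> rowsInOrder() {
        return rows.entrySet();
    }

    // Old pattern: iterate keys, then fetch every row again.
    static int totalLengthViaKeys(CountingTable t) {
        int n = 0;
        for (String key : t.keys()) {
            n += t.get(key).length();
        }
        return n;
    }

    // New pattern: iterate complete rows, no extra lookups.
    static int totalLengthViaRows(CountingTable t) {
        int n = 0;
        for (Map.Entry<String, String> row : t.rowsInOrder()) {
            n += row.getValue().length();
        }
        return n;
    }
}
```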
theli
0db237467f
*) bugfix for URL generation from file
...
see: http://www.yacy-forum.de/viewtopic.php?p=24116
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2326 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
ad692fc6c7
implemented option to extract nurls from the database
...
(plus some iteration enhancements for nurls)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2325 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
7fd90ca7c8
* strict handling of NURL entry element generation, storage and stacking
...
* more space for EURL reason strings (you must delete the EURL db to use this)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2324 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
5f72be2a95
some redesign of EURL storage
...
* store() is now called explicitly
* more urls are written to the EURL table
* the EURL stack does not store the complete entry any more, now only the URL hash
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2323 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
1ed3e2daef
added option to extract domains and/or urls from the eurl database
...
When extracting from eurl, the html output format is recommended, since
this format also adds the failure reason to the domain/url.
The complete syntax for domain extraction is now
java -Xmx<megabytes>m -classpath classes yacy -domlist [ -source { lurl | eurl } ] [ -format { text | zip | gzip | html } ] [ <path to DATA folder> ]
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2322 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
7e0a130fb5
new indexURLEntry class 'indexURLEntryNew', to replace old class
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2321 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
58df8b7bbf
a large collection of different changes
...
* mainly for the transition to the new indexing database structure
* a bugfix for an endless loop inside kelondroTree iteration
* a bugfix for bulk read inside a kelondroTree iteration; the bug caused some elements to be iterated twice
* a major speed improvement for url/domain extraction
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2320 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
e20ff77c10
another bugfix in new url class
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2318 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
685430a1b5
bugfix in new URL class, better logging for domain extraction
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2317 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
79af283f6c
better debugging in new URL class for wrong port numbers
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2315 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
1b2ea58ee9
wrong substring invocation.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2313 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
e4f1820b58
protection against too long authentication strings in switchboard
...
see also: http://www.yacy-forum.de/viewtopic.php?p=23943#23943
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2312 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
b3f7e62e03
better handling of whitespace
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2311 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
4149939c02
better handling of whitespace for gettext quotation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2310 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
97fa6788a1
added gettext support:
...
automatic replacement of string appearances in html files by
gettext quotes.
see also: http://www.yacy-forum.de/viewtopic.php?p=23901#23901
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2309 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago