At the current state it allows formatting of numbers (integer + decimal types) for output according to the Locale derived from the language setting in yacy. Network.(html|xml) and Status.html have been changed to use it for now (TODO: should be integrated into other servlets as well to reduce duplicate formatting code).
NOTE: For now the output format for Network.xml simulates the old behaviour which is wrong (it uses '.' as decimal and grouping separator), to make sure external scripts like the yacystats.de one won't break with this update.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4162 6c8d7289-2bf4-0310-a012-ef5d649a1542
- URL encoding for search terms where required
- removed "ugly" CDATA escaping
- UTF-8 encoding for the XML
- no HTML style escaping for XML/RSS element values
Note: some unicode characters might still be encooded in a wrong way.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4140 6c8d7289-2bf4-0310-a012-ef5d649a1542
and replaced old fist hash computation by new method that tries to find a gap in the current dht
to do this, it is necessary that the network bootstraping is done before the own hash is computed
this made further redesigns in peer initialization order necessary
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4117 6c8d7289-2bf4-0310-a012-ef5d649a1542
- removed web structure picture from indexing menu and grouped it together with htcache monitor
- added a database for terminated crawls, when a crawl is finished it is automatically moved to the new database
- extended crawl profile edit servlet, shows now also terminated crawls
- option that was used to delete profiles is now redesigned to a function that moves the current crawl to the terminated crawls and removes all urls from the current queues!
- fixed here and there problems with indexing queues
- enhances indexing speed by changing cache flush sizes.
- changed behaviour of crawl result servlet: the list of crawled urls is shown if there is one, othevise the overview window is shown
attention: the new profile databases are not compatible with the old one. current crawls will be lost! the web index is not touched.
next steps: the database of terminated crawls can be used to start with them a new crawl. This is useful if one wants to re-crawl specific pages and wants to use a old crawl profile.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4113 6c8d7289-2bf4-0310-a012-ef5d649a1542
1. avoid adding duplicate file name entries in config properties for lists,
2. correctly merge all path masks from all list files for the same host masks,
3. rewrite helper methods standard java methods for Collection transformations,
4. merged various methods with identical functionality for different Collection implementations into one,
5. minor refactoring to improve code readability.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4087 6c8d7289-2bf4-0310-a012-ef5d649a1542
search profiling showed, that a major amount of time is wasted by computing url hashes. The computation does an intranet-check, which needs a DNS lookup. This caused that each urlhash computation needed 100-200 milliseconds, which caused remote searches to delay at least 1 second more that necessary. The solution to this problem is to attach a URL hash to the URL data structure, because that means that the url hash value can be filled after retrieval of the URL from the database. The redesign of the url/urlhash management caused a major redesign of many parts of the software. Since some parts had been decided to be given up they had been removed during this change to avoid unnecessary maintenance of unused code.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542
- images may be requested by localhost and authorized users only, if the request is done using a clear-text URL
- the image may be requested also using a code that can be a license to retrieve a URL for everyone
- some servelets produce URL licenses for ViewImage, like image search results
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4027 6c8d7289-2bf4-0310-a012-ef5d649a1542
- different handling of link quotation
- different handling of link normalization
- enhanced html/unicode en/de-coding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3993 6c8d7289-2bf4-0310-a012-ef5d649a1542
- within this context: generalized date format handling
- extended Update interface:
* a version lookup can be triggered manually
* a complete lookup + download + re-boot process can be triggered with one click
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3986 6c8d7289-2bf4-0310-a012-ef5d649a1542
- removed old rss parser
- removed unused rss parser libraries
- added new rss reader
- added previously removed FeedReader_p.java and adopted it to new rss parser
- adopted parser interface for rss indexing to new rss parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3970 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added chunked file transfer for non-yacy clients
- SSIs are streamed using chunked transfer, partly delivered pages can be seen in browser before transmission is finished
- added client-side network unit identification
- cleaned up code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3926 6c8d7289-2bf4-0310-a012-ef5d649a1542
- don't start up if DATA/yacy.running exists as this is usually a sign of an already started yacy-instance
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3831 6c8d7289-2bf4-0310-a012-ef5d649a1542
- fixed wrong XML output if no peers are known in Network.xml
- simplified parsing of table properties in wikiCode and ZTableToken
- reimplemented GC heuristics. They are needed to constantly ensure that an amount of free memory is available which is higher than Java's max. limit for performing a Full GC (please use serverMemory.request(long, boolean) rather than serverMemory.available(long, boolean) to provide data for averaging over the last GCs)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3793 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) First version of a sitemap parser added
- currently only autodetection of sitemap files is supported
*) DB-Import restructured
- pause/resume should work again now
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3666 6c8d7289-2bf4-0310-a012-ef5d649a1542
- cluster definitions can now contain an addition for local ip addresses
- cluster-cluster communication uses the local ip address instead the global address, if one is given
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3624 6c8d7289-2bf4-0310-a012-ef5d649a1542
- basic protection against start-up problems when database files are corrupted
- auto-delete of not-critical databases during startup when load error occurs
- on-the-fly reset option for all database tables
- automatic on-the-fly reset for seed tables during enumeration exceptions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3547 6c8d7289-2bf4-0310-a012-ef5d649a1542