- enhanced searchtime in kelondroRowSets
- enhanced uniq() - reverse enumeration causes less time in case of mass removal of doubles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4207 6c8d7289-2bf4-0310-a012-ef5d649a1542
- better handling of small collections (less overhead)
- usage of pre-sorted limits
- different re-sort limit
- more testing procedures
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4200 6c8d7289-2bf4-0310-a012-ef5d649a1542
the enhancement was made by using better organized data structures and
multi-threading during the sort. A sort can be divided into two separate
processes when the first partition of the quicksort algorithm was done.
Generating a separate thread and starting the thread takes only 10 milliseconds,
so using a separate thread makes only sense if the data amount is large.
statistics about the speed-up:
without ehancement: 250 milliseconds for 100000 entries
with data structure enhancement: 170 milliseconds for 100000 entries
with additional second thread (if second processor is present): 130 milliseconds.
For dual-processor systems, this means about 100% speed-up
a test can be made with the following command:
java -classpath classes de.anomic.kelondro.kelondroRowCollection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4198 6c8d7289-2bf4-0310-a012-ef5d649a1542
- Array element shifting during remove is only done when it is necessary to keep the order of a row collection.
- This will speed up the most expensive operation "common word shrinking" by a factor of 500-1000 (in the worst cases we shifted > 60 GB of data during this operation)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4158 6c8d7289-2bf4-0310-a012-ef5d649a1542
search profiling showed, that a major amount of time is wasted by computing url hashes. The computation does an intranet-check, which needs a DNS lookup. This caused that each urlhash computation needed 100-200 milliseconds, which caused remote searches to delay at least 1 second more that necessary. The solution to this problem is to attach a URL hash to the URL data structure, because that means that the url hash value can be filled after retrieval of the URL from the database. The redesign of the url/urlhash management caused a major redesign of many parts of the software. Since some parts had been decided to be given up they had been removed during this change to avoid unnecessary maintenance of unused code.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542
- snippets are not fetched by browser using ajax, they are now fetched internally
- YaCy-internat threads control existence of snippets and sort out bad results
- search results are prepared using SSI includes
- the search result page is visible right after the search request, the results drop in when they are detected
- no more time-out strategy during search processes, results are shifted within queues when they arrive from remote peers
- added result page switching! after the first 10 results, the next page can be retrieved
- number of remote results is updated online on the result page as they drop in
- removed old snippet servelet (which had been also a security leak btw)
- media search is broken now, will be redesigned and fixed in another step
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4071 6c8d7289-2bf4-0310-a012-ef5d649a1542
collection objects has been reduced to 50%:
- set new memory calculation functions for indexing process
- adjusted guessed memory amount
-> Testing needed:
try new recommended value (see performanceQueues) and see if OOMs occur.
-> report maximum recommended value, so we can set new default values.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4053 6c8d7289-2bf4-0310-a012-ef5d649a1542
- refactoring of kelondroRecords; this class is now divided into
kelondroAbstractRecords, kelondroRecords, kelondroCachedRecords, kelondroHandle and kelondroNode
- better abstraction of kelondroNodes, such nodes may now be crated by different classes
- a new Node defining class kelondroEcoRecords defines Nodes that do not need so much allocation and System.arraycopy
- there is less memory transfer on the bus, especially for collection index
- now half of memory needed for web index access
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4024 6c8d7289-2bf4-0310-a012-ef5d649a1542
- different handling of link quotation
- different handling of link normalization
- enhanced html/unicode en/de-coding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3993 6c8d7289-2bf4-0310-a012-ef5d649a1542
target servers had been able to see search words from the referrer of the favicon fetch.
This has been removed by using the getImage - servlet for favicon fetch.
Since java does not support loading of bmp and ico-Images, such parsers had been added.
The image parser had been coded from their original microsoft documentation.
This influences also the image-search functionality: there can now be a preview
of found bmp-images. Another benefit: favicons for search results are now cached with the HTCACHE.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3965 6c8d7289-2bf4-0310-a012-ef5d649a1542
- fix for missing restart script in ant built target
- removed some more synchronization for size() operations
- removed blocking statement on search page
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3935 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added chunked file transfer for non-yacy clients
- SSIs are streamed using chunked transfer, partly delivered pages can be seen in browser before transmission is finished
- added client-side network unit identification
- cleaned up code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3926 6c8d7289-2bf4-0310-a012-ef5d649a1542
we will provide two different releases in the future, one standard release and one 'pro'-release.
the 'pro'-release contains all additional parsers AND has different default performance values.
The pro-version differs therefore from the previous 'all'-version by this default values.
The pro-configuration is automatically choosen if the libx-folder exists. If a version is once initialized, its configuration stays independently from an existing libx folder.
The ant targets had been changed. There are now 3 different targets to create standard and pro-releases, and one target to upgrade:
- dist: creates a standard release (only, no libx target any more)
- distPro: creates a pro-release (includes the libx)
- distExt: creates a libx-release which includes the libx-folder only. It may be used to upgrade from standard to pro
Furthermore, the naming of 'dev'-releases had been removed.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3902 6c8d7289-2bf4-0310-a012-ef5d649a1542
this stores now two index structures, one for data that is aquired during start-up
and one for data that is aquired during run-time. This reduces the grow factor, and should reduce the memory amount in case that a index-reorganisation happens.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3733 6c8d7289-2bf4-0310-a012-ef5d649a1542