yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	b887f4a116	keep more free mem git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5778 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	c2359f20dd	refactoring: better abstraction of reference and metadata prototypes. This is a preparation to introduce other index tables as used now only for reverse text indexes. Next application of the reverse index is a citation index. Moved to version 0.74 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5777 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	ab656687d7	more strict BLOB initialization .. may also help to save some ram git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5776 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	f6691411b5	- migration of files from SplitTable (which are used for the URL-DB) to a different file name format. - the file generation logic is slightly different: files may now have only a maximum size of one gigabyte and a maximum age of one month. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5773 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	f21a8c9e9c	a different naming scheme for BLOBArray files. This may be necessary if blobs are written more often than once in a second. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5771 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	7ba078daa1	- added fast site-operator - refactoring merge into BLOBArray git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5770 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b4126432bc	hardening of index dump write process git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5769 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	9bfb2641db	- removed deprecated threads - added automatic http client reset. this was necessary because excessive intranet crawling caused deadlocks. this hack solved the problem. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5768 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	0139988c04	- added writing of temporary file names and renaming to final file name when index dump/merge are done. Interrupted merges can be cleaned up. - added clean-up of unfinished merges and unused idx/gap files - enhanced merge file selection method git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5764 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	3621aa96ab	- added a memory protection for the IndexCell migration - fix for bad cell file selection git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5763 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	568e8f1741	fix in unmountBLOB git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5762 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	9da69d6b68	- better selection of files to be merged - fix for getChannel().close(), which works on windows but not on macs and linux git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5761 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d39a5b42ca	more care about open file handles. Now files also close on windows and can be deleted afterwards. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5760 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	029495e64d	fixed bug introduced in SVN 5756 in EcoTable.put() git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5759 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	587838bd09	git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5758 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d2e2420a68	- added another file selection method for index cell merge - more hacks to check that files are closed properly and file handles do not exist after files are closed. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5757 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	96eaecda3e	- added migration class to go from index collections to the index cell data structure. - added better control over file deletion, because this sometimes fails, especially on windows git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5756 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	0f0b4aec75	better index cell merge logic git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5754 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	fa07234d4e	fix for clear method: now deletes files git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5752 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
borg-0300	c450e3746b	svn attributes added git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5736 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	37f892b988	added new concurrent merger class for IndexCell RWI data git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5735 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	67aaffc0a2	- added Latency control to the crawler: because of the strongly enhanced indexing speed when using the new IndexCell RWI data structures (> 2000PPM on my notebook), it is now necessary to control the crawling speed depending on the response time of the target server (which is also YaCy in case of some intranet indexing use cases). The latency factor in crawl delay times is derived from the time that a target host takes to answer HTTP requests. For internet domains, the crawl delay is a minimum of twice the response time; in intranet cases the delay time is now half of the response time. - added API to monitor the latency times of the crawler: a new api at /api/latency_p.xml returns the current response times of domains, the time when the domain was last accessed by the crawler and many more attributes. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5733 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	0926310461	another performance hack git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5731 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	ebe5d69d14	performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5730 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b3f75e48fa	- enhanced balancer: auto-solving of waiting-deadlocks - removed deprecated cache-init size value - more debug lines for IndexCell cache dump merge git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5728 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	9a90ea05e0	added a merge operation for IndexCell data structures git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5727 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d99ff745aa	fix for http://forum.yacy-websuche.de/viewtopic.php?p=13378#p13378 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5726 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	0c3ab291c4	fix for http://forum.yacy-websuche.de/viewtopic.php?p=13354#p13354 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5725 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	a9cea419ef	Integration of the new index data structure IndexCell. This is the start of a testing phase for the IndexCell data structure, which will replace the collections and caching strategy. IndexCell creation and maintenance is fast, has no caching overhead and very low IO load, and is the basis for the next data structure, index segments. IndexCell files are stored at DATA/<network>/TEXT/RICELL. With this commit the old data structures are still used until a flag in yacy.conf is set. To switch to the new data structure, set useCell = true in yacy.conf. Then you will no longer have access to TEXT/RICACHE and TEXT/RICOLLECTION. This code is still bleeding-edge development. Please do not use the new data structure for production now. Future versions may have changed data types, or other storage locations. The next main release will have a migration feature for old data structures. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5724 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	83792d9233	more refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5722 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	474aac65af	more refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5719 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	209f25f5f5	refactoring to integrate indexCell data structures git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5718 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b7138e5fcb	even more efficient comparator calls (less System.arraycopy for primary keys) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5715 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	65784eb656	- more efficient comparator calls - fix for http://forum.yacy-websuche.de/viewtopic.php?p=13331#p13331 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5714 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	44874cb550	added a deleteOnExit for blob file deletion in case that a deletion is not successful. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5713 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	66f78d67e0	bad idea. Concurrency in index management will be done differently git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5712 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	7dff1cba62	removed option to use different primary keys in kelondro tables. This option was never used, and there is also no use in setting any column but the first as the primary key. As a result, access methods to the key do not need to compute key positions, and they work faster. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5711 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	7f67238f8b	refactoring of plasmaWordIndex: less methods in the class, separated the index to CachedIndexCollection git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5710 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d49238a637	more performance hacks: better default values for scaling, less memory usage git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5708 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	39644dc14e	performance hacks to compare methods in database core git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5707 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	e2e7949feb	replaced old PPM computation with a better one that simply sums up events that had been stored in the profiling table. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5706 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	f6d989aa04	added new class RowSetArray which arranges RowSet objects like Elements in a hashtable, but still provides the functionality of sorted enumeration. The new class is now integrated into the ObjectIndexCache, which is the core class to provide index functions to all database files. The new index access is about twice as fast as before. This has strong speed enhancement effects on all parts of YaCy. The speed of the kelondro indexing class ObjectIndexCache can be compared with Java's standard TreeMap with the main method in IntegerHandleIndex. The result is that the kelondro indexing needs only 1/5 of the memory that TreeMap uses! In exchange, the kelondro classes are slower than TreeMap, about four (!) times slower. However, this is not so bad because the better use of the memory is a strong advantage and makes it possible that YaCy can maintain such a large number of documents (> 50 million) in one peer. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5705 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	6958eff196	removed unnecessary exceptions, extended testing in IntegerHandleIndex git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5701 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	13c666adef	performance hack to ObjectIndex put() method: Java standard classes provide a Map Interface, that has a put() method that returns the object that was replaced by the object that was the argument of the put call. The kelondro ObjectIndex defined a put method in the same way, that means it also returned the previous value of the Entry object before the put call. However, this value was not used by the calling code in the most cases. Omitting a return of the previous value would cause some performance benefit. This change implements a put method that does not return the previous value to reflect the common use. Omitting the return of previous values will cause some benefit in performance. The functionality to get the previous value is still maintained, and provided with a new 'replace' method. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5700 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	1f1be1518c	added stub for another performance hack: concurrent indexes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5699 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	3e4c28e188	enhanced count feature for kelondroRowSet. This is about twice as fast as before. Should speed up the collection analysis (half time!) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5698 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	84e37387a2	fix for last commit and more testing stub git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5697 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	ca006c506d	stub for performance enhancements for RowSet (no functional change yet) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5696 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	100247bdda	added also an export and delete-feature to the URLAnalysis. This completes the clean-up feature for URLs. To do a complete clean-up of the url database, start the following: java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -incollection DATA/INDEX/freeworld/TEXT/RICOLLECTION used.dump java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -diffurlcol DATA/INDEX/freeworld/TEXT used.dump diffurlcol.dump java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -export DATA/INDEX/freeworld/TEXT xml urls.xml diffurlcol.dump java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -delete DATA/INDEX/freeworld/TEXT diffurlcol.dump The export-feature is optional, the purpose of that function is to provide a back-up function for URLs to be deleted. The export function can also be used to create html files with embedded links and simple text-files. Simply replace the 'xml' word with 'html' or 'text'. The last argument in the call, the diffurlcol.dump value, can also be omitted. In that case the complete URL database is exported. This is an alternative to the Web-Interface based export function. The delete-feature is the only destructive method of the four presented here. Please use it with care. It is better to make a back-up of the url database files before starting the deletion. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5694 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	60078cf322	added next tool for url analysis: check for references that occur in the URL-DB but not in the RICOLLECTIONS. To use this, you must use the -incollection command before (see SVN 5687) and you need a used.dump file that has been produced with that process. Now you can use that file to do a URL-hash compare with the urls in the URL-DB. To do that, execute java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -diffurlcol DATA/INDEX/freeworld/TEXT used.dump diffurlcol.dump or use different names for the dump files or more memory. As a result, you get the file diffurlcol.dump which contains all the url hashes that occur in the URL database, but not in the collections. The file has the format {hash-12}* that means: 12 byte long hashes are listed without any separation. The next step could be to process this file and delete all these URLs with the computed hashes, or to export them before deletion. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5692 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	dbdd10da84	better logging and startup behaviour for referenceHash computation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5690 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d64836c34f	added statistical analysis of URL references. Use that with the following command on a linux shell: java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -incollection DATA/INDEX/freeworld/TEXT/RICOLLECTION used.dump for freeworld indexes. For more details please see discussion below: http://forum.yacy-websuche.de/viewtopic.php?p=13204#p13204 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5687 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	3b28daab40	code-beautification (to be consistent with external documentation paper) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5686 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	485c9406e5	fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1915&hilit=&p=13249#p13249 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5684 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b80db04667	- refactoring of IntegerHandleIndex and LongHandleIndex (better method names) - fix for problem in httpdFileHandler: missing close of open Files if template cache was disabled - more memory for DHT selection required - stub for URL reference hash statistics in index collections git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5682 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	16f5c6a85e	fixed merge method initialization in ReferenceContainer git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5676 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d7a493b4f5	added experimental timeline api git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5672 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	efcd95dc37	simplification of (internal) query process / refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5671 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d4b56d5819	added more asserts to BLOBHeap.flushBuffer() to fix the problem described in http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1679&hilit=&p=13109#p13109 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5666 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	aa44d9bad9	more refactoring of kelondro.text / deleted de.anomic.index git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5664 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	6ffc6e3389	more refactoring of indexer and kelondro classes; - integrating the indexer into kelondro as package 'text' - renaming of classes in kelondro.index git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5663 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	2df57b1fd1	refactoring of index collection class git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5660 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	8444357291	added new row iterator in kelondro table files that enumerates rows without an order by the primary key. The result is a very fast enumeration of the Eco table data structure. Other table data types are not affected. The new enumerator is used for the url export function that can be accessed from the online interface (Index Administration -> URL References -> Export). This export should now be much faster if all url database files are of type Eco. The new enumeration is also used at other functions in YaCy, i.e. the initialization of the crawl balancer and the initialization of YaCy News. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5647 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	62505bb3cb	more bugfixes as recommended by findbugs git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5619 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	6b450d09ca	some fixes recommended by findbugs git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5618 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	e04a0e05c3	fix for last commit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5614 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	a9ad863686	second part of 'doubles' fix - better handling of doubles in RAMIndex. More logging. still missing: deletion of double entries in collections git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5613 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	59427064fb	first part of 'doubles' fix (not fully ready yet) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5612 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	26978b2a25	- better memory protection in kelondro caches: computation of needed memory for cache grow - removed excessive gc calls - step to 16 vertical DHT partitions git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5611 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
hermens	2173865f92	Prevent race condition when switching timezones. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5605 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	30a1de41b3	disabled the BufferedIOChunks, because I consider it as broken. I will try to fix that, but it is better to not use a buffer than using a broken buffer. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5600 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	411f2212f2	more memory leak fixing hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5599 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	333489420b	- fix for NPE when loading the cytag image - some hacks for less memory usage: -- less usage of buffer and cache memory in EcoFS -- buffer allocation on-demand in BufferedIOChunks -- removed largest ybr idx git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5595 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	c25c334b75	replaced old DHT transmission method with new method. Many things have changed! some of them: - after an index selection is made, the index is split into its vertical components - from different index selections the split components can be accumulated before they are placed into the transmission queue - each split chunk gets its own transmission thread - multiple transmission threads are started concurrently - the process can be monitored with the blocking queue servlet To implement that, a new package de.anomic.yacy.dht was created. Some old files have been removed. The new index distribution model using a vertical DHT was implemented. An abstraction of this model is implemented in the new dht package as interface. The freeworld network has now a configuration of two vertical partitions; sixteen partitions are planned and will be configured if the process is bug-free. This modification has three main targets: - enhance the DHT transmission speed - with a vertical DHT, a search will speed up. With two partitions, two times. With sixteen, sixteen times. - the vertical DHT will apply a semi-dht for URLs, and peers will receive a fraction of the overall URLs they received before. With two partitions, the fraction will be half. With sixteen partitions, 1/16 of the previous number of URLs. BE CAREFUL, THIS IS A MAJOR CODE CHANGE, POSSIBLY FULL OF BUGS AND HARMFUL THINGS. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5586 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	01b97ef3f8	added new cybertag-tracking feature that was inspired by itgrl from the forum discussion in http://forum.yacy-websuche.de/viewtopic.php?p=12612#p12612 The feature will provide two basic entities: - you can integrate image links which point to your yacy installation anywhere in the web. The image can be loaded with <img src="http://<yourpeer>:<yourport>/cytag.png?icon=invisible&nick=<yournickname_or_community_id>&tag=<anything>"> This will place an invisible 1-pixel image. If you change the icon=invisible to icon=redpill, you will see a red pill. Use this to track your activity in the web. - you can view your tracks at http://localhost:8080/Tracks.html - There is a public api to your tracks at http://localhost:8080/api/tracks_p.json which needs authentication git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5581 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
borg-0300	b19bc611b0	gc: better logging git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5578 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b1f9c00118	fix for bug in merge operator initialization git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5577 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b57c9da1f8	- fixes to doc, ppt, xls parser: better title - fixes to httpd server response header generation - fixes to a server date computation bug - new Button in indexControl to view content of url in ViewFile git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5576 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
f1ori	7936e58fe7	* sorry,previous version didn't compile git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5575 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
f1ori	76cdc59789	* added some conversions to and from UTF-8 * this might fix problems on windows systems (like http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1824) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5574 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	94110df85a	moved logging partially to kelondro git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5545 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	024da2916b	refactoring of logging git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5544 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	83ce65707a	(almost) completed partition of classes in kelondro git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5543 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	7ee494fde5	more refactoring of kelondro: - separated BLOB from table classes - renamed 'coding' package to 'order' git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5542 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	bf93767ec6	refactoring of kelondro database classes (to be continued) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5540 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	fc27bf8c4c	refactoring of kelondro classes: kelondro shall become independent from other packages. moved bytebuffer, date and memory to kelondro git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5539 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	6cbca1e508	extended last fix, preventing more sorts git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5533 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	f9672d3f97	applied fix for inefficient put method as recommended by celle, see http://forum.yacy-websuche.de/viewtopic.php?p=12424#p12424 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5532 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	3154926311	some better memory protection and OOM prevention in EcoFS git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5523 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	dedfc7df7f	removed distinction between DHT-in and DHT-out. This is necessary to make room for the new cell data structure, which cannot use this distinction in the first place, but will enable the same meaning with different mechanisms (segments, later) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5511 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b74159feb8	preparations to integrate the new 'cell' index data structure (this commit is just to move development files to my other computer, no functionality change so far) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5509 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	cb76d9e0e4	more synchronized in BLOBHeap (will not fix problem with Runtime-Error as reported in forum) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5487 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	f675d47f86	better protection against database failures git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5463 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	4d5b401f00	try to fix some performance problems with the internal index management: - ensuring that ordered indexes stay ordered during remove - no unnecessary ordering checks - better test logic in crawl stacker git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5457 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	c6880ce28b	removed the permanent cache flush and replaced it with a periodic cache flush. The cache is now flushed only for one second every ten seconds. During a crawl the cache fills up completely, and is only flushed if space is needed for more documents. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5446 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	ef7fe537c5	fixed a cache-bug in cachedFileRA git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5445 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	6c7e83909b	- refactoring of data access methods to be prepared for new cell data structure - removed a memory overhead in collections which prevent OOM Exception in low memory configurations git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5443 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	07fc115e90	removed active profiling in kelondroRowSet git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5433 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	be4c458951	refactoring (implemented Iterable in kelondroRowCollection) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5432 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b6bba18c37	replaced the storing procedure for the index ram cache with a method that generates BLOBHeap-compatible dumps. This is a migration step to support a new method to store the web index, which will also be based on the same data structure. Also did a lot of refactoring for a better structuring of the BLOBHeap class. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5430 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	3567c58b18	added another field information for BLOBHeap dumps: the gaps git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5420 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	abdd4aa414	added an index dump for blob heaps: this will increase the shutdown time by at most a few seconds, but will speed up the start-up git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5419 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	8c3205b62e	fix for OOB Exception see http://forum.yacy-websuche.de/viewtopic.php?p=11598#p11598 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5417 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	e004da48d3	- added fast fingerprint computation for files (any). Will be used in new index dump method - refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5415 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	fc8189f3fb	better self-healing of corrupted databases git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5406 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	f29b48d9ff	patch for IndexOutOfBoundsException git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5399 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	8cb7170b75	- set status of kelondroTree, kelondroBLOBTree and kelondroFlexTable to deprecated - removed initialization and/or usage of kelondroFlexTable (should meanwhile not be used any more) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5396 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	7cd08bd5fb	fix for NPE in BLOBCompressor git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5388 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	5b94498643	fine-tuning of cache usage from SVN 5386 and a bug fix for overflow in available() method git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5387 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	1779c3c507	- added a read cache to the RAFile interface to RandomAccessFile - added a write buffer to BLOBHeap - modified the BLOBBuffer (is now only to buffer non-compressed content) - added content compression to the HTCache The new read cache will decrease the start/initialization time of BLOB files, like the HTCache, RobotsTxt and other BLOBHeap structures. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5386 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	e1acdb952c	fix for problem with userDB and bookmarksDB which was caused by changes in kelondroRA in SVN 5376 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5385 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	4a2dac659e	more speed hacks: - modified and activated write buffer - increased cache flush factor - fixed a problem with deadlocking of indexing process git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5382 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	47292e696a	more performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5379 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	759cef23dd	fix for bug in kelondroAbstractRA.readFully git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5378 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d39d420b39	performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5376 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	513179f404	changed interface to colletctionIndex and adopted all implementing classes: do not return a result of a double-check when adding entries with addUnique git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5363 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	9d64693cfb	reverting again the changes to new concurrent chunkIterator git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5362 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	45ad1c3dd5	- re-activated concurrent iterator for EcoFiles - added javadoc for new concurrent initialization in kelondroBytesLongMap - switched default value for commons storage to false - version step git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5361 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	2e2120046f	speed enhancement for BLOBHeap opening process using concurrency of FileIO and content processing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5360 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	10f5ec1040	reverted last commit (more testing needed) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5356 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b0f2003792	fast database initialization and fast start-up of yacy: - applied knowledge about concurrent file stream reading and index processing from the wikimedia reader to the EcoTable initialization process: the file reader now runs concurrently with the index generation - also changed some initialization processes to avoid some pauses during initialization git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5354 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	ef66438662	- more space in error db to store larger error messages - added hash to HTCACHE storage files which will make it possible to join separate caches by just copying files git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5329 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d014b2728a	Design-check, Extension and Refactoring of DHT target position computation: - two different computations (but mathematical equivalent) of the DHT distance had been consolidated - moved from 0.0 .. 1.0 double-range position computation to 0 .. Long.Max range for DHT targets - added fast Long - to - hash computation - high-precision target computation of gaps for new peers - added new target computation for horizontal and vertical DHT targets (not yet in use) - old horizontal-only DHT targets will be upwards compatible to new horizontal and vertical DHT positions git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5318 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	dd27ce7216	added control logic to ECO tables that deletes ram copies of the tables if they get too large. Table copies in ram are now abandoned if less than 20 MB ram is left git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5317 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	38e6ba5d00	forgot to re-rename commonsPath git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5316 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	22989d0d8a	added property index.storeCommons to switch commons storage on or off. With index.storeCommons=false all currently stored commons are deleted! Default is now 'true', but in future full releases it will be switched to 'false' git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5315 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
danielr	103ad2a437	some javadoc git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5299 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	6941bf42b1	performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5288 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	9b0c4b1063	redesign of parts of the new BLOB buffer git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5287 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	1778fb420d	- added some performance tweaks to the new BLOB buffer - removed the now superfluous HT storage thread - reduced number of file decompression by shifting the compression moment to the future git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5286 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	9663e61449	added another class to handle BLOB writings to the new HTCACHE data storage: - entries are buffered and written as stream with many entries at once (saves many IO accesses) - entries are compressed with gzip: increases capacity of cache - concurrency for stream-writing and compression: all writings to the cache are non-blocking git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5284 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	826ca79735	refactoring and new architecture to store the files of the web cache: - files are not stored any more as individual files - a new database structure using BLOBHeap files stores many cache entries in common files - all file-writing procedures had been migrated to generate byte[] objects which are written with the new database methods this is only an intermediate step to the final architecture, where cached files are written together with their metadata in one single database structure. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5276 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	998861acfd	- some refactoring in BLOBHeap to enable more gap processing functions - better gap merging in BLOBHeap - shrinking of heap file if gap is at end of file when file is closed git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5268 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	766cad6e93	enhancement in memory management of BLOB Heap files / merging of deleted entries git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5266 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	7860d5d632	fix for bug in seed list management (cause was bad class overloading, only visual effects!) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5265 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	ffed5fc415	fixed problem with lost peers in database; migrated seedDB from BLOBTree to BLOBHeap git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5263 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	6fb865fbdc	- fix of bug in iterator in kelondroBLOBHeap which caused bug in crawl profile listing - some refactoring of classes that use kelondroMap (Map instead of HashMap) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5262 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	9ac16f565b	- fixed several bugs in database management functions - fixed a display bug for the performance graph - fixed deadlock when initialization of awt happens simultaneously - removed some debugging output git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5245 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	e1f67262f7	- added and removed some debugging output - fixed a bug with merge method - patched wrong output of language identification (not fixed, only patched!) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5181 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	25a62cdc3f	small fixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5161 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	1eb813bd43	shifted index deletion-on-exit rule to the class where the errors are produced git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5141 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
lotus	0bb4fbc403	delete corrupted collection.index on exit for rebuild on next start see http://forum.yacy-websuche.de/viewtopic.php?p=9725#p9725 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5135 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	77ee0765a4	- added domain statistic generation to IndexControlURLs_p.html servlet - added 'delete all' button to all results of such a domain statistic output which causes that all urls to this domain are deleted - extended stack cleaner to clean also the statistics: they are not completely destroyed, only the smallest counting domains are removed git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5117 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
lotus	e645bae29f	display table in log git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5106 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	ead39064c5	fixed problem with wrong result number calculation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5105 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
hermens	2437beb96c	fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1360&p=9321#p9321 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5104 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	7b12e77a63	fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1393&hilit=&p=9655#p9655 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5103 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	05dbba4bab	added logging conditions to all fine and finest log line calls. This prevents the overhead of generating log lines that are then not printed git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5102 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	d3d41e2ee4	- fixed problem with searching with quotes (still not complete, but not as bad as before) - fixed parsing of crawl-delay statements when seconds were given with float numbers - enhanced performance of profiling (not too many loggings; not more than one per second) - removed some debug output - fixed wrong return type in logging - added a logging condition in httpd to prevent that logging statements are generated when they are not written (should be added everywhere!) - fixed wrong word distance computation in RWI management git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5101 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	3c68905540	remove redundant null checks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5065 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago

853 Commits (a71bb7178d5c173892dcac1b86de0f246f6dd442)