yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	21fbca0410	better scaling of HEAP dump writer for small memory configurations; should prevent OOMs during cache dumps git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5920 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	6e0b57284d	better care for states of the IODispatcher git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5919 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	1db9cdd4e4	fixed bug in writing of robots.txt entries in case that host names exceeded 64 characters and some other problems git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5918 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
f1ori	bde88b684a	* split off yacyRelease from yacyVersion * added some gui infos about signatures git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5916 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	057ce14c8e	more fixes (character encoding, parser exceptions, http client failure, blob writing) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5914 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d2ac0aa682	- fixed possible bugs in Stack (may affect Crawler reset) and RandomAccess handling - increased default memory size to 180MB - fixed possible bug in http client reset (there was a deadlock) - bug in BOBHeap marked, but not solved, cause is still unknown. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5912 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	1351d903a1	don't follow links like mailto: git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5909 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	e88a66bcae	temporary disabling computation of all sublinks (check needed) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5908 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
low012	ff5f82d780	*) removed description of removed commands from wikiHelp ([= =]) *) used format function of Netbeans for wikiCode to make it more readable, no functional changes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5907 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	eacf95213a	fix for crawling of mailto-links git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5906 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	9c6ac43f66	fixes for wiki parser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5905 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	3a64c9d02f	- fix for problem with concurrency when computing word hashes - fix for search in case that a urlfilter was used and zero results were returned git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5904 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d3f8aa5a2a	set of small fixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5903 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
low012	78ffb61297	*) got rid of unnecessary variable which might also fix IndexOutOfBoundsException git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5902 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d31e6f9c14	fix for http://forum.yacy-websuche.de/viewtopic.php?p=14457#p14457 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5899 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	8d6212233b	fix for IODispatcher git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5896 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	f678472f46	fix for quote problem in json output git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5895 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d079d6dfdb	small changes in surrogate reader, wiki code and portal test git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5894 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	07f09742bb	set of small fixes and comments git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5893 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
borg-0300	06ed4ef7b3	* better picture handling git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5891 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	5a634cab23	removed generation of anchor link sets in document types that describe container formats. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5890 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
low012	f1244264b8	*) hopefully fixed bug reported in http://forum.yacy-websuche.de/viewtopic.php?t=2057 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5882 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	2e3186189b	fix for mediawikiIndex surrogate producer + added concurrency git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5880 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
apfelmaennchen	6f5ea7b1a8	small fix for previous post git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5879 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
apfelmaennchen	138a0747e3	added serverObjects.putJSON as JSON has very particular encoding requirements git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5877 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d977dd9a96	fix for surrogate loader git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5870 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	9cb68353da	fix for bug in ProfilingGraph for ppm >> 10000 ppm (!) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5868 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	9e4db75aac	reduced internal logging and reduced memory that internal logging can use git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5867 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	c10c257255	attempt to fix a deadlock situation where the IODispatcher did not work. I suspect the dispatcher thread has crashed and queues filled so no indexing process was able to write data. This fix tries to heal the problem, but I am unsure if it helps. To get a better view of the problem, some more log outputs have been inserted. Also added a new attribute indexer.threads to control the number of default threads for the indexer (default is 1) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5866 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	09987e93fd	fixed some more bad handling of byte[] git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5865 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	1bcc1450cb	more explaining error message in case of IOExceptions during html parsing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5864 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	fe51f4d668	less synchronization may help to prevent deadlocks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5863 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	58802e4201	added missing success test in storeDocumentIndex, see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1922&hilit= git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5862 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	171e62bee5	addition to the fix from last commit (which did not work) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5860 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	059949a0d1	tried to fix problem with snippet fetch for second search page when verify=false git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5859 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	b08991e278	moved some constants, rename of Tray class git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5858 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	138422990a	- removed useCell option: the indexCell data structure is now the default index structure; old collection data is still migrated - added some debugging output to balancer to find a bug - removed unused classes for index collection handling - changed some default values for the process handling: more memory needed to prevent OOM git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5856 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	1b9e532c87	some concurrency for wikipedia dump reader git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5855 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	25d2160288	small fix git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5853 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	16baa7ad24	To translate a mediawiki dump into the YaCy surrogate format do the following: - download a wikipedia dump, e.g. dewiki-20090311-pages-articles.xml.bz2 from http://download.wikimedia.org/dewiki/20090311/ - move dewiki-20090311-pages-articles.xml.bz2 to DATA/HTCACHE/ - start the conversion; open a command shell, move to the yacy home directory and execute java -Xmx2000m -cp classes:lib/bzip2.jar de.anomic.tools.mediawikiIndex -convert DATA/HTCACHE/dewiki-20090311-pages-articles.xml.bz2 DATA/SURROGATES/in/ http://de.wikipedia.org/wiki/ this generates a series of files to DATA/SURROGATES/in if YaCy is running (it may run concurrently), it fetches all new dumps in the surrogate-in directory. The export process is transaction-safe, that means YaCy will not start reading a dump while the dump is not completely finished. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5851 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	0b2c98edc9	some more work on the wikipedia-dump exporter (not finished yet) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5850 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	5195c94838	two patches for performance enhancements of the index handover process from documents to the index cache: - one word prototype is generated for each document, that is re-used when a specific word is stored. - the index cache uses now ByteArray objects to reference to the RWI instead of byte[]. This enhances access to the map that stores the cache. To dump the cache to the FS, the content must be sorted, but sorting takes less time than maintenance of a sorted map during caching. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5849 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	9416f5c26f	more speed test cases: kelondro provides map functions that are more than 20% faster than standard java classes and use less than half of the memory of java classes: just start IndexTest (here with 1000000 test objects) Performance test: comparing HashMap, TreeMap and kelondroRow generated 1000000 test data entries STANDARD JAVA CLASS MAPS sorted map time for TreeMap<byte[]> generation: 2110 time for TreeMap<byte[]> test: 2516, 0 bugs memory for TreeMap<byte[]>: 29 MB unsorted map time for HashMap<String> generation: 1157 time for HashMap<String> test: 1516, 0 bugs memory for HashMap<String>: 61 MB KELONDRO-ENHANCED MAPS sorted map time for kelondroMap<byte[]> generation: 1781 time for kelondroMap<byte[]> test: 2452, 0 bugs memory for kelondroMap<byte[]>: 15 MB unsorted map time for HashMap<ByteArray> generation: 828 time for HashMap<ByteArray> test: 953, 0 bugs memory for HashMap<ByteArray>: 9 MB git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5847 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b53790abb1	more performance hacks: 10% more speed for Base64.compare() which is really often used in YaCy code git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5846 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	8ffb9889e1	some fixes and performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5845 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	dfb96ecb72	more fixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5844 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	1b8d346b4c	fixes in connection with transition to byte[] hashes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5843 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
f1ori	0b0a46d35a	* fix transferRWI as suggested by celle (thanks!) see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2000#p14023 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5842 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	996572de95	quickfix git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5841 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	380ed2dac0	performance and debugging additions git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5840 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	635b0a9da7	code-split allow cgi indexing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5839 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	fa3adbbfc6	added domain checks to surrogate reader and RWI transfer receiver to prevent spamming using surrogates git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5837 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
f1ori	76af84d732	* add custom comparator to ScoreCluster for byte[] * fixes http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2010 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5836 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	ab0030d7a7	allow dht-out for remote-crawl processing peers on default settings git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5834 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
low012	d1116c049f	*) added new method "contains()" to Blacklist interface *) implemented contains() in class AbstractBlacklist *) used new method in Blacklist_p to prevent double entries in blacklists git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5832 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
f1ori	08445e42f0	* don't throw exception, in case of bad charset in http-header git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5831 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
f1ori	2f860a2564	* convert byte[] hashes to string for log output git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5830 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
f1ori	d93a2a6552	* ignore whitespaces so you can copy&paste signatures better git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5828 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	fbcbcc5bdb	export of yacy document objects as dublin core record in xml git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5826 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d7cbf4cdd4	more performance hacks: less overhead in word hash computation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5825 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	29e96c1a60	bugfixes and performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5824 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	4e97a31009	corrections in dublin core syntax git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5823 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
f1ori	44daec7936	* introduce signatures to autoupdate as long as there aren't publickeys for the updatelocations set, no signatures are checked * wiki-article follows... git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5822 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	538e375901	replaced old caching method for computed word hashes with a better method. The word hash computation is a new performance bottleneck (after the IO bottleneck was removed with the IndexCell data structure) and a better caching for word hashes was necessary. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5821 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	9e853e1977	partly reverting SVN 5818: identical comparator required for join operator git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5820 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	e16c25ddf7	(peak-) performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5819 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	63cd152969	fixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5818 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	7dfe7e7cc6	fixed some problems with surrogate reader. This is now ready for testing. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5817 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	3a1364ed5c	removed example lines from SurrogateReader sources; added additional example file git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5816 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	9050a3c4c5	alpha version of surrogate reading and indexing. see the example file for an explanation. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5815 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b15b059c0d	fix for latest commit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5813 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	c8624903c6	full redesign of index access data model: terms (words) are not any more retrieved by their word hash string, but by a byte[] containing the word hash. this has strong advantages when RWIs are sorted in the ReferenceContainer Cache and compared with the sun.java TreeMap method, which needed getBytes() and new String() transformations before. Many thousands of such conversions are now omitted every second, which increases the indexing speed by a factor of two. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5812 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
f1ori	dd6b5005ff	* fix missing charset handling in getpageinfo_p git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5811 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	bd5f4c78d8	- added default profile for surrogate indexing - integrated surrogate indexing into indexing queue process git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5810 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	ad78e3a59f	- less lines in rssTerminal - crawl more documents: if remote crawling is enabled, a remote crawl list is also loaded if a local crawl is running in case that the indexer is idle git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5809 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	bc80dc913a	added new surrogate reader (surrogates are parsed documents in batches) this will open a new way to insert indexes to YaCy (instead of crawling) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5808 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	12d81e98eb	- fixed bad search results when searching for empty string - simplified result handling and page composition in case that nothing was searched git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5807 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	8a24350036	- fix for join method with new generalized RWI data structure (caused by latest commit) - added more functions to mediawiki parser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5806 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	e58320a507	added more info in log for debugging git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5805 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	89ec3acb3e	- full abstraction of index content type: the kelondro full text index may now also contain indexes about other content than text, i.e. navigation indexes or reverse linking indexes. - during index joins all word positions are maintained: better ranking for word distance possible; exact phrase match can be implemented soundly git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5804 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
borg-0300	7a48090fcf	- fix for "uk" language - svn attributes added git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5803 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	dc2af61bc9	allow up to 50 results from remote peers git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5802 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	c0e8ed5461	fixed problem with not http client git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5801 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	8862a2fed0	ups git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5799 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	de68948bc5	better handling of free memory computation and emergency cache flush for index cell git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5798 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
f1ori	fcb77c3140	* added .im (Isle of Man) to TLD-list git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5794 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b81c7467d8	protection against too many files in RICELL in case of massive emergency dumps caused by low memory git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5791 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d4d87d90c4	- extended experimental wikipedia dump parser - removed historic, possibly unused code from wiki parser that was in conflict with actual wikipedia wiki code git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5790 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	c3aff2521e	fix for NPE git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5789 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	57c00dd8c9	fix for bad filtering of common http error git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5788 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	14361f1ca4	added log message for index generation in HeapReader git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5787 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	c08f9b36a4	refactoring of wiki parser. This was done to prepare the wiki parser as parser for wikipedia dumps, which will be used for performance test (to omit crawling) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5785 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	44e01afa5b	- refactoring - a little bit more abstraction - new interfaces for index abstraction git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5783 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	82fb60a720	increased memory limit for emergency cache flush git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5782 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
low012	9180617dd9	*) Classes to handle import of lists (especially blacklists) from XML files, not used yet, but will be used soon. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5780 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	596e6215dc	fix in case of white space in path name git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5779 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b887f4a116	keep more free mem git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5778 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	c2359f20dd	refactoring: better abstraction of reference and metadata prototypes. This is a preparation to introduce other index tables as used now only for reverse text indexes. Next application of the reverse index is a citation index. Moved to version 0.74 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5777 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	ab656687d7	more strict BLOB initialization .. may also help to save some ram git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5776 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	5b138ada16	fixes to web structure reference collection and url construction git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5775 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	a29a11e526	added evaluation of incoming links in webstructure api the api hash changed, new XML schema. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5774 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	f6691411b5	- migration of files from SplitTable (which are used for the URL-DB) to a different file name format. - the file generation logic is slightly different: files may now have only a maximum size of one gigabyte and a maximum age of one month. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5773 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
shostakovich	1f37cc6107	Robots.txt is now reused after one day. See forum-topic: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1669&p=13565#p13565 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5772 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	f21a8c9e9c	a different naming scheme for BLOBArray files. This may be necessary if blobs are written more often than once in a second. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5771 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	7ba078daa1	- added fast site-operator - refactoring merge into BLOBArray git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5770 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b4126432bc	hardening of index dump write process git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5769 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	9bfb2641db	- removed deprecated threads - added automatic http client reset. this was necessary because excessive intranet crawling caused deadlocks. this hack solved the problem. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5768 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	293290c317	fix for bad assert in last commit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5767 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	bd409fb7ba	added web structure analysis for a special domain that can be requested from the api. Example: http://localhost:8080/api/webstructure.xml?about=www.yacy.net returns a xml with the following content: <?xml version="1.0"?> <webstructure> <domains reference="reverse" count="1" maxref="300"> <domain host="www.yacy.net" id="FXg39Q" date="20090401"> <citation host="java.sun.com" id="o-R3yY" count="1" /> <citation host="yacy-suche.de" id="-KCLaB" count="1" /> <citation host="suma-ev.de" id="VRAHIA" count="1" /> <citation host="www.kit.edu" id="EMaLDQ" count="1" /> <citation host="yacy.net" id="Fh1hyQ" count="1" /> <citation host="www.fzk.de" id="V2Kl-A" count="1" /> <citation host="en.wikipedia.org" id="rwtdfR" count="3" /> <citation host="vimeo.com" id="MmdQDY" count="3" /> <citation host="liebel.fzk.de" id="sX4ozA" count="6" /> </domain> </domains> </webstructure> git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5766 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b6c2167143	- patch for bad web structure dumps - added automatic slow down of accessed to specific domains when access to a web page fails git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5765 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	0139988c04	- added writing of temporary file names and renaming to final file name when index dump/merge are done. Interrupted merges can be cleaned up. - added clean-up of unfinished merges and unused idx/gap files - enhanced merge file selection method git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5764 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	3621aa96ab	- added a memory protection for the IndexCell migration - fix for bad cell file selection git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5763 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	568e8f1741	fix in unmountBLOB git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5762 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	9da69d6b68	- better selection of files to be merged - fix for getChannel().close(), which works on windows but not on macs and linux git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5761 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d39a5b42ca	more care about open file handles. Now files also close on windows and can be deleted afterwards. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5760 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	029495e64d	fixed bug introduced in SVN 5756 in EcoTable.put() git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5759 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	587838bd09	git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5758 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d2e2420a68	- added another file selection method for index cell merge - more hacks to check that files are closed properly and filehandles do not exist after files are closed. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5757 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	96eaecda3e	- added migration class to go from index collections to the index cell data structure. - added better control over file deletion, because this sometimes fails, especially on windows git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5756 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	0f0b4aec75	better index cell merge logic git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5754 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	832fef670f	migration of urls-files into subdirectory METADATA git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5753 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	fa07234d4e	fix for clear method: now deletes files git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5752 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lulabad	df87e4dbf6	missing count of send Index and URLs git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5747 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
borg-0300	c450e3746b	svn attributes added git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5736 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	37f892b988	added new concurrent merger class for IndexCell RWI data git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5735 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
borg-0300	8c494afcfe	svn attributes added git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5734 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	67aaffc0a2	- added Latency control to the crawler: because of the strongly enhanced indexing speed when using the new IndexCell RWI data structures (> 2000PPM on my notebook), it is now necessary to control the crawling speed depending on the response time of the target server (which is also YaCy in case of some intranet indexing use cases). The latency factor in crawl delay times is derived from the time that a target host takes to answer http requests. For internet domains, the crawl delay is a minimum of twice the response time; in intranet cases the delay time is now half the response time. - added API to monitor the latency times of the crawler: a new api at /api/latency_p.xml returns the current response times of domains, the time when the domain was accessed by the crawler the last time and many more attributes. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5733 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	0926310461	another performance hack git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5731 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	ebe5d69d14	performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5730 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	61f9dbf0cc	- fixed a display problem in watch crawler - another small enhancement in balancer git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5729 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b3f75e48fa	- enhanced balancer: auto-solving of waiting-deadlocks - removed deprecated cache-init size value - more debug lines for IndexCell cache dump merge git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5728 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	9a90ea05e0	added a merge operation for IndexCell data structures git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5727 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d99ff745aa	fix for http://forum.yacy-websuche.de/viewtopic.php?p=13378#p13378 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5726 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	0c3ab291c4	fix for http://forum.yacy-websuche.de/viewtopic.php?p=13354#p13354 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5725 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	a9cea419ef	Integration of the new index data structure IndexCell. This is the start of a testing phase for IndexCell data structure which will replace the collections and caching strategy. IndexCell creation and maintenance is fast, has no caching overhead, very low IO load and is the basis for the next data structure, index segments. IndexCell files are stored at DATA/<network>/TEXT/RICELL. With this commit still the old data structures are used, until a flag in yacy.conf is set. To switch to the new data structure, set useCell = true in yacy.conf. Then you will have no access any more to TEXT/RICACHE and TEXT/RICOLLECTION. This code is still bleeding-edge development. Please do not use the new data structure for production now. Future versions may have changed data types, or other storage locations. The next main release will have a migration feature for old data structures. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5724 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
borg-0300	fd0976c0a7	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5723 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	83792d9233	more refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5722 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
borg-0300	ce79239322	"typo" git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5721 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
borg-0300	cdbdc731c5	small updates: unescape, isCGI git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5720 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	474aac65af	more refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5719 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	209f25f5f5	refactoring to integrate indexCell data structures git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5718 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
borg-0300	359a238acf	faster isCGI() git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5717 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
borg-0300	f75628e53b	some corrections git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5716 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b7138e5fcb	even more efficient comparator calls (less System.arraycopy for primary keys) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5715 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	65784eb656	- more efficient comparator calls - fix for http://forum.yacy-websuche.de/viewtopic.php?p=13331#p13331 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5714 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	44874cb550	added a deleteOnExit for blob file deletion in case that a deletion is not successful. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5713 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	66f78d67e0	bad idea. Concurrency in index management will be done differently git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5712 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	7dff1cba62	removed option to use different primary keys in kelondro tables this option was never used and there is also no use to set other columns but the first as the primary key. as a result, access methods to the key do not need to compute key positions, and they work faster. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5711 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	7f67238f8b	refactoring of plasmaWordIndex: less methods in the class, separated the index to CachedIndexCollection git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5710 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	14a1c33823	refactoring of wordIndex class git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5709 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d49238a637	more performance hacks: better default values for scaling, less memory usage git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5708 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	39644dc14e	performance hacks to compare methods in database core git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5707 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	e2e7949feb	replaced old PPM computation with a better one that simply sums up events that had been stored in the profiling table. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5706 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	f6d989aa04	added new class RowSetArray which arranges RowSet objects like Elements in a hashtable, but still provides the functionality of sorted enumeration. The new class is now integrated into the ObjectIndexCache, which is the core class to provide index functions to all database files. The new index access is about twice as fast as before. This has strong speed enhancement effects on all parts of YaCy. The speed of the kelondro indexing class ObjectIndexCache can be compared with Javas standard TreeMap with the main method in IntegerHandleIndex. The result is, that the kelondro indexing needs only 1/5 of the memory that TreeMap uses! In exchange, the kelondro classes are slower than TreeMap, about four (!) times slower. However, this is not so bad because the better use of the memory is a strong advantage and makes it possible that YaCy can maintain such a large number of document (> 50 million) in one peer. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5705 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
borg-0300	0a2fabeef3	static TMPDIR git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5704 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	9f7e62e900	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5703 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	f35dc11dc4	allow crawl start from pages with script tags http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1910 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5702 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	6958eff196	removed unnecessary exceptions, extended testing in IntegerHandleIndex git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5701 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	13c666adef	performance hack to ObjectIndex put() method: Java standard classes provide a Map Interface, that has a put() method that returns the object that was replaced by the object that was the argument of the put call. The kelondro ObjectIndex defined a put method in the same way, that means it also returned the previous value of the Entry object before the put call. However, this value was not used by the calling code in the most cases. Omitting a return of the previous value would cause some performance benefit. This change implements a put method that does not return the previous value to reflect the common use. Omitting the return of previous values will cause some benefit in performance. The functionality to get the previous value is still maintained, and provided with a new 'replace' method. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5700 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	1f1be1518c	added stub for another performance hack: concurrent indexes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5699 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	3e4c28e188	enhanced count feature for kelondroRowSet. This is about twice as fast as before. Should speed up the collection analysis (half time!) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5698 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	84e37387a2	fix for last commit and more testing stub git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5697 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	ca006c506d	stub for performance enhancements for RowSet (no functional change yet) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5696 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d988204875	better shutdown of tools git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5695 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	100247bdda	added also an export and delete-feature to the URLAnalysis. This completes the clean-up feature for URLs. To do a complete clean-up of the url database, start the following: java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -incollection DATA/INDEX/freeworld/TEXT/RICOLLECTION used.dump java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -diffurlcol DATA/INDEX/freeworld/TEXT used.dump diffurlcol.dump java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -export DATA/INDEX/freeworld/TEXT xml urls.xml diffurlcol.dump java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -delete DATA/INDEX/freeworld/TEXT diffurlcol.dump The export-feature is optional, the purpose of that function is to provide a back-up function for URLs to be deleted. The export function can also be used to create html files with embedded links and simple text-files. Simply replace the 'xml' word with 'html' or 'text'. The last argument in the call, the diffurlcol.dump value, can also be omitted. This will cause that the complete URL database is exported. This is an alternative to the Web-Interface based export function. The delete-feature is the only destructive method of the four presented here. Please use it with care. It is better to make a back-up of the url database files before starting the deletion. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5694 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
hermens	8c60d6d117	In DHT selection delete only those references that were actually selected git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5693 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	60078cf322	added next tool for url analysis: check for references, that occur in the URL-DB but not in the RICOLLECTIONS to use this, you must use the -incollection command before (see SVN 5687) and you need a used.dump file that has been produced with that process. Now you can use that file, to do a URL-hash compare with the urls in the URL-DB. To do that, execute java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -diffurlcol DATA/INDEX/freeworld/TEXT used.dump diffurlcol.dump or use different names for the dump files or more memory. As a result, you get the file diffurlcol.dump which contains all the url hashes that occur in the URL database, but not in the collections. The file has the format {hash-12}* that means: 12 byte long hashes are listed without any separation. The next step could be to process this file and delete all these URLs with the computed hashes, or to export them before deletion. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5692 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b1ddc4a83f	do not merge collections if ram == false git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5691 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	dbdd10da84	better logging and startup behaviour for referenceHash computation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5690 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d64836c34f	added statistical analysis of URL reference use that with the following command on a linux shell: java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -incollection DATA/INDEX/freeworld/TEXT/RICOLLECTION used.dump for freeworld indexes. For more details please see discussion below: http://forum.yacy-websuche.de/viewtopic.php?p=13204#p13204 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5687 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	3b28daab40	code-beautification (to be consistent with external documentation paper) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5686 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	485c9406e5	fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1915&hilit=&p=13249#p13249 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5684 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	858f800a07	more logging in httpd to detect shutdown cause. See also: http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1914&hilit=&p=13246#p13246 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5683 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b80db04667	- refactoring of IntegerHandleIndex and LongHandleIndex (better method names) - fix for problem in httpdFileHandler: missing close of open Files if template cache was disabled - more memory for DHT selection required - stub for URL reference hash statistics in index collections git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5682 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	8ee946bf1d	show upnp status git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5679 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	16f5c6a85e	fixed merge method initialization in ReferenceContainer git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5676 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d7a493b4f5	added experimental timeline api git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5672 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	efcd95dc37	simplification of (internal) query process / refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5671 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	f1b712c29a	small corrections to image loading methods in result presentation especially loading of favicons in search results. This is a fix that affects only searches in intranet/repository configurations. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5670 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d4b56d5819	added more asserts to BLOBHeap.flushBuffer() to fix the problem described in http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1679&hilit=&p=13109#p13109 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5666 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
f1ori	c545fcb9fa	* add class to handle keys and signatures * fix bug in serverCharBuffer * add build-target to sign tar.gz (run ant dist sign) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5665 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	aa44d9bad9	more refactoring of kelondro.text / deleted de.anomic.index git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5664 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	6ffc6e3389	more refactoring of indexer and kelondro classes; - integrating the indexer into kelondro as package 'text' - renaming of classes in kelondro.index git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5663 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	404bc21da9	simplification of (internal) query process / refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5662 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	76ef5f0f14	refactoring of index package: better names for the classes (to be continued) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5661 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	2df57b1fd1	refactoring of index collection class git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5660 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	39a177649b	* added upnp listener for devices that do not respond to discovery but advertise themselves * moved package git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5659 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d1d9fbae5c	enabling the URLAnalysis to operate on multiple input files, just use a wild card when calling the class from the command line git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5658 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	c728879ab8	fixes to yacyURL - more exceptions in case that urls are strange git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5657 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	7542336ae5	performance enhancement to yacyURL: omit second processing of resolveBackpath. This method is already applied during initialization of the object and was called a second time when the url was exported. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5656 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	7ea53fe47b	added another url list transformation option: - check the list and kick out entries with lines that contain not valid urls - normalize the urls - remove doubles - sort the list - split the list in smaller chunks This is all done in one process which can be called with a new -sort option git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5655 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	e521e81148	bugfix in yacyURL (for latest performance hack) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5654 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	54625360f7	performance update git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5653 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d884c4718a	added gzip support for URLAnalysis: url lists can also be compressed with gzip If such a file is handed over to URLAnalysis, the output will also be written as .gz-file git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5652 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	46632f4385	performance update to yacyURL git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5651 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	cf9b74e6e3	added another method to process url lists: extract hosts only This can be used like java -Xmx2000m -cp classes de.anomic.data.URLAnalysis -host DATA/EXPORT/20090224213823.txt changed also the call method to generate statistics, please use now java -Xmx2000m -cp classes de.anomic.data.URLAnalysis -stat DATA/EXPORT/20090224213823.txt git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5650 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	89d8e824ed	memory protection for URLAnalysis git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5649 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	0f6fa804ff	performance update to URLAnalysis git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5648 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	8444357291	added new row iterator in kelondro tables files that enumerates rows without an order by the primary key. The result is a very fast enumeration of the Eco table data structure. Other table data types are not affected. The new enumerator is used for the url export function that can be accessed from the online interface (Index Administration -> URL References -> Export). This export should now be much faster, if all url database files are from type Eco. The new enumeration is also used at other functions in YaCy, i.e. the initialization of the crawl balancer and the initialization of YaCy News. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5647 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	e8f5f2f612	added tool to analyse url strings and to generate statistics about words occurring in urls git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5646 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago

3723 Commits (160031758d957cd3040a859c9bb1e345bbe17b5b)