yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	ae015e8e98	refactoring of blob package classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6088 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	ce1adf9955	serialized all logging using concurrency: high-performance search query situations as seen in yacy-metager integration showed a deadlock situation caused by synchronization effects inside of sun.java code. It appears that the logger is not completely safe against deadlock situations in concurrent calls of the logger. One possible solution would be an outside synchronization with 'synchronized' statements, but that would further apply blocking on all highly efficient methods that call the logger. It is much better to do a non-blocking hand-over of logging lines and work off log entries with a concurrent log writer. This also disconnects IO operations from logging, which can also cause IO operations when a log is written to a file. This commit not only moves the logger from kelondro to yacy.logging, it also inserts the concurrency methods to realize non-blocking logging. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6078 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b8e738a7be	a collection of - small bug fixes - better/more comments - more asserts - fixed synchronization - test case enhancements - code cleanup - performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6073 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d58b395993	fix for http://forum.yacy-websuche.de/viewtopic.php?p=15693#p15693 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6049 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b6e274f211	omit most of forced crawl delays by using a separate delay table which flushes delayed URLs at the correct time git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6029 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d50be59088	- added an automatic re-construction of the domain stack after 10 minutes. this then adds urls to the domain stack that were left over because of stack size limitations when the domain stack was last created - changed the busy sleep time for the crawl thread to 30 milliseconds. This is sufficient to crawl with 2000 PPM. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6028 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	5fdba0fa51	- fixed a non-working selection rule in balancer - more security about crawl-delay, be more fail-safe - better logging in case of long forced crawl-delays git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6027 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	f5602404d5	another speed boost for the balancer git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6026 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	95e8cbd1c3	new fully redesigned balancer and bugfixes regarding lost profile handles and killed crawls git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6025 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	42ae40b9f6	some bugfixes to database close() methods git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6023 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	88426912ad	more refactoring to make the segment object easier to use and to be prepared to integrate author navigation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5992 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	99bf0b8e41	refactoring of plasmaWordIndex: divided that class into three parts: - the peers object is now hosted by the plasmaSwitchboard - the crawler elements are now in a new class, crawler.CrawlerSwitchboard - the index elements are core of the new segment data structure, which is a bundle of different indexes for the full text and (in the future) navigation indexes and the metadata store. The new class is now in kelondro.text.Segment The refactoring is inspired by the roadmap to create index segments, the option to host different indexes on one peer. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5990 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	3d4b826ca5	migration of all databases that use the deprecated BLOBTree format into the BLOBHeap format. Old databases are migrated automatically. This removes the last very IO-intensive data structures which were still used for Wiki, Blog and Bookmarks. Old database files will still remain in the DATA subdirectory but can be deleted manually if no major bugs appear during migration. There is no need for any user action, all migration is done automatically. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5986 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	63a0255166	- refactoring: added new content package, which will contain connector classes for different types of data sources to import texts into the YaCy index - refactoring: migrated data objects for the new connector classes - added a DAO interface class to specify an abstract interface for database retrieval connector methods git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5977 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	addecdb18c	simplified code, removed one unused method in all implementing classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5972 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	734680dc70	initialize the ResourceObsever in own thread git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5968 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d2ac0aa682	- fixed possible bugs in Stack (may affect Crawler reset) and RandomAccess handling - increased default memory size to 180MB - fixed possible bug in http client reset (there was a deadlock) - bug in BLOBHeap marked, but not solved, cause is still unknown. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5912 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	138422990a	- removed useCell option: the indexCell data structure is now the default index structure; old collection data is still migrated - added some debugging output to balancer to find a bug - removed unused classes for index collection handling - changed some default values for the process handling: more memory needed to prevent OOM git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5856 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	635b0a9da7	code-split allow cgi indexing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5839 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	fa3adbbfc6	added domain checks to surrogate reader and RWI transfer receiver to prevent spamming using surrogates git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5837 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	ab0030d7a7	allow dht-out for remote-crawl processing peers on default settings git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5834 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	4e97a31009	corrections in dublin core syntax git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5823 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	7dfe7e7cc6	fixed some problems with surrogate reader. This is now ready for testing. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5817 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	9050a3c4c5	alpha version of surrogate reading and indexing. see the example file for an explanation. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5815 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	ad78e3a59f	- less lines in rssTerminal - crawl more documents: if remote crawling is enabled, a remote crawl list is also loaded if a local crawl is running in case that the indexer is idle git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5809 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	bc80dc913a	added new surrogate reader (surrogates are parsed documents in batches). This will open a new way to insert indexes into YaCy (instead of crawling) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5808 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	e58320a507	added more info in log for debugging git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5805 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	c0e8ed5461	fixed problem with not http client git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5801 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	c2359f20dd	refactoring: better abstraction of reference and metadata prototypes. This is a preparation to introduce other index tables as used now only for reverse text indexes. Next application of the reverse index is a citation index. Moved to version 0.74 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5777 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
shostakovich	1f37cc6107	Robots.txt is now reused after one day. See forum-topic: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1669&p=13565#p13565 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5772 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	9bfb2641db	- removed deprecated threads - added automatic http client reset. this was necessary because excessive intranet crawling caused deadlocks. this hack solved the problem. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5768 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b6c2167143	- patch for bad web structure dumps - added automatic slow down of accesses to specific domains when access to a web page fails git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5765 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	0139988c04	- added writing of temporary file names and renaming to final file name when index dump/merge are done. Interrupted merges can be cleaned up. - added clean-up of unfinished merges and unused idx/gap files - enhanced merge file selection method git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5764 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	3621aa96ab	- added a memory protection for the IndexCell migration - fix for bad cell file selection git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5763 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d39a5b42ca	more care about open file handles. Now files also close on windows and can be deleted afterwards. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5760 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	029495e64d	fixed bug introduced in SVN 5756 in EcoTable.put() git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5759 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	587838bd09	git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5758 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	96eaecda3e	- added migration class to go from index collections to the index cell data structure. - added better control over file deletion, because this sometimes fails, especially on windows git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5756 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	37f892b988	added new concurrent merger class for IndexCell RWI data git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5735 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
borg-0300	8c494afcfe	svn attributes added git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5734 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	67aaffc0a2	- added Latency control to the crawler: because of the strongly enhanced indexing speed when using the new IndexCell RWI data structures (> 2000PPM on my notebook), it is now necessary to control the crawling speed depending on the response time of the target server (which is also YaCy in case of some intranet indexing use cases). The latency factor in crawl delay times is derived from the time that a target host takes to answer http requests. For internet domains, the crawl delay is a minimum of twice the response time, in intranet cases the delay time is now half of the response time. - added API to monitor the latency times of the crawler: a new api at /api/latency_p.xml returns the current response times of domains, the time when the domain was last accessed by the crawler and many more attributes. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5733 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	61f9dbf0cc	- fixed a display problem in watch crawler - another small enhancement in balancer git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5729 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b3f75e48fa	- enhanced balancer: auto-solving of waiting-deadlocks - removed deprecated cache-init size value - more debug lines for IndexCell cache dump merge git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5728 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d99ff745aa	fix for http://forum.yacy-websuche.de/viewtopic.php?p=13378#p13378 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5726 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
borg-0300	fd0976c0a7	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5723 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
borg-0300	ce79239322	"typo" git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5721 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	7dff1cba62	removed option to use different primary keys in kelondro tables. this option was never used and there is also no use to set other columns but the first as the primary key. as a result, access methods to the key do not need to compute key positions, and they work faster. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5711 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	7f67238f8b	refactoring of plasmaWordIndex: less methods in the class, separated the index to CachedIndexCollection git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5710 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	14a1c33823	refactoring of wordIndex class git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5709 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	f6d989aa04	added new class RowSetArray which arranges RowSet objects like Elements in a hashtable, but still provides the functionality of sorted enumeration. The new class is now integrated into the ObjectIndexCache, which is the core class to provide index functions to all database files. The new index access is about twice as fast as before. This has strong speed enhancement effects on all parts of YaCy. The speed of the kelondro indexing class ObjectIndexCache can be compared with Java's standard TreeMap with the main method in IntegerHandleIndex. The result is that the kelondro indexing needs only 1/5 of the memory that TreeMap uses! In exchange, the kelondro classes are slower than TreeMap, about four (!) times slower. However, this is not so bad because the better use of the memory is a strong advantage and makes it possible that YaCy can maintain such a large number of documents (> 50 million) in one peer. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5705 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago

169 Commits (041d9c253e0a98c3627836f59ed1a4f77c52c9f7)