yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	056b42f5aa	- added information about segment count to status_p.xml - also moved this information from the old index structure, which is still in use for the RWI/DHT index to that front-end	11 years ago
Michael Peter Christen	5878c1d599	- refactoring of log to ConcurrentLog: jdk-based loggers tend to block at java.util.logging.Logger.log(Logger.java:476) in concurrent environments. This makes logging a main performance issue. To overcome this problem, this is an add-on to jdk logging to put log entries on a concurrent message queue and log the messages one by one using a separate process. - FTPClient uses the concurrent logging instead of the log4j logger	12 years ago
orbiter	47114910d5	fix for possible memory leaks	12 years ago
Michael Peter Christen	38d3feae65	added separate delete commands for the local+remote solr index, the old metadata and old rwi and for the citation index. The important advancement is the separation of the citation index deletion because that index is responsible for the linkdepth calculation. Now a search index can be deleted without the citation index and that should cause that less clickdepths must be post-processed.	12 years ago
Michael Peter Christen	a8167e6e5b	clean-up: removed unused methods in kelondro	12 years ago
Michael Peter Christen	8219a445f3	refactoring	12 years ago
Michael Peter Christen	1687737771	Abstraction of HandleMap and HandleSet	12 years ago
orbiter	bbfa497a3c	replaced more size() > 0 by !isEmpty()	13 years ago
Michael Peter Christen	7c1ba99755	removed more unused method parameters	13 years ago
Michael Peter Christen	ea10766bfd	cleaned unnecessary nested code	13 years ago
Michael Peter Christen	a1fe65b115	performance hacks	13 years ago
Michael Peter Christen	ba6aaabc51	refactoring + parser bugfixes	13 years ago
Michael Christen	c04bfaa51b	refactoring	13 years ago
Michael Christen	e9dc99fe15	added rules to set specific RWIs as private RWIs which are not transmitted to remote peers. This will be used for private index copies and phonetic indexes.	13 years ago
orbiter	44d6416e2d	ensure termination of shrink() git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7927 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	52230a6864	replaced catching of Exception with Throwable, which catches also Errors git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7926 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	e1a3d609aa	moved merger object from Segment to IndexCell to enable a correct shutdown sequence. This solves a bug where yacy cannot be shut down during an index merge that appears during the shutdown phase. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7924 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	45e497a9bd	fix for term iteration git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7900 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	2c595a6a47	added new methods to count the number of objects in RWIs. lots of refactoring was necessary to introduce new Rating class and to unify naming of methods git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7896 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	75df87832c	refactoring/better naming of methods and classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7895 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	1f300217f8	more protection for the cleanup thread git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7848 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	d13103a0a7	changed the way how the index cache is flushed: do not flush when a put was made because that could cause that many put calls synchronize for a long time when the dump or a merge is performed. Instead a watchdog thread is doing the dump and therefore puts cannot block any more which is good when a put happens during a search result preparation. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7847 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	267290a821	removed the semaphores from the cache dump process because I believe some of the semaphores may be lost somewhere which then causes that the cache is never flushed and then the peer dies from a OOM. The re-introduced synchronization may not be the best solution but should ensure that the caches are flushed. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7802 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	3ed4a09368	small features, some bug fixes and performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7733 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	b45701d20f	this is a re-implementation of the YaCy Block Rank feature This time it works like this: - each peer provides its ranking information using the yacy/idx.json servlet - peers with more than 1 GB ram will load this information from all other peers, combine that into one ranking table and store it locally. This happens during the start-up of the peer concurrently. The new generated file with the ranking information is at DATA/INDEX/<network>/QUEUES/hostIndex.blob - this index is then computed to generate a new fresh ranking table. Peers which can calculate their own ranking table will do that every start-up to get latest feature updates until the feature is stable - I computed new ranking tables as part of the distribution and commit it here also - the YBR feature must be enabled manually by setting the YBR value in the ranking servlet to level 15. A default configuration for that is also in the commit but it does not affect your current installation only fresh peers - a recursive block rank refinement is implemented but disabled at this point. it needs more testing Please play around with the ranking settings and see if this helped to make search results better. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7729 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	3820525464	more memory protection: auto-flush of caches in case of memory shortage git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7575 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	3b40b98256	- set SVN properties - minor changes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7567 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	f8d0454c53	small bug fixes and experiments with search speed enhancement git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7549 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	387db84087	maybe found bug in non-working index dumper git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7414 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	e753027c43	fix for http://forum.yacy-websuche.de/viewtopic.php?p=21439#p21439 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7390 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	bf4ef1513e	- fix for map view - remove some UNRESOLVED PATTERN - maybe a fix for non-flushing cache git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7389 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	db99db4be9	some redesign of the search-fail-response mechanism: when a search fails for a single url because the snippet cannot be generated, then the url reference is deleted from the index. This mechanism was redesigned and enhanced. The process now also writes into the work tables into the table searchfl to prepare a re-indexing mechanism. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7364 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	7cd9d9d22a	- enhanced DidYouMean computation using a faster count on index entries; this causes that results can be ranked better - added limitations on DidYouMean result sets according to input and output string length git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7246 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	64860dc1bb	enhanced search event logging (to be used for further improvements) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7140 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	6388a58fc7	better memory management and slightly less (in total and temporary) RAM allocation: - confirm that database objects that are not supposed to grow do not have an index memory management that is designed for growth - changed index sorting method in such a way that it allocates less objects during quicksort - database classes renaming (shorter, naming addresses that objects hold in RAM) - added a large number of asserts to check if objects actually take the RAM that they should have git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7019 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	b03caaa57a	better handling of OOM situations git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6918 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	455a763d7c	performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6845 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	90c3e5d6f6	- cleanup, removed unused imports - added crawling queue sizes to /api/status_p.xml, syntax same as in queues_p.html - fixed a bug in queue enumeration that caused a out of bounds exception git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6842 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	93ea0a4789	enhanced remove operation in search consequences (which are triggered when the snippet fetch proves that the word has disappeared from the page that was stored in the index) - no direct deletion of referenced during search (shifted to time after search) - bundling of all deletions for the references of a single word into one remove operation - enhanced remove operation by caring that the collection is stored sorted (experimental) - more String -> byte[] transition for search word lists - clean up of unused code - enhanced memory allocation of RowSet Objects (will use a little bit less memory which was wasted before) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6823 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	8b8107b2a3	reduced IO-load and synchronization/blocking - enhanced the Balancer performance when building new domain stacks using a new Table buffer - added the new Table buffer BufferedObjectIndex class - changed order of access to LURL-read (preferring segment over Crawl Queues), which reduces blocking time on balancer - fixed PPM setting in Crawler_p servlet (had doubled values) - reduced synchronization in IndexCell because it is not necessary: reduced blocking during indexing/merging/dumping - removed did-you-mean cache in IndexCell because that caused too much overhead and more memory usage but was not very useful. This also reduced deadlocks that could be caused when searches are performed during indexing. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6819 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	ed07046870	flush only when > 3000 RWIs present + code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6817 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	1a8a134e0c	continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775 and continued in SVN 6790 The result should be a less usage of new String() and less memory usage (since a String-encapsulated byte[] has 40 bytes overhead) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6815 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	650be3599f	added a time-out to the RWI cache to flush the cache if it has not been written for ten minutes. This additional dump criteria is necessary because some data sources repeat their vocabulary and may cause that the number of words in a RWI does not increase while the number of references in the RWI set increases. Now the RWI Buffer is flushed every 10 minutes or later if at that time already a dump is ongoing. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6811 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	1e8e79b9ef	redesign of reference hash (URL-hash) parameter hand-over: pass value as byte[], not as String. This should cause that less byte[] <-> String conversions are made during time-critical tasks. This redesign is not yet complete, more to come .. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6775 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	749ffbd642	- added another catch case for the index dump and index merge process that should cause non-blocking behavior in case that index dump and/or index merge caused any unexpected exception. - reverted SVN 6766, this is too dangerous (may cause unexpected memory usage) and should not be necessary git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6773 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	31e29a8831	- removed synchronization during index dump and index cleaning - added semaphores to synchronize index dump and index cleaning for each process separately git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6767 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	eeca2ded92	fix for http://forum.yacy-websuche.de/viewtopic.php?p=18500#p18500 - catch uncatched OOM - less wasting of memory git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6555 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	37245430c3	fix for NPE during DHT RWI selection git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6527 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	362b7a929b	added extensive memory protection logic to avoid out of memory errors that may be caused by the RowCollection memory allocation function git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6521 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	8281e29963	- more configuration for profiling graph (number of events) - more logging for a shutdown: print reason and accessing IP into log git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6520 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago


54 Commits (deadeb406eed1425d0afdf3aef794eec13e92756)