yacy_search_server

Commit Graph

Author	SHA1	Message	Date
hermens	4b83875abd	Small fixes for the heapCacheIterator in ReferenceContainerCache: - Start the iteration at startWordHash - When used with rotation, let the iteration stop when the cache is empty git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6293 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	af3a696fc4	added a fast-fail concept in search processes. The search now has better control if all the remote searches may bring any result. If all processes are finished, then all search tasks fail fast. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6290 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	ce972ff4ef	update to default ranking profile which has now some settings to deny some phpbb3 pages which are redundant in the index when crawling phpbb3. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6288 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	3b9aaf9e9f	- inserted new library tests inside DidYouMean - some redesign of DidYouMean that was necessary to follow a special rule how a library should be used: - the library provides words that start or end with a test word, which may also be an empty set of words - all words that the DidYouMean produced with the four production rules are used to generate a set of library-completed words - if this process results in any words from the library, only library-generated words are taken - if there is no library-generated word at all, take the artificially generated word - all words that result from these rules are tested against the index - the result is ordered using a lightweight comparator that prefers short words - a not-so-much-io test against the index is being prepared next - inserted the library initialization into the switchboard git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6284 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	39a311d608	better care not to lose the merge/dump thread git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6278 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	10d3e856b5	better concurrency, less blocking & performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6277 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	1a9cfd8718	some performance hacks (CPU only, not IO) this will cause better computation speed for single- and multi-core; there are enhancements that will speed up old and slow machines as well as multi-core CPUs. Indexing of surrogates has been sped up from 4000 PPM to over 20000 PPM on a simple dual core office computer. Since the enhancements are mostly in core routines, the hack should also speed up search performance. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6276 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	92407009b2	cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6275 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	0ba1beaf56	separated rwi constraint evaluation from rwi ranking and added concurrency git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6274 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	ce7924d712	better concurrency for rwi entry parsing during search processing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6273 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	0e471ba33b	- fixed a bug in fast digest computation - added an open-on-demand hack to heap files: when a heap file is opened the first time, it is first scanned to get a key index and then it is closed again. This will free up file pointers in cases where a really large number of blob files are opened upon initialization of ArrayStack objects. This should also solve a problem reported in http://forum.yacy-websuche.de/viewtopic.php?p=17191#p17191 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6267 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
hermens	c4d0e22a77	Further speed-up of concurrent DHT-receive git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6259 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
hermens	2fbc0696bf	Fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2334 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6258 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	8e56c2ace6	fix for fixes from this afternoon git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6253 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	6354b5e447	removed possible deadlock, see http://forum.yacy-websuche.de/viewtopic.php?p=17017#p17017 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6251 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	5cc17ccf8a	a better caching with less overhead and more appropriate synchronisation use in more than 10 different data objects git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6250 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	0575f12838	fix for deadlock git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6246 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	fbfdaf063d	- patch to omit IndexOutOfBoundsException when a b64-encoded key appears not to be well-formed. In that case the key is still accepted but rated higher than other regular keys to create a virtual ordering between well-formed and ill-formed keys - check routine at the beginning of the import of table keys that checks that all imported keys are well-formed. All records that have an ill-formed key are deleted. This is a hack and is not tested since I don't have bad data here to test with. If the effect is seen in the wild, please report in the forum. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6245 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	c4ae2cd03f	fixed bug that caused deletion of crawl profiles at every application startup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6240 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	161d2fd2ef	redesign of access to the HTCache (now http.client.Cache): - better control to the cache by using combined request-header and content access methods - refactoring of many classes to comply with this new access method - make sure that the cache is always written if something was loaded - some redesign of the process how http response results are fed into the new indexing queue - introduction of a cache read policy: * never use the cache * use the cache if the entry exists * use the cache if the proxy freshness rule confirms * use only the cache and never go online - added configuration options for the crawl profiles to use the new cache policies. There is not yet an input during crawl start to set the policy but this will be added in another step. - set the default policies for the existing crawl profiles. If you want them to appear in your default profiles you must delete the crawl profiles database; otherwise the policy is 'proxy freshness rule' - enhanced some cache access methods in such a way that unnecessary retrievals are omitted (i.e. for size computation). That should reduce some IO but also a lot of CPU computation because sizes were computed after decompression of content after retrieval of the content from the disc. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6239 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	51534df0cb	fix for possible synchronization problem see also: http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2292&hilit=&p=16787#p16787 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6234 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	1d8d51075c	refactoring: - removed the plasma package. The name of that package came from a very early pre-version of YaCy, even before YaCy was named AnomicHTTPProxy. The Proxy project introduced search for cache contents using class files that had been developed during the plasma project. Information from 2002 about plasma can be found here: http://web.archive.org/web/20020802110827/http://anomic.de/AnomicPlasma/index.html We still have one class that comes mostly unchanged from the plasma project, the Condenser class. But this is now part of the document package and all other classes in the plasma package can be assigned to other packages. - cleaned up the http package: better structure of that class and clean isolation of server and client classes. The old HTCache becomes part of the client sub-package of http. - because the plasmaSwitchboard is now part of the search package all servlets had to be touched to declare a different package source. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6232 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	5bb8074150	removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency. - The indexing queue was a historic data structure that was introduced at the very beginning of the project as a part of the switchboard organisation object structure. Without the indexing queue the switchboard queue becomes also superfluous. It has been removed as well. - Removing the switchboard queue requires that all servlets are called without an opaque generic ('<?>'). As a result, all servlets had to be modified. - Many servlets displayed the indexing queue or the size of that queue. In the past months the indexer was so fast that mostly the indexing queue appeared empty, so there was no use of it any more. Because the queue has been removed, the display in the servlets had also to be removed. - The surrogate work task had been a part of the indexing queue control structure. Without the indexing queue the surrogates needed their own task management. That has been integrated here. - Because the indexing queue had a special queue entry object and properties attached to this object, the properties had to be moved to the queue entry object which is part of the new indexing queue within the blocking queue, the Response Object. That object has now also the new properties of the removed indexing queue entry object. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6225 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	b332dfad67	- inserted request object into response object which carries this now instead generating new objects - fixed a problem with the crawler introduced in SVN 6216 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6222 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	ca72ed7526	-removed superfluous crawl cache -refactoring of crawler classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6221 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	3f113f38a8	removed unused imports removed unused libs from eclipse class path git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6201 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
f1ori	f814e0fa81	enable warnings and fix most of it git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6196 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	21b8704fb4	refactoring of the ParserDispatcher and ParserConfig: resulted into Idiom, Parser and Classification classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6188 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	0e8647d62f	refactoring of search classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6184 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	dafffd0153	refactoring of parsers and document processing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6182 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	77d2a3782c	removed strange debugging strings git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6177 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	16efcd0366	fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2252&hilit=&p=16389#p16389 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6172 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	24cb6d68bc	- renamed Stack to RecordStack to avoid name confusion with new classes - added new Stack class that implements a stack on BLOB files - added new Stacks class that can be used for a set of Stacks (a 'Stack Database') - added methods to other classes to support the new stacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6169 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	409538e17a	code cleanup and code simplification git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6161 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	1f1399e5c5	extending visibility of objects and methods to avoid synthetic accessor methods and increase performance git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6156 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	222850414e	simplification of the code: removed unused classes, methods and variables git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6154 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	93dfb51fd4	problems with code style git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6153 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	adf01c676e	reduce lookup time when merging a large number of BLOBs git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6152 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	9a674d8047	- After the removal of the Tree class some code simplifications are possible. This affects mostly the Records class, which can be refactored and the result of the refactoring results in a reduced number of classes. - The EcoTable was renamed to Table. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6151 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	c5122d6836	completed migration of BLOBTree to BLOBHeaps: - removed migration code - removed BLOBTree after the removal of the BLOBTree, a lot of dead code appeared: - removed dead code that was needed for BLOBTree Some more classes may have not much use any more after the removal of BLOBTree, but still have some component that are needed elsewhere. Additional Refactoring steps are needed to clean up dependencies and then more code may appear that is unused and can be removed as well. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6150 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	6b307d6d59	more tolerance for corrupted index entries in exported row sets git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6099 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	33aafa9b4b	better logging when writing merged dumps git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6098 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	4d29e90708	uaeh git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6096 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	3c3e6499ae	added more logging for merge operation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6095 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	15180fc95e	- patch for future computation in SplitTable - added same concurrent process for has() from SplitTable in ArrayStack git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6093 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	9a5ec20b3c	avoid merge during startup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6092 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	ae015e8e98	refactoring of blob package classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6088 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	be1c7ddc64	refactoring of search classes -- moved Ranking Profile to search package git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6086 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	5a7fd6b4c8	just some comment lines git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6081 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	ce1adf9955	serialized all logging using concurrency: high-performance search query situations as seen in yacy-metager integration showed a deadlock situation caused by synchronization effects inside of sun.java code. It appears that the logger is not completely safe against deadlock situations in concurrent calls of the logger. One possible solution would be an outside-synchronization with 'synchronized' statements, but that would further apply blocking on all high-efficient methods that call the logger. It is much better to do a non-blocking hand-over of logging lines and work off log entries with a concurrent log writer. This also disconnects IO operations from logging, which can also cause IO operation when a log is written to a file. This commit not only moves the logger from kelondro to yacy.logging, it also inserts the concurrency methods to realize non-blocking logging. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6078 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
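
The hand-over idea described in commit ce1adf9955 (callers pass log lines to a queue, and a separate writer thread works them off) could look roughly like the sketch below. This is only an illustration and not the code from that commit: the class name ConcurrentLogSketch, the in-memory sink that stands in for a log file, and the poison-pill shutdown are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: callers enqueue log lines, one worker thread writes them. */
public class ConcurrentLogSketch {
    // Sentinel that tells the writer thread to stop (compared by identity).
    private static final String POISON = new String("POISON");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> sink = new ArrayList<>(); // stands in for a log file
    private final Thread writer;

    public ConcurrentLogSketch() {
        writer = new Thread(() -> {
            try {
                while (true) {
                    String line = queue.take(); // only the writer thread waits here
                    if (line == POISON) break;
                    synchronized (sink) { sink.add(line); } // the "IO" happens off the caller's thread
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "log-writer");
        writer.setDaemon(true);
        writer.start();
    }

    /** Hand-over: offer() on an unbounded queue returns without waiting for the writer. */
    public void log(String line) {
        queue.offer(line);
    }

    /** Lets the writer work off the remaining entries, then stops it. */
    public List<String> close() throws InterruptedException {
        queue.offer(POISON);
        writer.join();
        synchronized (sink) { return new ArrayList<>(sink); }
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentLogSketch log = new ConcurrentLogSketch();
        log.log("search started");
        log.log("search finished");
        System.out.println(log.close());
    }
}
```

With this arrangement the calling threads never hold a lock while a line is written out, which is the property the commit message is after; the trade-off is that log lines reach the sink slightly later than the call that produced them.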

893 Commits (eaddf2d464926b960539b1a4ede6ddb5cfaa9d4d)