yacy_search_server


Author	SHA1	Message	Date
orbiter	861f41e67e	redesigned NURL-handling: - the general NURL-index for all crawl stack types was split into separate indexes for these stacks - the new NURL-index is managed by the crawl balancer - the crawl balancer does not need an internal index any more, it is replaced by the NURL-index - the NURL.Entry was generalized and is now a new class plasmaCrawlEntry - the new class plasmaCrawlEntry also replaces the preNURL.Entry class, and will also replace the switchboardEntry class in the future - the new class plasmaCrawlEntry is more accurate for date entries (holds milliseconds) and can contain larger 'name' entries (anchor tag names) - the EURL object was replaced by a new ZURL object, which is a container for the plasmaCrawlEntry and some tracking information - the EURL index is now filled with ZURL objects - a new index delegatedURL holds ZURL objects about plasmaCrawlEntry objects to track which url is handed over to other peers - redesigned handling of plasmaCrawlEntry handover, because there is no longer any need to convert one entry object into another - found and fixed numerous bugs in the context of crawl state handling - fixed a serious bug in kelondroCache which prevented entries from being removed - fixed some bugs in online interface and adapted monitor output to new entry objects - adapted yacy protocol to handle new delegatedURL entries. All old crawl queues will disappear after this update! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3483 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	581db87237	more debug code for http://www.yacy-forum.de/viewtopic.php?p=33009#33009 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3479 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	81c4cc6bf7	better debugging of balancer failure git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3478 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	6faa262259	fix for NURL-fix git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3465 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	243a2f831b	fixed problem with NURL-hashes that could not be found. The cause of this problem has still not been found, but the effect is handled much better: the NURL-pop now continues automatically until it reaches a hash whose entry can be found. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3458 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	6ad39bae1e	fixed shutdown problem this fixes the 'inconsistency' messages during start-up git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3457 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	d755a8026d	- better OOM protection - better memory allocation for FlexTable indexes - splitting between static index and dynamic index (only the dynamic part must grow) - to enable a merge-iteration over the new split index, a large number of classes had to be adapted to the new iterator classes - added new iterator classes that support cloneable iterators - adapted all iterator classes to implement cloneable iterators git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3453 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	4e8eb1dbe3	some minor changes here and there git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3441 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	1cba31de43	redesigned RAM organization for database caches - each cache can now allocate as much memory as is available - no more fixed limits - replaced old performance memory monitor with a new one - added supervision methods as static functions to the classes that provide cache functionality - RAM allocation is steered by two simple limits that are relative to available RAM git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3434 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	f7803a6ce4	enhanced crawl balancer - new domains now get a chance to get crawled early - fewer IO operations - new balancing method - better dump order at shutdown time - bugfixes regarding url hashes that could not be found (no more superfluous cache kill) - domain access time is now shared across all balancer stacks - viewing the stack no longer disturbs the balancing algorithm as much - intelligent selection of best next domain using domain access times - extra double-check (to double-check the double-check) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3384 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	dc0c06e43d	PLEASE MAKE A BACK-UP OF YOUR COMPLETE DATA DIRECTORY BEFORE USING THIS. Redesign for better IO performance: enhanced database seek-time by avoiding write operations at distant positions of a database file. Until now, a USEDC counter was written to the head-section of a kelondroRecords database file (which is the basic data structure of all kelondro database files) to store the actual number of records contained in the database. Now, this value is computed from the database file size. This is done either only once at start-time, or continuously when running with asserts enabled. The counter is then updated only in RAM, and written when the file is closed. If the close fails, the correct number can be computed from the file size; if this is not equal to the stored number, it is strong evidence that YaCy was not shut down properly. To preserve consistency, the complete storage routine had to be rewritten. Another change speeds up reading nodes in cases where the data-tail can be read together with the data-head. This saves another IO lookup during each DB node fetch. Also includes many small bugfixes. IF ANYTHING GOES WRONG, ALL YOUR DATA IS LOST: PLEASE MAKE A BACK-UP git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3375 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	8c1d2e0227	protection against crawl balancer failure: a minimum of 500 milliseconds distance between two accesses to the same domain is now ensured git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3354 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	773ba1e91a	- generalized object order handling - controlled object order for all database tables - migrated DHT position computation to correct base64-decoded values this also closed the 'gaps' in the dht positions git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3049 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	052f28312a	removed assortments from indexing data structures removed options to switch on assortments git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3041 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	0b9370a9dc	fix for http://www.yacy-forum.de/viewtopic.php?p=28108#28108 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3013 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	30888e7a2f	implementation of search constraints. Such constraints may formulate specific restrictions on web searches. This is implemented by scraping information for constraints from a web page during parsing, and storing flags for the pages within the web index. In this first step, only information for index pages ("index of", directory listings) is scraped and stored in flags - added new flag class kelondroBitfield - added scraper method in condenser - added bitfield structure for all scrape types (see also condenser) - added bitfield structure for appearance locations (see RWIEntry) - added handover protocol for remote search and index distribution - extended kelondroColumn class to hold bitfield types - added another search attribute on search page (index.html) - extended search-filter to enable filtering of non-matching constraints - set all new database types as default - refactoring: moved word hash generation to condenser class git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2999 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	497428c8ec	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2949 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	76fceb9997	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2945 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	bb7d4b5d5e	refactoring to prepare new RWI entry object - moved all url and index(RWI) entries to index package - better naming to distinguish RWI entries and URL entries git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2937 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	1751a799ac	- deactivated all write buffers - fixed a storage bug git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2933 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	147d88cf23	re-design of database caching: this should reduce IO a lot, because write caches are now activated for all databases - added new caching class that combines a read- and write-cache - removed old read and write cache classes - removed superfluous RAM index (can be replaced by kelondroRowSet) - adapted all current classes that used the old caching methods - more asserts, more bugfixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2865 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	2a9d868f6d	- removed object cache from kelondroTree - generalized object caching and added new object caching class - added object caching wherever kelondroTree was used - added object caching also to usage of kelondroFlex - added object buffering (a write cache) to NURLs - added many assert statements; fixed bugs here and there - added missing close methods to latest added classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2858 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	06854988da	- full integration of new LURL database in INDEX - added migration method for urlHash.db into INDEX git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2819 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
karlchenofhell	ebf0da2a45	- now the fix http://www.yacy-forum.de/viewtopic.php?t=2974 works git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2796 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	df1629b05a	- code cleanup - version 0.471 - moved surftipps to own web page git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2676 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	26dfbb7499	*) Bugfix for UTF-8: url names are now stored properly in stackcrawl, crawler and indexing queue, and should be displayed correctly in the GUI git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2630 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	4866868c0e	added write cache for LURLs This was necessary to speed up the index receive process during global search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2498 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	b7f4a1521b	added options to switch on or off the kelondroFlexTable for NURL, EURL and PreNURL git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2456 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	c26da4893b	turned back NURL usage of kelondroTree, kelondroFlexTable has still problems with deleted entries git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2454 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	db1eae0227	* simplified initialization of database objects * replaced kelondroTree for NURLs by kelondroFlex * replaced kelondroTree for EURLs by kelondroFlex take care, may be very buggy. Please finish crawls before updating; crawls will be lost. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2452 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	6ad471ef96	* applied many compiler warning recommendations * cleaned up code * added unit test code * migrated ranking RCI computation to kelondroFlex and kelondroCollectionIndex git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2414 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	cd5f7e137c	fixed problem with NURL-generation upon first startup (a new kelondroFlexTable was generated, which should not happen) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2402 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	9ae9062bd3	* disabled new kelondroFlex table for NURLs * added new RAM index Class * fixed possible synchronization problem in kelondroRecords git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2388 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	689bbcf9cd	replaced kelondroTree db for NURLs by new kelondroFlexTable The new database is only created if the old is deleted or does not exist git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2387 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	130e6d4719	generalized index object for eurl, nurl and lurl to prepare move of these tables to new kelondroFlexTable Object git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2382 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	740d49751d	* strict type and size check in kelondroRow handling * adapted all code to use the declaration form of kelondroRow * fixed a bug in kelondroRow which caused wrong parsing of encoding type * the bug caused bad database behaviour in new indexCollection data structure. Because of this bug, all test databases are now void. A new database is created * the kelondroFlexTable and indexCollection data structures now store a declaration of the row definition in a properties file alongside the database files. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2375 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	ad692fc6c7	implemented option to extract nurls from the database (plus some iteration enhancements for nurls) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2325 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	7fd90ca7c8	* strict handling of NURL entry element generation, storage and stacking * more space for EURL reason strings (you must delete the EURL db to use this) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2324 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	5f72be2a95	some redesign of EURL storage * store() is now called explicitly * more urls are written to the EURL table * the EURL stack does not store the complete entry any more, now only the URL hash git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2323 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	5214f571cd	simplified method call in balancer git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2303 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	3879a0ecd0	replaced java.net.URL usage with the new class de.anomic.net.URL. This should be seen as an experiment to exclude all cases where a DNS lookup could occur during URL comparison. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	92f4cb4d73	added option to configure the start-up delay time for kelondro database files. the start-up delay is used to pre-load the database node cache git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2276 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	c36e9fc8d3	full integration of kelondroRow git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2167 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	4a907a570f	1st step to migrate kelondroTree to usage of kelondroRow instead of byte[][] git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2162 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	3c3c047d0a	integrated kelondroRow into kelondroStack git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2156 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	81e79f2caf	fixed new cache behaviour changes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2134 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	cda087f43b	- integrated cache miss storage into object cache - removed cache-miss handling from indexURL todo: new Monitoring in PerformanceMemory_p git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2132 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	90d569d70f	refactoring of index management: url storage is part of index management; moved plasmaURL to indexURL git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2122 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hermens	df7e1d9df3	Changes to plasmaURL and subclasses: - Improve performance of plasmaURL.exists() by remembering URL-hashes that are not present - Use a more realistic estimation of memory usage by the existsIndex cache - Routine cleanup of the existsIndex to limit its memory usage git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2113 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	3e31820c3d	- corrections to PerformanceMemory display of object cache - configuration of object cache size in kelondroTree initializer git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2075 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago


87 Commits (dfd5e823c3fa8ea7d693ae881a9efe18409002a5)