yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	bc2368e907	fix for problem with remote crawl referrers git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4210 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	6eaa5a0e64	enhanced local search speed. The ranking process is now 6 times faster that before. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4197 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	ccbfb15b6b	enhancement to crawl stacker enqueue order git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4192 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	55c87b3b12	changed behavior of crawl stacker - final flush only when tabletype = RAM - prestacker (dns prefetch) only if tabletype = RAM and busytime <= 100 - number of maximun entries in stacker is configurable in yacy.init (stacker.slots) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4186 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	a31b9097a4	preparations for mass remote crawls: two main changes must be implemented to enable mass remote crawls: - shift control of robots.txt to crawl queue (away from stacker). This is necessary since remote crawls can contain unchecked urls. Each peer must check the robots to prevent that it is misused as crawl agent for unwanted file retrieval - implement new index files that control double-check of remotely crawled urls After removal of robots.txt checking from stacker threads, the multi-threading of this process is void. Multithreading has been removed. Also the thread pools for the crawl threads had been removed, since creation of these threads is not resource-consuming, for a detailed explanation see svn 4106 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4181 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	7d57b80598	distinct keepOrder strategy, more discrete implementation of enhancement introduced in SVN 4158 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4176 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	98abe0804d	another enhancement to crawl starts with link files git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4123 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	1b42152a76	fixed and enhanced some details in crawl start with file git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4120 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	01e0669264	re-designed some parts of DHT position calculation (effect is the same as before) and replaced old fist hash computation by new method that tries to find a gap in the current dht to do this, it is necessary that the network bootstraping is done before the own hash is computed this made further redesigns in peer initialization order necessary git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4117 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	5b1a937ed8	fix for crawl stack database format change, introduced in SVN 4113 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4115 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	842308ea97	- redesigned crawl start menu, integrated monitoring pages - removed web structure picture from indexing menu and grouped it together with htcache monitor - added a database for terminated crawls, when a crawl is finished it is automatically moved to the new database - extended crawl profile edit servlet, shows now also terminated crawls - option that was used to delete profiles is now redesigned to a function that moves the current crawl to the terminated crawls and removes all urls from the current queues! - fixed here and there problems with indexing queues - enhances indexing speed by changing cache flush sizes. - changed behaviour of crawl result servlet: the list of crawled urls is shown if there is one, othevise the overview window is shown attention: the new profile databases are not compatible with the old one. current crawls will be lost! the web index is not touched. next steps: the database of terminated crawls can be used to start with them a new crawl. This is useful if one wants to re-crawl specific pages and wants to use a old crawl profile. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4113 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	2f1ff048ba	some fixes to socket connection time-out git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4111 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	daf0f74361	joined anomic.net.URL, plasmaURL and url hash computation: search profiling showed, that a major amount of time is wasted by computing url hashes. The computation does an intranet-check, which needs a DNS lookup. This caused that each urlhash computation needed 100-200 milliseconds, which caused remote searches to delay at least 1 second more that necessary. The solution to this problem is to attach a URL hash to the URL data structure, because that means that the url hash value can be filled after retrieval of the URL from the database. The redesign of the url/urlhash management caused a major redesign of many parts of the software. Since some parts had been decided to be given up they had been removed during this change to avoid unnecessary maintenance of unused code. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	9ca46a8c69	indexing of local (intranet) urls enabled To do this, one must create a separate YaCy network that has a local URL domain A description how to do this is here: http://www.yacy-websuche.de/wiki/index.php/De:Netzdefinition git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4001 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	40b0547611	- documentaton changes (removed old forum links) - different handling of link quotation - different handling of link normalization - enhanced html/unicode en/de-coding git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3993 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	1782ef57e5	- added SSI parser and include directive for <!--# include virtual="<file>" --> - added chunked file transfer for non-yacy clients - SSIs are streamed using chunked transfer, partly delivered pages can be seen in browser before transmission is finished - added client-side network unit identification - cleaned up code git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3926 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	069562a14d	fixed problem with re-crawl; replaced error file-db with ram-db git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3900 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	c7a614830a	several bugfixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3899 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	40c14a4f0e	- better implementation of search query properties - basic protection against start-up problems when database files are corrupted - auto-delete of not-critical databases during startup when load error occurs - on-the-fly reset option for all database tables - automatic on-the-fly reset for seed tables during enumeration exceptions git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3547 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	861f41e67e	redesigned NURL-handling: - the general NURL-index for all crawl stack types was splitted into separate indexes for these stacks - the new NURL-index is managed by the crawl balancer - the crawl balancer does not need an internal index any more, it is replaced by the NURL-index - the NURL.Entry was generalized and is now a new class plasmaCrawlEntry - the new class plasmaCrawlEntry replaces also the preNURL.Entry class, and will also replace the switchboardEntry class in the future - the new class plasmaCrawlEntry is more accurate for date entries (holds milliseconds) and can contain larger 'name' entries (anchor tag names) - the EURL object was replaced by a new ZURL object, which is a container for the plasmaCrawlEntry and some tracking information - the EURL index is now filled with ZURL objects - a new index delegatedURL holds ZURL objects about plasmaCrawlEntry obects to track which url is handed over to other peers - redesigned handling of plasmaCrawlEntry - handover, because there is no need any more to convert one entry object into another - found and fixed numerous bugs in the context of crawl state handling - fixed a serious bug in kelondroCache which caused that entries could not be removed - fixed some bugs in online interface and adopted monitor output to new entry objects - adopted yacy protocol to handle new delegatedURL entries all old crawl queues will disappear after this update! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3483 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
allo	f04097c3dd	integrated tor-patch for crawling, if yacyDebugMode is set. (replaces: http://yacy.deruwe.de/overlay/net-misc/yacy-tor/files/disable_dns_checks-svn3132.patch) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3470 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	432d7d4e9c	better catch git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3468 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	8f7e8b6ee2	auto-delete for not-fixable db error in crawl stacker. see also http://www.yacy-forum.de/viewtopic.php?p=32906#32906 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3467 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	d755a8026d	- better OOM protection - better memory allocation for FlexTable indexes - splitting between static index and dynamic index (only the dynamic part must grow) - to enable a merge-iteration of new splittet index, a huge number of classes needed to be adopted for new iterator classes - added new iterator classes that support cloneable iterators - adopted all iterator classes to implement cloneable itarators git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3453 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	1cba31de43	redesigned ram organization for database caches - each cache can now allocate as much memory as is available - no more fixed limits - replaced old performance memory monitor by new one - added supervision methods as static functions into the classes that provide cache functionality - steering of ram allocation is done with two simple limits that are ram availability-relative git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3434 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	f7803a6ce4	enhanced crawl balancer - new domains now get a chance to get crawled early - less IO operations - new balancing method - better dump order at shutdown time - bugfixes regarding not found url hashes (no more superfluous cache kill) - domain access time is now shared over all balancer stacks - viewing the stack does no more disturbish the balancing algorithm that much - intelligent selection of best next domain using domain access times - extra double-check (to double-check the double-check) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3384 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	937ccd4e76	fix for snippet-generation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3060 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	773ba1e91a	- generalized object order handling - controlled object order for all database tables - migrated DHT position computation to correct base64-decoded values this also closed the 'gaps' in the dht positions git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3049 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	109ed0a0bb	- cleaned up code; removed methods to write the old data structures - added an assortment importer. the old database structures can be imported with java -classpath classes yacy -migrateassortments - modified wordmigration. The indexes from WORDS are now imported to the collection database. The call is java -classpath classes yacy -migratewords (as it was) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3044 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	7dbcd358b4	fix for http://www.yacy-forum.de/viewtopic.php?p=28231#28231 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3021 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	30888e7a2f	implementation of search constraints Such constraints may formulate specific restrictions to web searches This is implemented by scraping information for constraints from a web page during parsing, and storing flags to the pages within the web index. In this first step, only information for index pages ("index of", directory listings) are scraped and stored in flags - added new flag class kelondroBitfield - added scraper method in condenser - added bitfield structure for all scrape types (see also condenser) - added bitfield structure for appearance locations (see RWIEntry) - added handover protocol for remote search and index distribution - extended kelondroColumn class to hold bitfield types - added another search attribute on search page (index.html) - extended search-filter to enable filtering of non-matching constraints - set all new database types to be default - refactoring: moved word hash generation to condenser class git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2999 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	497428c8ec	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2949 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	76fceb9997	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2945 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	bb7d4b5d5e	refactoring to prepare new RWI entry object - moved all url and index(RWI) entries to index package - better naming to distinguish RWI entries and URL entries git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2937 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	1751a799ac	- deactivated all write buffers - fixed a storage bug git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2933 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	ba967c4875	- bugfixes and debug code - ne generalized index class indexCachedRI git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2930 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	ad248d61ca	*) more verbose exception git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2901 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	215c4e65f1	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2887 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	147d88cf23	re-design of database caching this should reduce IO a lot, because write caches are now actived for all databases - added new caching class that combines a read- and write-cache. - removed old read and write cache classes - removed superfluous RAM index (can be replaced by kelonodroRowSet) - addoped all current classes that used the old caching methods - more asserts, more bugfixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2865 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	2a9d868f6d	- removed object cache from kelondroTree - generalized object caching and added new object caching class - added object caching wherever kelondroTree was used - added object caching also to usage of kelondroFlex - added object buffering (a write cache) to NURLs - added many assert statements; fixed bugs here and there - added missing close methods to latest added classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2858 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
karlchenofhell	b5e40e2fa2	- fix for http://www.yacy-forum.de/viewtopic.php?t=2974 (no cache-sizes for new db) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2792 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	a5dd0d41af	- refactoring of plasmaCrawlLURL.Entry to prepare new Entry format - added test migration method to migrate the old LURL to a new LURL the new LURL will be splitted into different tables for each month this solves several problems: - the biggest table in YaCy is splitted in different parts and can also be managed in filesystems that are limited to 2GB - the oldest entries can easily be identified, used for re-crawl und deleted - The complete database can be limited to a specific size (as wanted many times) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2755 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	6396f5971e	bugfixes and migration attempt toward new kelondroFlex db - more synchronization - bugfix for remove in collections - bugfix in kelondroFlex (wrong exception condition!) - options to use RAM, FLEX and TREE tables for Crawl URL stacker - default for Crawl URL stacker is now FLEX (!) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2746 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	310f1c41cd	added option to see ranking scores in surftipps and some cleanups git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2684 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	26dfbb7499	*) Bugfix for UTF-8: url names are now stored properly in stackcrawl, crawler, indexing queue and should be displayed correct on the gui git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2630 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	9340dbb501	fixed all possible problems with nullpointer exception for LURLs git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2513 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	4866868c0e	added write cache for LURLs This was necessary to speed up the index receive process during global search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2498 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	dae763d8e3	git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2495 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	e3f0136606	*) next step of restructuring for new crawlers - adding function isSupportedProcotol to plasmaCrawlLoader.java - disabling robots.txt check for protocols other than http(s) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2479 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	1c8300fcec	*) Bugfix for name resolution in proxy mode See: http://www.yacy-forum.de/viewtopic.php?p=25241 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2477 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago

1 2 3

120 Commits (3491531cea463175b5edacd7917b172a0c63478d)