yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	bf0d820659	- added correct flagging of word properties - added self-healing to database in case that wrong free-pointers exist - added presentation of media links in snippets (does not yet work correctly) - code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3055 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	9a85f5abc3	cleanup - removed 'deleteComplete' flag; this was used especially for WORDS indexes - shifted methods from plasmaSwitchboard to plasmaWordIndex git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3051 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	109ed0a0bb	- cleaned up code; removed methods to write the old data structures - added an assortment importer. the old database structures can be imported with java -classpath classes yacy -migrateassortments - modified wordmigration. The indexes from WORDS are now imported to the collection database. The call is java -classpath classes yacy -migratewords (as it was) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3044 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	052f28312a	removed assortments from indexing data structures removed options to switch on assortments git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3041 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	4ce590622f	- more asserts - better memory usage during remove in kelondroRowSet git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3022 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	0a0c3edeb6	fixed a bug in index transfer - the encoding within the new entry format for binary data was wrong - the string parser of RWI receive had to be enhanced added some mor debugging tools - a target peer for index transfer can now be selected by typing in the peer name - the RWI result list has an entry counter enhanced routing - if communication is between two peers that have the same IP address, the loopback address 127.0.0.1 is used instead the public IP to contact the peer git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3003 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	30888e7a2f	implementation of search constraints Such constraints may formulate specific restrictions to web searches This is implemented by scraping information for constraints from a web page during parsing, and storing flags to the pages within the web index. In this first step, only information for index pages ("index of", directory listings) are scraped and stored in flags - added new flag class kelondroBitfield - added scraper method in condenser - added bitfield structure for all scrape types (see also condenser) - added bitfield structure for appearance locations (see RWIEntry) - added handover protocol for remote search and index distribution - extended kelondroColumn class to hold bitfield types - added another search attribute on search page (index.html) - extended search-filter to enable filtering of non-matching constraints - set all new database types to be default - refactoring: moved word hash generation to condenser class git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2999 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	e3d75f42bd	final version of collection entry type definition - the test phase of the new collection data structure is finished - test data that had been generated is void. There will be no migration - the new collection files are located in DATA/INDEX/PUBLIC/TEXT/RICOLLECTION - the index dump is void. There will be no migration - the new index dump is in DATA/INDEX/PUBLIC/TEXT/RICACHE git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2983 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	497428c8ec	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2949 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	bb7d4b5d5e	refactoring to prepare new RWI entry object - moved all url and index(RWI) entries to index package - better naming to distinguish RWI entries and URL entries git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2937 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	b79e06615d	- added new LURL.Entry class for next database migration - refactoring of affected classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2802 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	a5dd0d41af	- refactoring of plasmaCrawlLURL.Entry to prepare new Entry format - added test migration method to migrate the old LURL to a new LURL the new LURL will be splitted into different tables for each month this solves several problems: - the biggest table in YaCy is splitted in different parts and can also be managed in filesystems that are limited to 2GB - the oldest entries can easily be identified, used for re-crawl und deleted - The complete database can be limited to a specific size (as wanted many times) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2755 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	2bb529cedb	added peer tags for peers in robinson mode git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2741 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	df1629b05a	- code cleanup - version 0.471 - moved surftipps to own web page git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2676 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
hydrox	85f3617835	*) moved HTML from class-file to template-file (please check if it is valid HTML) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2644 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	e745b63c77	*) Bugfix for different behavior of indexDistributeWhileCrawling to other checkboxes on IndexControl_p.html See: http://www.yacy-forum.de/viewtopic.php?t=2849 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2637 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	74d1dea30b	changes towards better join-search - added generation of a compressed index within remote peers during global search - added selection of specific urls within remote peers during secondary global search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2539 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	c543028dd4	fixed double/missing null check for LURLs git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2520 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	9340dbb501	fixed all possible problems with nullpointer exception for LURLs git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2513 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
hermens	ff4362b02d	some more fixes for new plasmaCrawlLURL.load behavior git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2511 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	4866868c0e	added write cache for LURLs This was necessary to speed up the index receive process during global search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2498 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	eee44be602	*) adding an interface for customized blacklist classes - now it's possible to use a customized blacklist engine instead of the default one - this can be done by configuring the property BlackLists.class See: http://www.yacy-forum.de/viewtopic.php?t=2108 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2397 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	d2e8e76218	*) now it's possible to configure the yacy blacklist separately for dht, search, proxy, crawler See: http://www.yacy-forum.de/viewtopic.php?t=2541 http://www.yacy-forum.de/viewtopic.php?p=24516 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2389 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	abf22f6e60	removed url normalform computation from htmlFilterContentScraper. This method was implemented in de.anomic.net.URL git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2377 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	c4e922885a	replaced indexURLEntry by new class that uses a kelondroRow.Entry object to store the index entry. This is another step to move to the new database structure. A side effect of this change is, that index storage uses much less RAM space, which affects the index RAM cache. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2341 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	e357599f92	* fixed problem with indexContainer iteration from RAM: indexContainers from RAM must be cloned explicitely to prevent side-effects on stored indexContainer objects in Cache * changed behaviour of urlReference deletion from indexContainers: deletion does not user retrieval of all Elements from the assortments * added textual configuration of kelondroRow and kelondroColumn definition * update of kelondroRow usage in yacyNews * modified kelondroAttrSeq to use modified kelondroColumn parser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2339 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	417ed5102e	redesign of database iterators: an iteration of key elements in kelondroTree databases is no longer supported. this is now replaced by an iteration of kelondroRow.Entry objects from the database Iteration of keys from the database was mostly followed by retrieval of the row from the database, whcih caused unnecessary database load. The index selection was also redesigned to use the new row iteration methods. This affects many funktions, most important is the DHT selection routine which is now much faster. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2327 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	ad692fc6c7	implemented option to extract nurls from the database (plus some iteration enhancements for nurls) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2325 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	58df8b7bbf	a large collection of different changes * mainly for the transition to the new indexing database structure * a bugfix for an endless loop inside kelondroTree iteration * a bugfix for bulk read inside a kelondroTree iteration; the bug caused that some elements had been iterated twice * very strong speed enhancement for url/domain extraction git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2320 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	3879a0ecd0	replaced java.net.URL usage by use of new class de.anomic.net.URL This shall be seen as an experiment to exclude all cases where there could be a DNS lookup during URL comparisment. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	4ca0857c0c	*) Index transfer now considers the pause time send by busy peers during index transfer / index distribution See: http://www.yacy-forum.de/viewtopic.php?p=22647#22491 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2205 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	5041d330ce	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2150 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	7b3b12888c	refactoring: integrated indexContainer abstraction layer git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2149 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	196b8abb30	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2144 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	90d569d70f	refactoring of index management: url storage is part of index management; moved plasmaURL to indexURL git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2122 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	a930be4ba3	refactoring of index management: generalized the index entry git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2121 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	a474669338	start with refactoring of index management git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2110 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	1f4412a146	adopted isListed to discussed new behavior as discussed (url, getFile) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1940 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	6c70f4a0cf	renamed wordHashes for a word hash set generation to wordHashSet This was done because the wordHashes iterator will get another integer parameter and then conflicts with the wordHashes set generation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1921 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	dba02f399f	starting of re-design of kelondroTree iterator - new access to iterator - added many IOException handling in other Classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1914 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	a6a3f4b694	fix for svn 1888 this is a redesign of the no-iterator solution git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1892 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	a840755964	moved parts of index transfer logic back to switchboard this is needed to merge the dht selection with the indexing thread git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1718 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	7eb10675b3	re-organization of index management this was done to be prepared for new storage algorithms git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1635 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	1e4578aab6	VERY EXPERIMENTAL removal of index ram cache flushing thread. The cache will fill up and flushed explicitely when it is full. This shall remove double-access of assortments (indexing and flush) during indexing process. Hopefully this should reduce IO. The main idea is: the cache shall mainly be flushed by DHT transfer, and only indexes that shall be hosted by the own peer are flushed to the assortments. This needs further work. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1617 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	fa90c3ca7a	- removed some usage of indexEntity - changed index collection process: indexes are not first flushed to indexEntity, but now collected directly from ram cache and assortments git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1489 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	03c65742ba	changes towards the new index storage scheme: - replaced usage of temporary IndexEntity by EntryContainer - added more attributes to word index - added exact-string search (using quotes in query) - disabled writing into WORDS during search; EntryContainers are used instead git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1485 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	5756b9d5c8	blacklist entries now checked git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1426 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	0fcc07d3f7	"unresolved URL Hash" entries now checked git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1404 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	f4ffa9aee5	- implemented more attributes to index entries - implemented hand-over of new word index attributes during remote search - implemented word-distance computation during search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1382 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	0371494010	tried to add word position to index git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1377 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago

1 2

79 Commits (7ff86d6ba6cb79b2ab0327e972cb123180d30435)