yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	30888e7a2f	implementation of search constraints. Such constraints may formulate specific restrictions to web searches. This is implemented by scraping information for constraints from a web page during parsing, and storing flags to the pages within the web index. In this first step, only information for index pages ("index of", directory listings) is scraped and stored in flags. - added new flag class kelondroBitfield - added scraper method in condenser - added bitfield structure for all scrape types (see also condenser) - added bitfield structure for appearance locations (see RWIEntry) - added handover protocol for remote search and index distribution - extended kelondroColumn class to hold bitfield types - added another search attribute on search page (index.html) - extended search-filter to enable filtering of non-matching constraints - set all new database types to be default - refactoring: moved word hash generation to condenser class git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2999 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	e55ef0df28	- automatic migration of old RWI entries to new format during remote search if new collections are activated - one more assert in RowSet, control of removeMarker git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2993 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	e3d75f42bd	final version of collection entry type definition - the test phase of the new collection data structure is finished - test data that had been generated is void. There will be no migration - the new collection files are located in DATA/INDEX/PUBLIC/TEXT/RICOLLECTION - the index dump is void. There will be no migration - the new index dump is in DATA/INDEX/PUBLIC/TEXT/RICACHE git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2983 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	c9364246cc	introduced new RWI-Object. This will be used for the final version of the collections. The new object is not yet used. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2966 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	e628d34e16	patches for bad data git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2951 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	497428c8ec	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2949 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	bb7d4b5d5e	refactoring to prepare new RWI entry object - moved all url and index(RWI) entries to index package - better naming to distinguish RWI entries and URL entries git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2937 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	ee4715a21c	- more asserts - bugfix for performaceMemory - refactoring of index ram cache: renamed indexRAMCacheRI to indexRAMRI, to make space for a cached indexRI, which should be named indexRAMCacheRI git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2925 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	8fdefd5c68	generalization of payload definition of index storage this is one step forward to the migration to a new collection data format git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2912 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	bd4f43cd66	- fixed a null pointer exception bug - switched off more write caches - re-enabled index-abstracts search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2885 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	06854988da	- full integration of new LURL database in INDEX - added migration method for urlHash.db into INDEX git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2819 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	b79e06615d	- added new LURL.Entry class for next database migration - refactoring of affected classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2802 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	77a59a115d	refactoring of indexing methods git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2787 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	a5dd0d41af	- refactoring of plasmaCrawlLURL.Entry to prepare new Entry format - added test migration method to migrate the old LURL to a new LURL. The new LURL will be split into different tables for each month. This solves several problems: - the biggest table in YaCy is split into different parts and can also be managed in filesystems that are limited to 2GB - the oldest entries can easily be identified, used for re-crawl and deleted - the complete database can be limited to a specific size (as wanted many times) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2755 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	db294687ea	enhanced logging - more logging output - fix in log line preparation - added filter to log page - some small bugfixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2707 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	bcf2b800b4	applied UTF-8 encoding parameter to yacy-internal protocol communication git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2694 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	df1629b05a	- code cleanup - version 0.471 - moved surftipps to own web page git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2676 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	2463e5624a	'quick' release 0.47 - documentation update - necessary bugfixes (missing css for new peers) - reduced effect of search result redundancy filter - removed some debug output, but not all git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2665 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	6e2907135a	bugfixes for remote search server part git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2573 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	cf9884e22b	first attempt to implement a secondary search. This is a set of search processes that shall enrich search results with specialized requests to realize a combination of search results from different peers. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2571 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	7ef80c1026	more debugging git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2566 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	75b198bc02	- updated references to indexContainer - more bugfixes and debugging for indexAbstract processing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2555 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	4f9e42d5ed	more changes towards better join-search - fixed problems with index-abstract generation - added analysis output for index abstract receive git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2551 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	82a6054275	- fixed bug with new indexAbstract generation - added partly evaluation of indexAbstracts during remote searches git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2544 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	96c6e4e322	- enhancements to detailed search page - enhancements to search ranking computation process - removed bugs in postranking git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2516 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	4866868c0e	added write cache for LURLs This was necessary to speed up the index receive process during global search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2498 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	eee44be602	*) adding an interface for customized blacklist classes - now it's possible to use a customized blacklist engine instead of the default one - this can be done by configuring the property BlackLists.class See: http://www.yacy-forum.de/viewtopic.php?t=2108 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2397 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	d2e8e76218	*) now it's possible to configure the yacy blacklist separately for dht, search, proxy, crawler See: http://www.yacy-forum.de/viewtopic.php?t=2541 http://www.yacy-forum.de/viewtopic.php?p=24516 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2389 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	ebc2233092	* implemented (finished) class indexRowSetContainer * replaced indexTreeMapContainer by indexRowSetContainer * deleted indexTreeMapContainer and abstract class This is another step to the new database structure git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2343 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	9183d21f25	renamed new index class to old name git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2342 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	c4e922885a	replaced indexURLEntry by new class that uses a kelondroRow.Entry object to store the index entry. This is another step to move to the new database structure. A side effect of this change is, that index storage uses much less RAM space, which affects the index RAM cache. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2341 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	58df8b7bbf	a large collection of different changes * mainly for the transition to the new indexing database structure * a bugfix for an endless loop inside kelondroTree iteration * a bugfix for bulk read inside a kelondroTree iteration; the bug caused some elements to be iterated twice * very strong speed enhancement for url/domain extraction git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2320 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	3879a0ecd0	replaced java.net.URL usage by use of new class de.anomic.net.URL. This shall be seen as an experiment to exclude all cases where there could be a DNS lookup during URL comparison. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	671fd9a5c9	work towards new indexing database structure (no effect on current functionality yet) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2277 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hermens	d4645062bc	Correct usage of vhost in wget/wput requests: - yacyClient: don't use own .yacyh domain in requests, instead use .yacyh domain of target peer for everything but ranking distribution - natLib: use full hostname instead of just SLD.TLD git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2232 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	4ca0857c0c	*) Index transfer now considers the pause time sent by busy peers during index transfer / index distribution See: http://www.yacy-forum.de/viewtopic.php?p=22647#22491 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2205 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	5041d330ce	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2150 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	7b3b12888c	refactoring: integrated indexContainer abstraction layer git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2149 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	a930be4ba3	refactoring of index management: generalized the index entry git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2121 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	82b2bc6932	patch for index-transfer DoS problem see http://www.yacy-forum.de/viewtopic.php?p=21627#21627 note that this function will make the index-transfer functionality void git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2114 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	a474669338	start with refactoring of index management git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2110 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	55c5b41bd0	modified kelondroDyn to work better with new object caches (removed own single object cache) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2077 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	26e3216bcc	update to profile fetch behavior git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2076 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	fd7c17e624	added virtual host support: all yacy-to-yacy communication now send the <peer-hexhash>.yacyh virtual domain inside the http 'Host' property field. This shall enable running a yacy peer on a virtual host. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2074 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	60e5aff9fc	some enhancements to the remote crawl trigger git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2030 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	dbe96e6541	added hand-over of search filter and prefer ranking to yacy protocol git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2029 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	bd283b8443	fixed bugs: - null pointer exception during startup of a robinson-configured peer - wrong time calculation of default value of re-crawl option git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2005 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	a469874e3f	added and fixed time-out behaviour during search see also: http://www.yacy-forum.de/viewtopic.php?p=19823#19823 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1986 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	63f39ac7b5	added 3 new crawling steering options: - re-crawl by age of page (enter in minutes) - auto-domain-filter - maximum number of pages per domain NOT YET TESTED! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1949 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	1f4412a146	adapted isListed to the new behavior as discussed (url, getFile) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1940 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago

114 Commits (29fa17bd406675c19b21242212e5b2d53a4a67e9)