yacy_search_server

Author	SHA1	Message	Date
reger	e310ec5f70	fix posInText ranking calculation to score 0 on no position info + fix Word posInText calc in Tokenizer to start with 1 + test case	9 years ago
reger	51c077f493	adjust the getTopics() and getTopicNavigator() to current usage - move the maxcount limit restriction completely to getTopicNavigator (as it is not used in getTopics) - let search servlet use getTopics by default (w/o RWI connected check, as of now, Topics are available w/o any additional index interaction)	9 years ago
reger	cc2d9dd3f1	reactivate the use of included-in-topwords boost in postRanking + changed the postRanking to add one score only if word appears more than once. + getTopics() unused code block rem'd (save performance) -> routine needs rework!	9 years ago
reger	6801673a07	apply postranking media search boost only on media queries	9 years ago
Michael Peter Christen	079112358c	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	9 years ago
Michael Peter Christen	efeb592661	don't do solr optimization, this creates high IO load. We should leave this task to solr to do that on its own instead of forcing it.	9 years ago
reger	4c7a77662a	eliminate dependency on file-extension in storeDocument but use supported mime-type to also support handling of urls w/o corresponding file-extension. For this refactor use of document.getParserObject() to always return a Parser (for clean logic) and define/move the scraperObject as local var of AbstractParser. Adjust related calls to getParserObject (where actually a scraperObject is wanted). Additionally skip appending url token to parsed text for dht metadata entries (by default returned as result by rwi index).	9 years ago
reger	2910fe35c1	add missing scheduler calc of next exec_date (call of calculateAPIScheduler) - after last_exec_date is altered, next_exec_date should be recalculated - makes the recalculation of next_exec in advance (without api call surely made) in Switchboard.schedulerJob() obsolete. Slightly modify next_exec calc. on missed event to now+schedule_time (instead of a fixed 10min)	9 years ago
reger	70d47ae38a	keep scheduler selection by repeat entry from `07311020d4` to allow exec schedule on actual exec event. Iterate on exec date (of advantage after interruption/shutdown) to schedule older or missed events first.	9 years ago
reger	7c3f932e5d	revert due to conflict with double count recording by scheduler / servlet by the commit under normal operation (no shutdown)	9 years ago
reger	07311020d4	postpone apicall exec date init until actual call. Fix for http://mantis.tokeek.de/view.php?id=677 - The difference arises when a large number of rss feeds is scheduled and loading is not finished before shutdown of YaCy. The change makes sure RSS feeds not yet loaded will be loaded by the scheduler on next startup.	9 years ago
reger	fcad2d0744	add uses of config constant INDEX_RECEIVE_ALLOW	9 years ago
reger	35a7d57260	update lucenematchversion to current (5.2.0 -> 5.5.0); the update should not require a reindex	9 years ago
Michael Peter Christen	7466d390b2	small refactoring + do not accept too old peers during bootstrap	9 years ago
reger	8d58a48029	remove wrong log line in CrawlSwitchboard + don't allow CrawlSwitchboard to exit application making network param unused	9 years ago
reger	b119ff65be	clean out not used Switchboard variables counter indexedPages, const xstackCrawlSlots	9 years ago
reger	bd8f7c11f5	Use transparent addToCrawler in AutoSearch instead of addToIndex. This would likely also be of advantage for RSS import/schedule, as the following bug reports suggest: http://mantis.tokeek.de/view.php?id=569 http://mantis.tokeek.de/view.php?id=655	9 years ago
JeremyRand	433217b33e	Properly support multiple Boost Queries. (Previous code was broken because it concatenated multiple Boost Queries together rather than passing Solr an array.)	9 years ago
reger	d0a571bed2	del cytag trail for own index.html (save resource not used by default)	9 years ago
reger	7097dcbdbd	cleanup hack for partial Solr update on multivalued datefields has been fixed in Solr http://issues.apache.org/jira/browse/SOLR-8050	9 years ago
reger	f10ea3c155	clean-out unused SwitchboardConstants	9 years ago
reger	ef24593347	delete obsolete SEARCHRESULT busythread constants not used since 29.05.2013 18:27:27 `0c1a018bbd`	9 years ago
reger	6ecc180299	fix rwi doubledom return best (highest) ranking	9 years ago
reger	d9adc2c255	load handler for Transparent Proxy on startup only if feature is activated to save the resources and keep handler chain small if the feature is not used. +add a warning message on settingsack_p page to restart on first activation	9 years ago
Michael Peter Christen	b89465d952	0N - basic dump upload servlet infrastructure, to share index dumps within an experimental new sharing model	9 years ago
Michael Peter Christen	849ab671a9	0n: modified the p2p bootstrapping process - rules had been too tight and did not support the re-start of a network with just one principal peer.	9 years ago
Michael Peter Christen	a6bf0b1649	0N - added option to generate index export files for a specific number of minutes in the past and reverted latest change. The export file dump will now contain four data elements: f - first date of index entry write date, l - last date of index write date, n - now-date of index dump time, c - count of numbers inside the dump. '0N' denotes a series of changes which will lead to the opportunity to exchange index data dumps in a way that is needed to integrate ZeroNet index data. This will be based on index dump sharing; that causes this commit.	9 years ago
reger	06d0e2aeb9	result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader method. - Above brought up that parser start url parameter, declared as AnchorURL, uses only methods of parent object DigestURL (changed parameter declaration accordingly).	9 years ago
reger	caf9e98f09	put metadata dc_publisher in corresponding schema field	9 years ago
reger	6f0b073bf3	override detected language (statistic langdetect) only with TLD determined language if langdetect probability is not high. + additionally truncate zh-cn / zh-tw returned by langdetect to 2 char ISO639-1 zh used by YaCy	9 years ago
reger	535d4bf75f	respect hidden attribute for file and smb directory listing (hidden directories are not listed, affects crawling of local file system)	9 years ago
reger	a6617ad887	expand initRemoteCrawler() to terminate worker threads if called to deactivate remote crawl. On startup we save the resources for remote crawler if disabled. Once started, threads kept running idle after remote crawl was disabled. Now threads are terminated to save the resources also when disabling during runtime. + remove empty class Channels	9 years ago
reger	ed3e16e092	apply remote result count config value to Bookmark Autosearch + prepare to make the widely unused Bookmark feature optional	9 years ago
Ryszard Goń	a98c395023	Add the Autocrawl thread	9 years ago
Ryszard Goń	1728cd30c6	Create autocrawl profiles	9 years ago
luc	571bc55937	Refactoring : use StandardCharsets constants instead of hard-coded charset names.	9 years ago
reger	1af0e9ef74	remove workaround for Solr bug regarding multivalued date fields fixed in 5.4.0 http://issues.apache.org/jira/browse/SOLR-8050	9 years ago
reger	a58d34a4e8	check error URL cache before adding errorDoc to index - del obsolete related switchboardconstant	9 years ago
reger	cd26717ba2	fix low memory status hint (dht-in disabled) http://mantis.tokeek.de/view.php?id=619	9 years ago
sixcooler	dce1cb65c4	Merge remote-tracking branch 'choose_remote_name/master'	9 years ago
reger	6d54eb3d36	skip loading document on crawl start for YMark bookmarks by adding a constructor giving the already loaded document as parameter.	9 years ago
reger	45b9bd8403	adjust MultiProtocolURL.protocol detection to handle mailto with "://" in parameters, and feeding hyperlinks to webgraph processing.	9 years ago
reger	dec3e6ad96	fix: adjust urlstub for mailto links (skip protocol)	9 years ago
luc	8c4ab9c76b	Added an option to eventually limit size of remote solr documents put to local index. See mantis #626.	9 years ago
reger	28b8bc290a	fix use of NETWORK_SEARCHVERIFY for rwi verification was not used to set the searchevent parameter (done in SearchEventCache.getEvent) - remove unused corresponding QueryParams.filterfailurls param.	9 years ago
reger	020630efd8	remove unused network scanner parameter from queryparameter Search event is not using networkscanner (removed filterscannerfail param always init to false)	9 years ago
luc	ad5586f8f6	Merge branch 'master' of https://github.com/yacy/yacy_search_server	9 years ago
luc	8ebefa4233	Fixed MediaWiki import : DCEntry conversion to SolrInputDocument was failing. Looks like it was broken since Commit `b43811d38c`	9 years ago
reger	cdb8f3b10d	make current ranking score value avail. to search interface / api Update the result score result field with the result queue ranking value to reflect the actual calculated/used score, for rwi & solr stack results. (calc. etc. is unchanged, it's just that result entry carries the latest val as api retrieves the number from it)	9 years ago
Michael Peter Christen	ef8cd80593	fix for npe	9 years ago

1220 Commits (e310ec5f702154c215139a0e0fc35db2155a3637)