Michael Peter Christen
7466d390b2
small refactoring + do not accept too old peers during bootstrap
9 years ago
reger
8d58a48029
remove wrong log line in CrawlSwitchboard
...
+ don't allow CrawlSwitchboard to exit application
making network param unused
9 years ago
reger
b119ff65be
clean out not used Switchboard variables
...
counter indexedPages, const xstackCrawlSlots
9 years ago
reger
bd8f7c11f5
Use transparent addToCrawler in AutoSearch instead of addToIndex
...
This would likely also be of advantage for RSS import/schedule as
following bug-reports suggest
http://mantis.tokeek.de/view.php?id=569
http://mantis.tokeek.de/view.php?id=655
9 years ago
JeremyRand
433217b33e
Properly support multiple Boost Queries. (Previous code was broken because it concatenated multiple Boost Queries together rather than passing Solr an array.)
9 years ago
reger
d0a571bed2
del cytag trail for own index.html (save resource not used by default)
9 years ago
reger
7097dcbdbd
cleanup hack for partial Solr update on multivalued datefields
...
has been fixed in Solr http://issues.apache.org/jira/browse/SOLR-8050
9 years ago
reger
f10ea3c155
clean-out unused SwitchboardConstants
9 years ago
reger
ef24593347
delete obsolete SEARCHRESULT busythread constants
...
not used since 29.05.2013 18:27:27
0c1a018bbd
9 years ago
reger
6ecc180299
fix rwi doubledom return best (highest) ranking
9 years ago
reger
d9adc2c255
load handler for Transparent Proxy on startup only if feature is activated
...
to save the resources and keep handler chain small if the feature is not used.
+add a warning message on settingsack_p page to restart on first activation
9 years ago
Michael Peter Christen
b89465d952
0N - basic dump upload servlet infrastructure, to share index dumps
...
within an experimental new sharing model
9 years ago
Michael Peter Christen
849ab671a9
0n: modified the p2p bootstrapping process - rules had been too tight and
...
did not support the re-start of a network with just one principal peer.
9 years ago
Michael Peter Christen
a6bf0b1649
0N - added option to generate index export files for a specific number
...
of minutes in the past and reverted latest change. The export file dump
will now contain four data elements: f - first date of index entry write
date, l - last date of index write date, n - now-date of index dump
time, c - count of numbers inside the dump. '0N' denotes a series of
changes which will lead to the opportunity to exchange index data dumps
in a way that is needed to integrate ZeroNet index data. This will be
based on index dump sharing, which is the reason for this commit.
9 years ago
reger
06d0e2aeb9
result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader method.
...
- Above brought up that parser start url parameter, declared as AnchorURL, uses only methods of parent object DigestURL (changed parameter declaration accordingly).
9 years ago
reger
caf9e98f09
put metadata dc_publisher in corresponding schema field
9 years ago
reger
6f0b073bf3
override detected language (statistic langdetect) only with TLD determined
...
language if langdetect probability is not high.
+ additionally truncate zh-cn / zh-tw returned by langdetect to 2 char ISO639-1 zh
used by YaCy
9 years ago
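
The language-override rule in the commit above can be sketched as follows. This is a minimal illustration in Python, not YaCy's Java code: the function name `choose_language`, its parameters, and the `threshold` value of 0.9 are all assumptions made for the example, since the commit message does not state what counts as a "high" probability.

```python
def choose_language(detected, probability, tld_language, threshold=0.9):
    """Pick a 2-char language code from a statistical guess and a TLD hint.

    detected     -- code from the statistical detector, e.g. "en" or "zh-cn"
    probability  -- detector confidence in [0, 1]
    tld_language -- language implied by the top-level domain, or None
    threshold    -- hypothetical cut-off for "high" confidence (assumption)
    """
    # Truncate regional variants such as zh-cn / zh-tw to the 2-char code.
    code = detected[:2].lower() if detected else None
    # Let the TLD hint win only when the statistical guess is missing or weak.
    if tld_language and (code is None or probability < threshold):
        return tld_language
    return code
```

With these assumptions, a confident detection is kept even when the TLD suggests another language, while a weak detection falls back to the TLD hint.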
reger
535d4bf75f
respect hidden attribute for file and smb directory listing
...
(hidden directories are not listed, affects crawling of local file system)
9 years ago
reger
a6617ad887
expand initRemoteCrawler() to terminate worker threads if called to deactivate
...
remote crawl.
On startup we save the resources for the remote crawler if it is disabled. Once
started, threads kept running idle after remote crawl was disabled. Now threads
are terminated to save the resources also when disabling during runtime.
+ remove empty class Channels
9 years ago
reger
ed3e16e092
apply remote result count config value to Bookmark Autosearch
...
+ prepare to make the widely unused Bookmark feature optional
9 years ago
Ryszard Goń
a98c395023
Add the Autocrawl thread
9 years ago
Ryszard Goń
1728cd30c6
Create autocrawl profiles
9 years ago
luc
571bc55937
Refactoring : use StandardCharsets constants instead of hard-coded
...
charset names.
9 years ago
reger
1af0e9ef74
remove workaround for Solr bug regarding multivalued date fields
...
fixed in 5.4.0
http://issues.apache.org/jira/browse/SOLR-8050
9 years ago
reger
a58d34a4e8
check error URL cache before adding errorDoc to index
...
- del obsolete related switchboardconstant
9 years ago
reger
cd26717ba2
fix low memory status hint (dht-in disabled)
...
http://mantis.tokeek.de/view.php?id=619
9 years ago
sixcooler
dce1cb65c4
Merge remote-tracking branch 'choose_remote_name/master'
9 years ago
reger
6d54eb3d36
skip loading document on crawl start for YMark bookmarks
...
by adding a constructor giving the already loaded document as parameter.
9 years ago
reger
45b9bd8403
adjust MultiProtocolURL.protocol detection to handle mailto with "://" in parameters,
...
and feeding hyperlinks to webgraph processing.
9 years ago
reger
dec3e6ad96
fix: adjust urlstub for mailto links
...
(skip protocol)
9 years ago
luc
8c4ab9c76b
Added an option to optionally limit the size of remote solr documents put to
...
local index. See mantis #626 .
9 years ago
reger
28b8bc290a
fix use of NETWORK_SEARCHVERIFY for rwi verification
...
was not used to set the searchevent parameter (done in SearchEventCache.getEvent)
- remove unused corresponding QueryParams.filterfailurls param.
9 years ago
reger
020630efd8
remove unused network scanner parameter from queryparameter
...
Search event is not using networkscanner
(removed filterscannerfail param always init to false)
9 years ago
luc
ad5586f8f6
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
luc
8ebefa4233
Fixed MediaWiki import : DCEntry conversion to SolrInputDocument was
...
failing. Looks like it was broken since Commit
b43811d38c
9 years ago
reger
cdb8f3b10d
make current ranking score value avail. to search interface / api
...
Update the result score result field with the result queue ranking value to reflect
the actual calculated/used score,
for rwi & solr stack results.
(calc. etc. is unchanged, it's just that result entry carries the latest val
as api retrieves the number from it)
9 years ago
Michael Peter Christen
ef8cd80593
fix for npe
9 years ago
reger
0347bfa71f
Apply collection query constraint/modifier to rwi result stack.
...
Collection is not available in pure rwi entries (but in local solr metadata)
But if user wishes to filter by query constraint also rwi shall adhere to this
(even if only rwi entries with parsed or solr received metadata may fit)
9 years ago
reger
ca3d26a401
harmonize wordsintitle & CollectionSchema.title_words_val calculation,
...
remove obsolete partial init of wordreference from urimetadata
9 years ago
reger
52a9040ae6
Sort out double keywords (dc_subject) early in parsed documents
...
- by direct using Set vs. List
- remove unneeded String[] getter
9 years ago
sixcooler
646afe9183
do not store subfield *_coordinate + make all num-fields being docvalues
9 years ago
sixcooler
194df613de
not using 'location' as defaultfacetfield - since we removed it being
...
default.
9 years ago
sixcooler
4a905ec134
fix to not let the AccessTracker-Log grow too much, but have enough data
...
to monitor.
(+gitignore-correction)
9 years ago
reger
a60b1fb6c2
differentiate api call getLocalPort() from getConfigInt()
9 years ago
reger
11f3666660
increase use of predefined CATCHALL_QUERY string
9 years ago
reger
a58ee49307
Optimize internal imagequery focus on using content_type to select images
...
(in favor of url file extension)
9 years ago
Michael Peter Christen
151ccd50a9
fix for image size field values (must be multi-valued)
10 years ago
reger
43c27aa550
upd to solr/lucene 5.3.1
10 years ago
Michael Peter Christen
3d7dd9d3aa
follow-up to latest commit: also flush the search cache if all crawls
...
had been terminated.
10 years ago
Michael Peter Christen
c737ff235d
in case that the include_string contains several entries including
...
1-char tokens and also more-than-1-char tokens, then remove the 1-char
tokens to prevent that we are too strict. This will make it possible to
be a bit more fuzzy in the search where it is appropriate.
10 years ago
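
The token rule described in the commit above can be sketched as follows. This is a minimal illustration in Python, not the project's Java code: the function name `relax_include_tokens` is hypothetical, and the sketch assumes the include string has already been split into a list of tokens.

```python
def relax_include_tokens(tokens):
    """Drop 1-char tokens, but only if longer tokens are also present.

    If every token is a single character, or none is, the list is
    returned unchanged so the query is never emptied by this rule.
    """
    longer = [t for t in tokens if len(t) > 1]
    has_short = len(longer) < len(tokens)
    if longer and has_short:
        return longer
    return list(tokens)
```

Leaving an all-short list untouched is a design choice of this sketch: the commit message only describes the mixed case.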