#
# This file defines a stopword set for Japanese.
#
# This set is made up of hand-picked frequent terms from segmented Japanese Wikipedia.
# Punctuation characters and frequent kanji have mostly been left out. See LUCENE-3745
# for frequency lists, etc. that can be useful for making your own set (if desired)
#
# Note that there is an overlap between these stopwords and the terms stopped when used
# in combination with the JapanesePartOfSpeechStopFilter. When editing this file, note
# that comments are not allowed on the same line as stopwords.
#
# Also note that stopping is done in a case-insensitive manner. Change your StopFilter
# configuration if you need case-sensitive stopping. Lastly, note that stopping is done
# using the same character width as the entries in this file. Since this StopFilter is
# normally done after a CJKWidthFilter in your chain, you would usually want your romaji
# entries to be in half-width and your kana entries to be in full-width.
#
の
に
は
を
た
が
で
て
と
し
れ
さ
ある
いる
も
する
から
な
こと
として
い
や
れる
など
なっ
ない
この
ため
その
あっ
よう
また
もの
という
あり
まで
られ
なる
へ
か
だ
これ
によって
により
おり
より
による
ず
なり
られる
において
ば
なかっ
なく
しかし
について
せ
だっ
その後
できる
それ
う
ので
なお
のみ
でき
き
つ
における
および
いう
さらに
でも
ら
たり
その他
に関する
たち
ます
ん
なら
に対して
特に
せる
及び
これら
とき
では
にて
ほか
ながら
うち
そして
とともに
ただし
かつて
それぞれ
または
お
ほど
ものの
に対する
ほとんど
と共に
といった
です
とも
ところ
ここ
##### End of file