Module: Daidai::Kabosu
Defined in: lib/daidai/kabosu.rb
Overview
Optional resolver backed by the `kabosu` gem (Ruby bindings for the Sudachi morphological analyzer). It turns a bare word, even an inflected one like “食べている”, into its dictionary form and JMdict part of speech, so you can conjugate without naming the POS yourself:
Daidai.conjugate("食べている") # kabosu finds 食べる / v1, then conjugates
This needs the `kabosu` gem plus an installed Sudachi dictionary. Neither is a hard dependency of daidai; the rest of the gem is pure Ruby with zero dependencies. The escape hatch is to pass the POS explicitly, in which case kabosu never runs:
Daidai.conjugate("食べる", "v1")
NOTE: this module is nested inside Daidai, so the top-level kabosu gem must be referenced as ::Kabosu to avoid resolving back to Daidai::Kabosu.
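The lookup rule behind this note can be illustrated with a small self-contained sketch. The top-level `Kabosu` module below is only a stand-in for whatever constant the real gem defines, and the `origin`, `bare`, and `rooted` methods are hypothetical names introduced for the demonstration.

```ruby
# Stand-in for the top-level constant the real kabosu gem would define.
module Kabosu
  def self.origin = "top-level gem"
end

module Daidai
  module Kabosu
    def self.origin = "Daidai resolver"

    # A bare `Kabosu` is resolved lexically, so Daidai::Kabosu is found first.
    def self.bare = Kabosu.origin

    # A leading :: starts the lookup at the top level instead.
    def self.rooted = ::Kabosu.origin
  end
end

puts Daidai::Kabosu.bare    # Daidai resolver
puts Daidai::Kabosu.rooted  # top-level gem
```

Without the leading `::`, any reference to the gem from inside the `Daidai` namespace would resolve back to the resolver module itself.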
Defined Under Namespace
Classes: MissingDependency
Constant Summary
- CONJUGATION_TYPE =
Sudachi 活用型 (conjugation type) => JMdict POS code. Sudachi names the verb row but not the JMdict subclass for a handful of irregulars, so LEMMA_POS overrides those by dictionary form.
{
  "五段-カ行" => "v5k",
  "五段-ガ行" => "v5g",
  "五段-サ行" => "v5s",
  "五段-タ行" => "v5t",
  "五段-ナ行" => "v5n",
  "五段-バ行" => "v5b",
  "五段-マ行" => "v5m",
  "五段-ラ行" => "v5r",
  "五段-ワア行" => "v5u",
  "カ行変格" => "vk",
  "サ行変格" => "vs-i"
}.freeze
- LEMMA_POS =
Dictionary-form overrides for verbs whose JMdict subclass Sudachi’s 活用型 can’t distinguish (irregular okurigana inside an otherwise-regular row).
{
  "行く" => "v5k-s",
  "逝く" => "v5k-s",
  "往く" => "v5k-s",
  "有る" => "v5r-i",
  "在る" => "v5r-i",
  "ある" => "v5r-i"
}.freeze
Class Method Summary
- .available? ⇒ Boolean
Whether the resolver is usable (kabosu loadable + a dictionary present).
- .jmdict_pos(pos, lemma) ⇒ Object
Pure mapping: a Sudachi part-of-speech array + dictionary form => JMdict POS code, or nil.
- .reset! ⇒ Object
- .resolve(text) ⇒ Object
Resolve `text` to { word:, pos:, reading: } from its first inflecting morpheme, or nil when nothing conjugatable is found.
Class Method Details
.available? ⇒ Boolean
Whether the resolver is usable (kabosu loadable + a dictionary present).
# File 'lib/daidai/kabosu.rb', line 72
def available?
  !tokenizer.nil?
rescue MissingDependency
  false
end
.jmdict_pos(pos, lemma) ⇒ Object
Pure mapping: a Sudachi part-of-speech array + dictionary form => JMdict POS code, or nil. Exposed (and unit-tested) without needing kabosu.
# File 'lib/daidai/kabosu.rb', line 67
def jmdict_pos(pos, lemma)
  LEMMA_POS[lemma] || from_conjugation_type(pos)
end
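The precedence in this method (lemma override first, conjugation-type table second) can be sketched in isolation. The version below is an illustration under assumptions, not the gem's implementation. It copies only a subset of the documented tables. Its `from_conjugation_type` is a hypothetical stand-in for the private helper, which this page does not show. It also assumes the conjugation type is the fifth element of the part-of-speech array.

```ruby
# Subset of the tables documented above, copied so the example is self-contained.
CONJUGATION_TYPE = { "五段-カ行" => "v5k", "五段-ラ行" => "v5r", "カ行変格" => "vk" }.freeze
LEMMA_POS = { "行く" => "v5k-s", "ある" => "v5r-i" }.freeze

# Hypothetical stand-in for the private helper: assumes the conjugation type
# sits at index 4 of the part-of-speech array and does a plain table lookup.
def from_conjugation_type(pos)
  CONJUGATION_TYPE[pos[4]]
end

def jmdict_pos(pos, lemma)
  LEMMA_POS[lemma] || from_conjugation_type(pos)
end

godan_ka = ["動詞", "一般", "*", "*", "五段-カ行", "終止形-一般"]
noun     = ["名詞", "普通名詞", "一般", "*", "*", "*"]

p jmdict_pos(godan_ka, "書く")  # "v5k"   (regular row, from the table)
p jmdict_pos(godan_ka, "行く")  # "v5k-s" (same row, but the lemma override wins)
p jmdict_pos(noun, "猫")        # nil     (nothing conjugatable)
```

Checking the lemma first is what lets 行く and 書く share a Sudachi row while still mapping to different JMdict classes.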
.reset! ⇒ Object
# File 'lib/daidai/kabosu.rb', line 78
def reset! = (@tokenizer = nil)
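Taken together, `available?` and `reset!` suggest a memoized tokenizer that raises MissingDependency when the optional gem cannot be loaded. The private `tokenizer` method is not shown on this page, so the following is only a sketch of that general pattern. The module name `OptionalResolver`, the gem name in the `require`, and the StandardError superclass are all assumptions made for the example.

```ruby
module OptionalResolver
  # Assumed superclass; the real MissingDependency may differ.
  class MissingDependency < StandardError; end

  class << self
    # True only if the tokenizer could be built.
    def available?
      !tokenizer.nil?
    rescue MissingDependency
      false
    end

    # Drop the memoized tokenizer so the next call tries to load it again.
    def reset! = (@tokenizer = nil)

    private

    # Hypothetical loader: memoizes on success, raises when the gem is missing.
    def tokenizer
      @tokenizer ||= begin
        require "hypothetical_missing_gem_for_demo" # deliberately not installed
        Object.new
      rescue LoadError
        raise MissingDependency, "install the optional gem to resolve bare words"
      end
    end
  end
end

p OptionalResolver.available?  # false, because the require fails
```

Since a failed load leaves `@tokenizer` unset, every later call retries the `require`. `reset!` matters mainly after a successful load, for example in tests.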
.resolve(text) ⇒ Object
Resolve `text` to { word:, pos:, reading: } from its first inflecting morpheme, or nil when nothing conjugatable is found. Raises MissingDependency when kabosu or a dictionary isn’t installed.
# File 'lib/daidai/kabosu.rb', line 48
def resolve(text)
  morphemes = tokenizer.tokenize(text).to_a
  index = morphemes.index { |m| inflecting?(m.part_of_speech) }
  return nil unless index

  morpheme = morphemes[index]
  preceding = index.positive? ? morphemes[index - 1] : nil

  # 名詞+する compounds (勉強した → 勉強, vs): the noun is the dictionary entry.
  if suru?(morpheme.part_of_speech) && preceding && suru_noun?(preceding.part_of_speech)
    return entry(preceding, "vs")
  end

  pos = jmdict_pos(morpheme.part_of_speech, morpheme.dictionary_form)
  pos && entry(morpheme, pos)
end
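The 名詞+する branch is easier to follow with stubbed morphemes. The sketch below mirrors the control flow above but runs without kabosu. The `Morpheme` struct, the three predicates, the sample part-of-speech arrays, and the simplified return hash are all hypothetical stand-ins, since the real private helpers are not shown on this page.

```ruby
# Minimal stand-in for a Sudachi morpheme; only the fields the sketch reads.
Morpheme = Struct.new(:surface, :dictionary_form, :part_of_speech)

# Hypothetical predicates (assumed shapes of the private helpers).
def inflecting?(pos)
  pos[0] == "動詞"
end

def suru?(pos)
  pos[4] == "サ行変格"
end

def suru_noun?(pos)
  pos[0] == "名詞" && pos[2] == "サ変可能"
end

def resolve_sketch(morphemes)
  index = morphemes.index { |m| inflecting?(m.part_of_speech) }
  return nil unless index

  morpheme = morphemes[index]
  preceding = index.positive? ? morphemes[index - 1] : nil

  # Noun + する: report the noun as the entry, with POS "vs".
  if suru?(morpheme.part_of_speech) && preceding && suru_noun?(preceding.part_of_speech)
    return { word: preceding.dictionary_form, pos: "vs" }
  end

  # Simplified fallback: only a lone する is mapped here, not the full table.
  suru?(morpheme.part_of_speech) ? { word: morpheme.dictionary_form, pos: "vs-i" } : nil
end

# Hand-written tokens standing in for a tokenizer's output on 勉強した.
tokens = [
  Morpheme.new("勉強", "勉強", ["名詞", "普通名詞", "サ変可能", "*", "*", "*"]),
  Morpheme.new("し", "する", ["動詞", "非自立可能", "*", "*", "サ行変格", "連用形-一般"]),
  Morpheme.new("た", "た", ["助動詞", "*", "*", "*", "助動詞-タ", "終止形-一般"])
]

result = resolve_sketch(tokens)
p result  # {word: "勉強", pos: "vs"} on Ruby 3.4; older versions print {:word=>"勉強", :pos=>"vs"}
```

Passing only the する token yields "vs-i" instead, which matches the サ行変格 entry in CONJUGATION_TYPE.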