Module: Daidai::Kabosu

Defined in:
lib/daidai/kabosu.rb

Overview

Optional resolver backed by the `kabosu` gem (Ruby bindings for the Sudachi morphological analyzer). It turns a bare word, even an inflected one like “食べている”, into its dictionary form and JMdict part of speech, so you can conjugate without naming the POS yourself:

Daidai.conjugate("食べている")   # kabosu finds 食べる / v1, then conjugates

This needs the `kabosu` gem plus an installed Sudachi dictionary. Neither is a hard dependency of daidai; the rest of the gem is pure Ruby with zero dependencies. To bypass the resolver, pass the POS explicitly, in which case kabosu never runs:

Daidai.conjugate("食べる", "v1")

NOTE: this module is nested inside Daidai, so the top-level kabosu gem must be referenced as ::Kabosu to avoid resolving back to Daidai::Kabosu.
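The lookup rule behind this note can be reproduced without either gem. The sketch below is illustrative only: the top-level Kabosu module is a stand-in for the constant the kabosu gem would define, and the label, bare and rooted methods are hypothetical names that exist only for this demonstration.

```ruby
# Stand-in for the top-level constant the kabosu gem would define (assumption).
module Kabosu
  def self.label = "top-level Kabosu"
end

module Daidai
  module Kabosu
    def self.label = "Daidai::Kabosu"

    # A bare `Kabosu` is looked up lexically and finds Daidai::Kabosu first.
    def self.bare = Kabosu.label

    # The leading :: starts the lookup at the top level instead.
    def self.rooted = ::Kabosu.label
  end
end

puts Daidai::Kabosu.bare    # Daidai::Kabosu
puts Daidai::Kabosu.rooted  # top-level Kabosu
```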

Defined Under Namespace

Classes: MissingDependency

Constant Summary

CONJUGATION_TYPE =

Sudachi 活用型 (conjugation type) => JMdict POS code. Sudachi names the verb row but not the JMdict subclass for a handful of irregulars, so LEMMA_POS overrides those by dictionary form.

{
  "五段-カ行" => "v5k", "五段-ガ行" => "v5g", "五段-サ行" => "v5s",
  "五段-タ行" => "v5t", "五段-ナ行" => "v5n", "五段-バ行" => "v5b",
  "五段-マ行" => "v5m", "五段-ラ行" => "v5r", "五段-ワア行" => "v5u",
  "カ行変格" => "vk", "サ行変格" => "vs-i"
}.freeze
LEMMA_POS =

Dictionary-form overrides for verbs whose JMdict subclass Sudachi’s 活用型 can’t distinguish (irregular okurigana inside an otherwise-regular row).

{
  "行く" => "v5k-s", "逝く" => "v5k-s", "往く" => "v5k-s",
  "有る" => "v5r-i", "在る" => "v5r-i", "ある" => "v5r-i"
}.freeze

Class Method Summary

Class Method Details

.available? ⇒ Boolean

Whether the resolver is usable (kabosu loadable + a dictionary present).

Returns:

  • (Boolean)


# File 'lib/daidai/kabosu.rb', line 72

def available?
  !tokenizer.nil?
rescue MissingDependency
  false
end
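The private tokenizer method is not shown on this page, so the following is only a sketch of the general pattern that available? relies on: a lazy loader raises MissingDependency when the optional library cannot be required, and the predicate turns that exception into false. The OptionalResolver module, its injectable library name and the placeholder backend object are all assumptions made so the sketch runs without kabosu installed.

```ruby
module OptionalResolver
  class MissingDependency < StandardError; end

  class << self
    # Hypothetical knob: lets the sketch be exercised without the real gem.
    attr_writer :library

    def library
      @library ||= "kabosu"
    end

    # Mirrors the documented predicate: a missing dependency means "not usable".
    def available?
      !backend.nil?
    rescue MissingDependency
      false
    end

    # Drops the memoized backend so the next call tries to load it again.
    def reset!
      @backend = nil
    end

    private

    # Loads the optional library on first use and memoizes a placeholder
    # object standing in for the real tokenizer.
    def backend
      @backend ||= begin
        require library
        Object.new
      rescue LoadError => e
        raise MissingDependency, "could not load #{library}: #{e.message}"
      end
    end
  end
end

OptionalResolver.library = "no_such_gem_for_this_sketch"
puts OptionalResolver.available?  # false

OptionalResolver.library = "json" # any loadable library works for the demo
OptionalResolver.reset!
puts OptionalResolver.available?  # true
```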

.jmdict_pos(pos, lemma) ⇒ Object

Pure mapping: a Sudachi part-of-speech array + dictionary form => JMdict POS code, or nil. Exposed (and unit-tested) without needing kabosu.



# File 'lib/daidai/kabosu.rb', line 67

def jmdict_pos(pos, lemma)
  LEMMA_POS[lemma] || from_conjugation_type(pos)
end
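A minimal sketch of how this lookup order plays out, using trimmed copies of the two constants. The from_conjugation_type helper is private and not shown on this page, so the version here is a hypothetical stand-in that assumes the 活用型 sits at index 4 of Sudachi's six-element part-of-speech array; the real helper presumably covers more cases (the overview resolves 食べる to v1, which this table alone would not). The sample POS arrays are illustrative values, not captured tokenizer output.

```ruby
# Trimmed copies of the documented constants.
CONJUGATION_TYPE = {
  "五段-カ行" => "v5k", "五段-ラ行" => "v5r", "サ行変格" => "vs-i"
}.freeze
LEMMA_POS = { "行く" => "v5k-s", "ある" => "v5r-i" }.freeze

# Hypothetical stand-in for the private helper: reads the conjugation type
# from index 4 of the POS array (assumption) and maps it through the table.
def from_conjugation_type(pos)
  CONJUGATION_TYPE[pos[4]]
end

# Same body as the documented method: lemma overrides win over the table.
def jmdict_pos(pos, lemma)
  LEMMA_POS[lemma] || from_conjugation_type(pos)
end

iku  = %w[動詞 非自立可能 * * 五段-カ行 終止形-一般]
kaku = %w[動詞 一般 * * 五段-カ行 終止形-一般]
neko = %w[名詞 普通名詞 一般 * * *]

p jmdict_pos(iku, "行く")   # "v5k-s" (override by dictionary form)
p jmdict_pos(kaku, "書く")  # "v5k"   (plain table lookup)
p jmdict_pos(neko, "猫")    # nil     (nothing conjugatable)
```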

.reset! ⇒ Object

Clears the cached tokenizer so that the next call builds a fresh one.



# File 'lib/daidai/kabosu.rb', line 78

def reset! = (@tokenizer = nil)

.resolve(text) ⇒ Object

Resolve `text` to { word:, pos:, reading: } from its first inflecting morpheme, or nil when nothing conjugatable is found. Raises MissingDependency when kabosu or a Sudachi dictionary isn’t installed.



# File 'lib/daidai/kabosu.rb', line 48

def resolve(text)
  morphemes = tokenizer.tokenize(text).to_a
  index = morphemes.index { |m| inflecting?(m.part_of_speech) }
  return nil unless index

  morpheme = morphemes[index]
  preceding = index.positive? ? morphemes[index - 1] : nil

  # 名詞+する compounds (勉強した → 勉強, vs): the noun is the dictionary entry.
  if suru?(morpheme.part_of_speech) && preceding && suru_noun?(preceding.part_of_speech)
    return entry(preceding, "vs")
  end

  pos = jmdict_pos(morpheme.part_of_speech, morpheme.dictionary_form)
  pos && entry(morpheme, pos)
end
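To make the control flow above easier to follow without a tokenizer, here is a sketch that runs the same steps over hand-built morphemes. Everything outside the resolve body is an assumption: the Morpheme struct, the TYPES table, and the inflecting?, suru?, suru_noun? and entry helpers are hypothetical stand-ins for private methods this page does not show, and the POS arrays are illustrative rather than real tokenizer output.

```ruby
# Stand-in for a tokenizer morpheme; only the fields this sketch reads.
Morpheme = Struct.new(:dictionary_form, :reading_form, :part_of_speech)

module ResolveSketch
  TYPES = { "サ行変格" => "vs-i", "五段-カ行" => "v5k" }.freeze

  module_function

  # Assumption: non-inflecting morphemes carry "*" as their conjugation type.
  def inflecting?(pos)
    pos[4] != "*"
  end

  def suru?(pos)
    pos[4] == "サ行変格"
  end

  # Assumption: nouns that can take する are tagged サ変可能 at index 2.
  def suru_noun?(pos)
    pos[0] == "名詞" && pos[2] == "サ変可能"
  end

  def entry(morpheme, pos)
    { word: morpheme.dictionary_form, pos: pos, reading: morpheme.reading_form }
  end

  # Same steps as the documented method, minus the tokenizer call.
  def resolve(morphemes)
    index = morphemes.index { |m| inflecting?(m.part_of_speech) }
    return nil unless index

    morpheme = morphemes[index]
    preceding = index.positive? ? morphemes[index - 1] : nil

    if suru?(morpheme.part_of_speech) && preceding && suru_noun?(preceding.part_of_speech)
      return entry(preceding, "vs")
    end

    pos = TYPES[morpheme.part_of_speech[4]]
    pos && entry(morpheme, pos)
  end
end

benkyou = Morpheme.new("勉強", "ベンキョウ", %w[名詞 普通名詞 サ変可能 * * *])
shi     = Morpheme.new("する", "シ", %w[動詞 非自立可能 * * サ行変格 連用形-一般])
neko    = Morpheme.new("猫", "ネコ", %w[名詞 普通名詞 一般 * * *])

p ResolveSketch.resolve([benkyou, shi])  # the noun becomes the entry, pos "vs"
p ResolveSketch.resolve([shi])           # bare する falls through to "vs-i"
p ResolveSketch.resolve([neko])          # nil
```

In the first call the inflecting morpheme is し, but because it follows a する-capable noun the noun itself is returned as the dictionary entry.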