Module: Daidai::Deinflector

Defined in:
lib/daidai/deinflector.rb

Overview

Rule-based Japanese deinflector: turns an inflected surface form back into its dictionary form(s), naming each inflection along the way (“食べてる” is the progressive of “食べる”). This is the inverse of Daidai’s forward conjugation.

The rule set is ported from Yomitan’s Japanese language transforms (ext/js/language/ja/japanese-transforms.js), vendored as JSON under resources/; the algorithm is a port of Yomitan’s LanguageTransformer. Both are GPL-3.0 — see NOTICE. Unlike Daidai’s forward tables, these rules also cover colloquial contractions (てる, ちゃう, とく, …).

Unlike `Daidai.conjugate(word)`, this needs no Sudachi/kabosu: it is pure, offline, string-rule deinflection.
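To illustrate what a "string rule" is, here is a minimal sketch of a single suffix rule in the shape that Deinflector.transform consumes. It has a regexp (`is_inflected`) that recognises the inflected ending and a lambda (`deinflect`) that rewrites it. The `ToyRule` struct, the `suffix_rule` helper and the てる → ている pair are hypothetical and are not taken from the vendored JSON.

```ruby
# Hypothetical rule shape: only the two fields the rewrite step needs.
ToyRule = Struct.new(:is_inflected, :deinflect, keyword_init: true)

# Builds a rule that swaps one ending for another, matching only at the end.
def suffix_rule(inflected, base)
  ToyRule.new(
    is_inflected: /#{Regexp.escape(inflected)}\z/,
    deinflect: ->(text) { text.delete_suffix(inflected) + base }
  )
end

contraction = suffix_rule("てる", "ている") # illustrative pair, not vendored data
word = "食べてる"
puts contraction.deinflect.call(word) if contraction.is_inflected.match?(word)
# prints 食べている
```

No dictionary is consulted. The rule fires on any string ending in てる, so rules alone can produce candidates that are not real words.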

Defined Under Namespace

Classes: Rule, Transform, TransformedText

Constant Summary

DATA_FILE =
File.expand_path("resources/japanese-transforms.json", __dir__)
LABELS =

Friendly English labels for the deinflection rule names that #deinflect emits. The underlying names (ported from Yomitan) are terse and sometimes symbolic (“-いる”, “-て”, “-ます”); these labels name the grammar instead (“progressive”, “te-form”, “polite”). This mapping is Daidai’s own curation, not Yomitan data. It is the single source of truth for naming an inflection, so consumers should localise these labels rather than maintain their own map. Keyed by rule name; see Deinflector.label for the lookup, which falls back to the name itself.

{
  "-いる" => "progressive", "-て" => "te-form", "-た" => "past",
  "-ます" => "polite", "negative" => "negative", "passive" => "passive",
  "potential" => "potential", "potential or passive" => "potential / passive",
  "causative" => "causative", "short causative" => "short causative",
  "volitional" => "volitional", "volitional slang" => "volitional (slang)",
  "imperative" => "imperative", "continuative" => "continuative",
  "-たい" => "desiderative (-tai)", "-たら" => "conditional (-tara)",
  "-たり" => "representative (-tari)", "-ば" => "provisional (-ba)",
  "-ゃ" => "conditional contraction (-ya)", "-ちゃ" => "contracted (-cha)",
  "-ちゃう" => "completive (-chau)", "-ちまう" => "completive (-chimau)",
  "-しまう" => "completive (-shimau)", "-おく" => "preparatory (-oku)",
  "-そう" => "looks like (-sou)", "-すぎる" => "excessive (-sugiru)",
  "-過ぎる" => "excessive (-sugiru)", "-なさい" => "polite imperative (-nasai)",
  "-さ" => "nominalization (-sa)", "-げ" => "appearance (-ge)",
  "-がる" => "showing signs (-garu)", "-やがる" => "contemptuous (-yagaru)",
  "-ず" => "negative (-zu)", "-ぬ" => "negative (-nu)", "-ん" => "negative (-n)",
  "-ざる" => "negative (-zaru)", "-ねば" => "negative conditional (-neba)",
  "-まい" => "negative volitional (-mai)", "-く" => "adverbial (-ku)",
  "-き" => "attributive (-ki)", "-む" => "archaic volitional (-mu)",
  "-んばかり" => "on the verge (-nbakari)", "-んとする" => "intentive (-ntosuru)",
  "-え" => "slang (-e)", "n-slang" => "n-slang",
  "imperative negative slang" => "imperative negative (slang)",
  "kansai-ben" => "kansai dialect"
}.freeze

Class Method Summary

Class Method Details

.deinflect(text) ⇒ Object

Every deinflection candidate for `text`, exactly as the transformer produces it: each term the rules can reach, with its named inflection chain. The trivial zero-transform identity is excluded, and duplicate (term, inflections) pairs are collapsed. Callers with a dictionary look up each `term`; callers without one can keep only the `dictionary_form?` candidates.



# File 'lib/daidai/deinflector.rb', line 89

def deinflect(text)
  transform(text)
    .reject { |t| t.trace.empty? }
    .map { |t| to_deinflection(t) }
    .uniq { |d| [ d.term, d.inflections ] }
end
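The `deinflect` pipeline is three steps over the transformer output: drop the identity, convert each result, then de-duplicate. Below is a minimal sketch of those steps on hand-built results. `ResultText`, `Deinflection` and this `to_deinflection` are hypothetical stand-ins, because the real conversion is not shown on this page. The sketch also assumes that each trace frame's `:transform` id doubles as the rule name.

```ruby
# Hypothetical stand-ins for TransformedText and the Deinflection records.
ResultText = Struct.new(:text, :conditions, :trace, keyword_init: true)
Deinflection = Struct.new(:term, :inflections, keyword_init: true)

# Hypothetical conversion: assumes each frame's :transform id is the rule name.
def to_deinflection(result)
  Deinflection.new(term: result.text, inflections: result.trace.map { |f| f[:transform] })
end

frame = { transform: "-いる", rule_index: 1, text: "食べてる" }
results = [
  ResultText.new(text: "食べてる", conditions: 0, trace: []),   # identity
  ResultText.new(text: "食べる", conditions: 1, trace: [frame]),
  ResultText.new(text: "食べる", conditions: 1, trace: [frame])  # same path reached twice
]

# Same three steps as Deinflector.deinflect: drop the identity, convert,
# then keep one entry per (term, inflection chain) pair.
candidates = results
  .reject { |t| t.trace.empty? }
  .map { |t| to_deinflection(t) }
  .uniq { |d| [d.term, d.inflections] }

candidates.each { |d| puts "#{d.term} via #{d.inflections.join(', ')}" }
# prints: 食べる via -いる
```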

.label(name) ⇒ Object

Friendly English label for a deinflection rule name (the strings in a Deinflection’s #inflections), e.g. “-いる” => “progressive”. Falls back to the name itself for anything not in LABELS, so it is always safe to call.



# File 'lib/daidai/deinflector.rb', line 99

def label(name)
  LABELS.fetch(name.to_s, name.to_s)
end
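`label` is a `Hash#fetch` with the key itself as the default, so an unknown rule name passes through unchanged. The following is a minimal sketch of turning an inflection chain into display text. It uses a hypothetical two-entry subset of LABELS, and `friendly_label` is an illustrative name, not part of Daidai.

```ruby
# Hypothetical two-entry subset of Deinflector::LABELS.
LABELS_SUBSET = { "-いる" => "progressive", "-て" => "te-form" }.freeze

# Same shape as Deinflector.label: the key itself is the fallback value.
def friendly_label(name)
  LABELS_SUBSET.fetch(name.to_s, name.to_s)
end

chain = ["-いる", "-て", "some-new-rule"]
puts chain.map { |name| friendly_label(name) }.join(", ")
# prints: progressive, te-form, some-new-rule
```

Because the fallback is the raw name, a rule that has no LABELS entry still renders, just in Yomitan's terse form.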

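.reload! ⇒ Object

Clears the cached rule state (`@data`, `@condition_flags`, `@dictionary_mask`, `@transforms`, `@transforms_by_id`) by resetting it to nil.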



# File 'lib/daidai/deinflector.rb', line 130

def reload!
  @data = @condition_flags = @dictionary_mask = @transforms = @transforms_by_id = nil
end
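`reload!` only sets the cached state to nil, which suggests that the readers build it lazily on first use. The sketch below shows that memoize-and-reset pattern under this assumption. `CachedRules`, its `source` accessor (standing in for reading DATA_FILE) and `rule_names` are hypothetical.

```ruby
require "json"

# Hypothetical module: lazily built state in instance variables,
# reset to nil so the next reader call rebuilds it.
module CachedRules
  class << self
    attr_accessor :source # stand-in for File.read(DATA_FILE)

    def data
      @data ||= JSON.parse(source)
    end

    def rule_names
      @rule_names ||= data.keys
    end

    def reload!
      @data = @rule_names = nil
    end
  end
end

CachedRules.source = '{"-て": {}}'
CachedRules.rule_names # => ["-て"]
CachedRules.source = '{"-て": {}, "-た": {}}'
CachedRules.rule_names # => ["-て"] (still cached)
CachedRules.reload!
CachedRules.rule_names # => ["-て", "-た"]
```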

.transform(source_text) ⇒ Object

The raw transformer output (a TransformedText per reachable form, including the identity). Mirrors Yomitan’s LanguageTransformer#transform.



# File 'lib/daidai/deinflector.rb', line 105

def transform(source_text)
  results = [ TransformedText.new(text: source_text, conditions: 0, trace: []) ]
  i = 0
  while i < results.length
    current = results[i]
    transforms.each do |transform|
      next unless transform.heuristic.match?(current.text)

      transform.rules.each_with_index do |rule, j|
        next unless conditions_match?(current.conditions, rule.conditions_in)
        next unless rule.is_inflected.match?(current.text)
        next if cycle?(current.trace, transform.id, j, current.text)

        results << TransformedText.new(
          text: rule.deinflect.call(current.text),
          conditions: rule.conditions_out,
          trace: [ { transform: transform.id, rule_index: j, text: current.text } ] + current.trace
        )
      end
    end
    i += 1
  end
  results
end
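`transform` is a breadth-first search in which `results` doubles as the work queue. Each entry is expanded exactly once, and every rule that fires appends a new entry with its frame prepended to the trace. The sketch below runs the same loop over two hypothetical transforms with toy condition bitflags (`V1`, `TE`). `conditions_match?` and `cycle?` are not shown on this page. Here they are written to follow Yomitan's LanguageTransformer (zero conditions match anything, otherwise the bitmasks must intersect), which is an assumption about Daidai's port.

```ruby
# Toy shapes exposing the fields Deinflector.transform reads (hypothetical).
Rule = Struct.new(:is_inflected, :deinflect, :conditions_in, :conditions_out, keyword_init: true)
Transform = Struct.new(:id, :heuristic, :rules, keyword_init: true)
TransformedText = Struct.new(:text, :conditions, :trace, keyword_init: true)

V1 = 0b01 # hypothetical flag: ichidan verb
TE = 0b10 # hypothetical flag: te-form

def suffix(inflected, base, conditions_in, conditions_out)
  Rule.new(
    is_inflected: /#{Regexp.escape(inflected)}\z/,
    deinflect: ->(text) { text.delete_suffix(inflected) + base },
    conditions_in: conditions_in,
    conditions_out: conditions_out
  )
end

# The heuristic is a cheap pre-filter: the union of the transform's rule patterns.
def make_transform(id, rules)
  Transform.new(id: id, heuristic: Regexp.union(rules.map(&:is_inflected)), rules: rules)
end

TRANSFORMS = [
  make_transform("-いる", [suffix("ている", "て", V1, TE), suffix("てる", "て", V1, TE)]),
  make_transform("-て", [suffix("て", "る", TE, V1)])
].freeze

# Assumed Yomitan semantics: 0 means "no constraint yet", else masks must overlap.
def conditions_match?(current, required)
  current.zero? || (current & required) != 0
end

# Refuse to apply the same rule to the same text twice along one path.
def cycle?(trace, id, rule_index, text)
  trace.any? { |f| f[:transform] == id && f[:rule_index] == rule_index && f[:text] == text }
end

def transform(source_text)
  results = [TransformedText.new(text: source_text, conditions: 0, trace: [])]
  i = 0
  while i < results.length # results grows during the loop: it is the BFS queue
    current = results[i]
    TRANSFORMS.each do |tr|
      next unless tr.heuristic.match?(current.text)

      tr.rules.each_with_index do |rule, j|
        next unless conditions_match?(current.conditions, rule.conditions_in)
        next unless rule.is_inflected.match?(current.text)
        next if cycle?(current.trace, tr.id, j, current.text)

        results << TransformedText.new(
          text: rule.deinflect.call(current.text),
          conditions: rule.conditions_out,
          trace: [{ transform: tr.id, rule_index: j, text: current.text }] + current.trace
        )
      end
    end
    i += 1
  end
  results
end

transform("食べてる").each do |t|
  puts "#{t.text}: #{t.trace.map { |f| f[:transform] }.join(' <- ')}"
end
# 食べてる:
# 食べて: -いる
# 食べる: -て <- -いる
```

Because the identity entry starts with conditions 0, the first rule always passes the condition check. After that, each result's `conditions_out` restricts which rules can chain onto it.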