Module: Daidai::Deinflector

Defined in:
lib/daidai/deinflector.rb

Overview

Rule-based Japanese deinflector: turns an inflected surface form back into its dictionary form(s), naming each inflection along the way (“食べてる” is the progressive of “食べる”). This is the inverse of Daidai’s forward conjugation.

The rule set is ported from Yomitan’s Japanese language transforms (ext/js/language/ja/japanese-transforms.js), vendored as JSON under resources/; the algorithm is a port of Yomitan’s LanguageTransformer. Both are GPL-3.0 — see NOTICE. Unlike Daidai’s forward tables, these rules also cover colloquial contractions (てる, ちゃう, とく, …).

Unlike `Daidai.conjugate(word)`, this needs no Sudachi/kabosu: it is pure, offline, string-rule deinflection.

Defined Under Namespace

Classes: Rule, Transform, TransformedText

Constant Summary

DATA_FILE =
File.expand_path("resources/japanese-transforms.json", __dir__)

Class Method Summary

Class Method Details

.deinflect(text) ⇒ Object

Every deinflection candidate for `text`, faithful to the transformer: each term the rules can reach, with its named inflection chain. Excludes the trivial zero-transform identity. Callers with a dictionary look up each `term`; callers without one can keep only `dictionary_form?` candidates.



# File 'lib/daidai/deinflector.rb', line 51

def deinflect(text)
  transform(text)
    .reject { |t| t.trace.empty? }
    .map { |t| to_deinflection(t) }
    .uniq { |d| [ d.term, d.inflections ] }
end
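The two caller strategies described above can be sketched as follows. This is a minimal illustration, not library code: `Candidate` is a hypothetical stand-in that models only the readers named in the description (`term`, `inflections`, `dictionary_form?`), the sample candidates are hand-written rather than real output, and `TOY_DICTIONARY` is an assumed lookup table.

```ruby
require "set"

# Hypothetical stand-in for a deinflection candidate; only the readers
# mentioned in the description are modeled here.
Candidate = Struct.new(:term, :inflections, :dictionary_form, keyword_init: true) do
  def dictionary_form?
    dictionary_form
  end
end

# Hand-written sample data for illustration, not real library output.
SAMPLE_CANDIDATES = [
  Candidate.new(term: "食べる", inflections: ["progressive"], dictionary_form: true),
  Candidate.new(term: "食べて", inflections: ["(intermediate step)"], dictionary_form: false)
].freeze

# Assumed toy dictionary; a real caller would query its own term store.
TOY_DICTIONARY = Set["食べる"].freeze

# Strategy 1: a caller with a dictionary keeps candidates whose term exists in it.
def keep_known_terms(candidates, dictionary)
  candidates.select { |c| dictionary.include?(c.term) }
end

# Strategy 2: a caller without a dictionary keeps only dictionary-form candidates.
def keep_dictionary_forms(candidates)
  candidates.select(&:dictionary_form?)
end

known = keep_known_terms(SAMPLE_CANDIDATES, TOY_DICTIONARY)
plain = keep_dictionary_forms(SAMPLE_CANDIDATES)
```

With real candidates, the dictionary lookup is usually the stricter filter, since a string can have the shape of a dictionary form without being an actual word.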

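.reload! ⇒ Object

Resets the module's cached state (`@data`, `@condition_flags`, `@dictionary_mask`, `@transforms`, `@transforms_by_id`) to nil.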



# File 'lib/daidai/deinflector.rb', line 85

def reload!
  @data = @condition_flags = @dictionary_mask = @transforms = @transforms_by_id = nil
end
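Resetting instance variables to nil like this is typically paired with lazy accessors that rebuild the value on next access. The sketch below shows that general memoize-then-reset pattern on a toy module. It assumes, without confirmation from the source, that the real accessors are memoized in a similar way; `ToyRules`, `load_data`, and `load_count` are hypothetical names.

```ruby
# Toy module illustrating memoize-then-reset; not the Daidai implementation.
module ToyRules
  class << self
    # Counts how many times the (pretend) expensive load has run.
    attr_reader :load_count

    # Lazily loads and caches the data on first access.
    def data
      @data ||= load_data
    end

    # Drops the cache so the next call to `data` loads again.
    def reload!
      @data = nil
    end

    private

    # Stand-in for parsing a JSON rule file.
    def load_data
      @load_count = (@load_count || 0) + 1
      { "rules" => [] }
    end
  end
end

ToyRules.data      # first access loads
ToyRules.data      # second access hits the cache
ToyRules.reload!   # cache cleared
ToyRules.data      # loads again
loads = ToyRules.load_count
```

Here `loads` ends up as 2: one load before the reset and one after it.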

.transform(source_text) ⇒ Object

The raw transformer output (a TransformedText per reachable form, including the identity). Mirrors Yomitan’s LanguageTransformer#transform.



# File 'lib/daidai/deinflector.rb', line 60

def transform(source_text)
  results = [ TransformedText.new(text: source_text, conditions: 0, trace: []) ]
  i = 0
  while i < results.length
    current = results[i]
    transforms.each do |transform|
      next unless transform.heuristic.match?(current.text)

      transform.rules.each_with_index do |rule, j|
        next unless conditions_match?(current.conditions, rule.conditions_in)
        next unless rule.is_inflected.match?(current.text)
        next if cycle?(current.trace, transform.id, j, current.text)

        results << TransformedText.new(
          text: rule.deinflect.call(current.text),
          conditions: rule.conditions_out,
          trace: [ { transform: transform.id, rule_index: j, text: current.text } ] + current.trace
        )
      end
    end
    i += 1
  end
  results
end
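The loop above is a worklist search: `results` doubles as the queue, so every newly produced form is itself revisited until no rule applies. The following self-contained sketch reproduces that shape with two made-up suffix rules. Everything prefixed `Toy`/`toy_` is hypothetical, the condition flags are invented, the heuristic pre-filter is omitted, and the condition check assumes the semantics of Yomitan's `conditionsMatch` (no conditions yet, or at least one shared flag).

```ruby
ToyText = Struct.new(:text, :conditions, :trace, keyword_init: true)
ToyRule = Struct.new(:suffix, :replacement, :conditions_in, :conditions_out, keyword_init: true)

# Invented bit flags for this sketch only.
TOY_V1 = 0b01 # ichidan verb
TOY_TE = 0b10 # te-form

# Two hand-written rules: "てる" -> "て", then "て" -> "る".
TOY_TRANSFORMS = {
  "-てる" => [ToyRule.new(suffix: "てる", replacement: "て", conditions_in: TOY_V1, conditions_out: TOY_TE)],
  "-て" => [ToyRule.new(suffix: "て", replacement: "る", conditions_in: TOY_TE, conditions_out: TOY_V1)]
}.freeze

# Assumed semantics: an unconstrained form (0) matches any rule;
# otherwise the form and the rule must share at least one flag.
def toy_conditions_match?(current, required)
  current.zero? || (current & required) != 0
end

def toy_transform(source_text)
  results = [ToyText.new(text: source_text, conditions: 0, trace: [])]
  i = 0
  while i < results.length
    current = results[i]
    TOY_TRANSFORMS.each do |id, rules|
      rules.each_with_index do |rule, j|
        next unless toy_conditions_match?(current.conditions, rule.conditions_in)
        next unless current.text.end_with?(rule.suffix)

        frame = { transform: id, rule_index: j, text: current.text }
        # Cycle guard: skip if this exact rule already fired on this exact text.
        next if current.trace.include?(frame)

        results << ToyText.new(
          text: current.text.delete_suffix(rule.suffix) + rule.replacement,
          conditions: rule.conditions_out,
          trace: [frame] + current.trace
        )
      end
    end
    i += 1
  end
  results
end

forms = toy_transform("食べてる")
texts = forms.map(&:text)
# texts => ["食べてる", "食べて", "食べる"]
chain = forms.last.trace.map { |f| f[:transform] }
# chain => ["-て", "-てる"] (most recent transform first)
```

The first entry is the identity with an empty trace, which is the element `deinflect` filters out. Because each frame is prepended, a trace reads from the most recent transform back to the first one applied.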