Module: Daidai::Deinflector
- Defined in: lib/daidai/deinflector.rb
Overview
Rule-based Japanese deinflector: turns an inflected surface form back into its dictionary form(s), naming each inflection along the way (“食べてる” is the progressive of “食べる”). This is the inverse of Daidai’s forward conjugation.
The rule set is ported from Yomitan’s Japanese language transforms (ext/js/language/ja/japanese-transforms.js), vendored as JSON under resources/; the algorithm is a port of Yomitan’s LanguageTransformer. Both are GPL-3.0 — see NOTICE. Unlike Daidai’s forward tables, these rules also cover colloquial contractions (てる, ちゃう, とく, …).
In contrast to `Daidai.conjugate(word)`, this needs no Sudachi/kabosu: it is pure, offline, string-rule deinflection.
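To make "string-rule deinflection" concrete, here is a minimal Ruby sketch of what a single rule does: strip an inflected suffix, append a base suffix, and record the rule's name. `SuffixRule` and both rules are hypothetical simplifications loosely modelled on the vendored Yomitan data; the real rules also carry condition flags and cover far more than ichidan verbs.

```ruby
# Minimal sketch: a deinflection rule rewrites one suffix into another.
# SuffixRule and both rules are hypothetical, simplified stand-ins for the
# vendored Yomitan rules (no condition flags, ichidan verbs only).
SuffixRule = Struct.new(:name, :inflected, :base) do
  # True when the surface form ends with this rule's inflected suffix.
  def applies_to?(text)
    text.end_with?(inflected)
  end

  # Swap the inflected suffix for the base suffix.
  def deinflect(text)
    text.delete_suffix(inflected) + base
  end
end

PROGRESSIVE = SuffixRule.new("-いる", "てる", "て") # 食べてる -> 食べて
TE_FORM     = SuffixRule.new("-て", "て", "る")     # 食べて -> 食べる

step1 = PROGRESSIVE.deinflect("食べてる")
step2 = TE_FORM.deinflect(step1)
puts "#{step2} (#{[PROGRESSIVE.name, TE_FORM.name].join(', ')})"
# prints: 食べる (-いる, -て)
```

Chaining steps like this is what yields a named inflection list; the actual transformer explores every applicable rule rather than one fixed path.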
Defined Under Namespace
Classes: Rule, Transform, TransformedText
Constant Summary
- DATA_FILE =
File.("resources/japanese-transforms.json", __dir__)
- LABELS =
Friendly English labels for the deinflection rule names #deinflect emits. The underlying names (ported from Yomitan) are terse and sometimes symbolic (“-いる”, “-て”, “-ます”); these name the grammar instead (“progressive”, “te-form”, “polite”). This is daidai’s curation, not Yomitan data — it is the single source of truth for naming an inflection, so consumers localise these rather than maintain their own map. Keyed by the rule name; see Deinflector.label for the lookup (which falls back to the name itself).
```ruby
{
  "-いる" => "progressive",
  "-て" => "te-form",
  "-た" => "past",
  "-ます" => "polite",
  "negative" => "negative",
  "passive" => "passive",
  "potential" => "potential",
  "potential or passive" => "potential / passive",
  "causative" => "causative",
  "short causative" => "short causative",
  "volitional" => "volitional",
  "volitional slang" => "volitional (slang)",
  "imperative" => "imperative",
  "continuative" => "continuative",
  "-たい" => "desiderative (-tai)",
  "-たら" => "conditional (-tara)",
  "-たり" => "representative (-tari)",
  "-ば" => "provisional (-ba)",
  "-ゃ" => "conditional contraction (-ya)",
  "-ちゃ" => "contracted (-cha)",
  "-ちゃう" => "completive (-chau)",
  "-ちまう" => "completive (-chimau)",
  "-しまう" => "completive (-shimau)",
  "-おく" => "preparatory (-oku)",
  "-そう" => "looks like (-sou)",
  "-すぎる" => "excessive (-sugiru)",
  "-過ぎる" => "excessive (-sugiru)",
  "-なさい" => "polite imperative (-nasai)",
  "-さ" => "nominalization (-sa)",
  "-げ" => "appearance (-ge)",
  "-がる" => "showing signs (-garu)",
  "-やがる" => "contemptuous (-yagaru)",
  "-ず" => "negative (-zu)",
  "-ぬ" => "negative (-nu)",
  "-ん" => "negative (-n)",
  "-ざる" => "negative (-zaru)",
  "-ねば" => "negative conditional (-neba)",
  "-まい" => "negative volitional (-mai)",
  "-く" => "adverbial (-ku)",
  "-き" => "attributive (-ki)",
  "-む" => "archaic volitional (-mu)",
  "-んばかり" => "on the verge (-nbakari)",
  "-んとする" => "intentive (-ntosuru)",
  "-え" => "slang (-e)",
  "n-slang" => "n-slang",
  "imperative negative slang" => "imperative negative (slang)",
  "kansai-ben" => "kansai dialect"
}.freeze
```
Class Method Summary
- .deinflect(text) ⇒ Object
  Every deinflection candidate for `text`, faithful to the transformer: each term the rules can reach, with its named inflection chain.
- .label(name) ⇒ Object
  Friendly English label for a deinflection rule name (the strings in a Deinflection’s #inflections), e.g. “-いる” => “progressive”.
- .reload! ⇒ Object
  Clears the cached rule data, condition flags and transforms.
- .transform(source_text) ⇒ Object
  The raw transformer output (a TransformedText per reachable form, including the identity).
Class Method Details
.deinflect(text) ⇒ Object
Every deinflection candidate for `text`, faithful to the transformer: each term the rules can reach, with its named inflection chain. Excludes the trivial zero-transform identity. Callers with a dictionary look up each `term`; callers without one can keep only the `dictionary_form?` candidates.
```ruby
# File 'lib/daidai/deinflector.rb', line 89

def deinflect(text)
  transform(text)
    .reject { |t| t.trace.empty? }
    .map { |t| to_deinflection(t) }
    .uniq { |d| [ d.term, d.inflections ] }
end
```
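A sketch of the filtering `deinflect` performs, followed by the two caller strategies its description mentions. `Candidate`, the sample data and `DICTIONARY` are hypothetical stand-ins (Daidai's own Deinflection type may differ), and for brevity the identity is detected by an empty inflection list on already-built candidates rather than by an empty trace.

```ruby
# Sketch of #deinflect's filtering plus the two caller strategies.
# Candidate, the sample data and DICTIONARY are hypothetical.
require "set"

Candidate = Struct.new(:term, :inflections, :dictionary_form) do
  def dictionary_form?
    dictionary_form
  end
end

raw = [
  Candidate.new("食べてる", [], false),           # zero-transform identity
  Candidate.new("食べて", ["-いる"], false),
  Candidate.new("食べる", ["-いる", "-て"], true),
  Candidate.new("食べる", ["-いる", "-て"], true) # same term and chain again
]

candidates = raw
  .reject { |c| c.inflections.empty? }  # drop the trivial identity
  .uniq { |c| [c.term, c.inflections] } # one entry per (term, chain)

# With a dictionary: keep candidates whose term is a real headword.
DICTIONARY = Set["食べる"]
found = candidates.select { |c| DICTIONARY.include?(c.term) }

# Without one: trust the rule conditions and keep dictionary forms only.
plausible = candidates.select(&:dictionary_form?)
```

Deduplicating on the pair `[term, inflections]` rather than on `term` alone keeps distinct analyses of the same headword.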
.label(name) ⇒ Object
Friendly English label for a deinflection rule name (the strings in a Deinflection’s #inflections), e.g. “-いる” => “progressive”. Falls back to the name itself for anything not in LABELS, so it is always safe to call.
```ruby
# File 'lib/daidai/deinflector.rb', line 99

def label(name)
  LABELS.fetch(name.to_s, name.to_s)
end
```
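Because the lookup never raises, a whole inflection chain can be described by mapping it over each name. A minimal sketch, where `SAMPLE_LABELS` is a hand-copied subset of LABELS and `describe_chain` is a hypothetical helper, not part of Daidai:

```ruby
# Sketch: describe a whole inflection chain using the same lookup as
# Deinflector.label. SAMPLE_LABELS is a subset of LABELS; describe_chain
# is a hypothetical helper.
SAMPLE_LABELS = {
  "-いる" => "progressive",
  "-て" => "te-form",
  "-ます" => "polite"
}.freeze

def describe_chain(inflections, labels = SAMPLE_LABELS)
  # fetch with a default: unknown names come back unchanged instead of raising
  inflections.map { |name| labels.fetch(name.to_s, name.to_s) }.join(" + ")
end

describe_chain(["-いる", "-て"])    # => "progressive + te-form"
describe_chain(["-ます", "mystery"]) # => "polite + mystery"
```

Calling `to_s` before the lookup means symbol names resolve too.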
.reload! ⇒ Object
```ruby
# File 'lib/daidai/deinflector.rb', line 130

def reload!
  @data = @condition_flags = @dictionary_mask = @transforms = @transforms_by_id = nil
end
```
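`reload!` only resets the caches to nil, which suggests the readers fill them lazily on first use. A minimal sketch of that pattern, assuming `||=` memoisation; `TinyRuleCache`, its load counter and the inline JSON are hypothetical, since Daidai's own loaders are not shown on this page.

```ruby
# Sketch of lazy loading plus reset. TinyRuleCache is hypothetical and
# assumes ||= memoisation; Daidai's real accessors may differ.
require "json"

module TinyRuleCache
  @loads = 0

  class << self
    attr_reader :loads

    # Parse the rule data on first access, then reuse the cached result.
    def data
      @data ||= begin
        @loads += 1
        JSON.parse('{"transforms": {}}')
      end
    end

    # Drop the cache so the next #data call parses again.
    def reload!
      @data = nil
    end
  end
end

TinyRuleCache.data
TinyRuleCache.data    # cached, no second parse
TinyRuleCache.reload!
TinyRuleCache.data    # parsed again
TinyRuleCache.loads   # => 2
```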
.transform(source_text) ⇒ Object
The raw transformer output (a TransformedText per reachable form, including the identity). Mirrors Yomitan’s LanguageTransformer#transform.
```ruby
# File 'lib/daidai/deinflector.rb', line 105

def transform(source_text)
  results = [ TransformedText.new(text: source_text, conditions: 0, trace: []) ]
  i = 0
  while i < results.length
    current = results[i]
    transforms.each do |transform|
      next unless transform.heuristic.match?(current.text)

      transform.rules.each_with_index do |rule, j|
        next unless conditions_match?(current.conditions, rule.conditions_in)
        next unless rule.is_inflected.match?(current.text)
        next if cycle?(current.trace, transform.id, j, current.text)

        results << TransformedText.new(
          text: rule.deinflect.call(current.text),
          conditions: rule.conditions_out,
          trace: [ { transform: transform.id, rule_index: j, text: current.text } ] + current.trace
        )
      end
    end
    i += 1
  end
  results
end
```
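The loop is a breadth-first worklist: `results` is both the output and the queue, so every derived form is re-examined until no further rule applies. The standalone sketch below reproduces that shape on two toy rules so it runs without the vendored JSON. The flag values, the rules, and the bodies of `conditions_match?` and `cycle?` are assumptions (the real helpers are not shown on this page); the condition check is assumed to follow Yomitan's convention that 0 means "not yet constrained".

```ruby
# Standalone sketch of the .transform worklist. Flags, rules, and the
# helper bodies are hypothetical, modelled loosely on Yomitan.
V1 = 0b01 # behaves like an ichidan verb
TE = 0b10 # is a te-form

ToyRule      = Struct.new(:is_inflected, :deinflect, :conditions_in, :conditions_out)
ToyTransform = Struct.new(:id, :heuristic, :rules)
Transformed  = Struct.new(:text, :conditions, :trace)

TOY_TRANSFORMS = [
  ToyTransform.new("-いる", /てる\z/, [
    ToyRule.new(/てる\z/, ->(t) { t.sub(/てる\z/, "て") }, V1, TE)
  ]),
  ToyTransform.new("-て", /て\z/, [
    ToyRule.new(/て\z/, ->(t) { t.sub(/て\z/, "る") }, TE, V1)
  ])
].freeze

# Assumed: 0 (the raw input) matches any rule; otherwise share at least one flag.
def conditions_match?(current, rule_in)
  current.zero? || (current & rule_in) != 0
end

# Refuse a rule that already fired on this exact text earlier in the trace.
def cycle?(trace, transform_id, rule_index, text)
  trace.any? { |f| f[:transform] == transform_id && f[:rule_index] == rule_index && f[:text] == text }
end

def toy_transform(source_text)
  results = [Transformed.new(source_text, 0, [])]
  i = 0
  while i < results.length # results grows while we walk it
    current = results[i]
    TOY_TRANSFORMS.each do |tf|
      next unless tf.heuristic.match?(current.text)

      tf.rules.each_with_index do |rule, j|
        next unless conditions_match?(current.conditions, rule.conditions_in)
        next unless rule.is_inflected.match?(current.text)
        next if cycle?(current.trace, tf.id, j, current.text)

        frame = { transform: tf.id, rule_index: j, text: current.text }
        results << Transformed.new(rule.deinflect.call(current.text), rule.conditions_out, [frame] + current.trace)
      end
    end
    i += 1
  end
  results
end

toy_transform("食べてる").map { |r| [r.text, r.trace.map { |f| f[:transform] }] }
# => [["食べてる", []], ["食べて", ["-いる"]], ["食べる", ["-て", "-いる"]]]
```

As in the original, each new frame is prepended, so a trace lists the most recent rule first.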