Module: Daidai::Deinflector
- Defined in: lib/daidai/deinflector.rb
Overview
Rule-based Japanese deinflector: turns an inflected surface form back into its dictionary form(s), naming each inflection along the way (“食べてる” is the progressive of “食べる”). This is the inverse of Daidai’s forward conjugation.
The rule set is ported from Yomitan’s Japanese language transforms (ext/js/language/ja/japanese-transforms.js), vendored as JSON under resources/; the algorithm is a port of Yomitan’s LanguageTransformer. Both are GPL-3.0 — see NOTICE. Unlike Daidai’s forward tables, these rules also cover colloquial contractions (てる, ちゃう, とく, …).
Unlike `Daidai.conjugate(word)`, this needs no Sudachi/kabosu: it is pure, offline, string-rule deinflection.
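To make "string-rule deinflection" concrete, here is a minimal sketch of a single rule step. The rule table and the `toy_deinflect_once` helper are hypothetical stand-ins written for illustration; they are not the vendored Yomitan rule set or part of Daidai's API.

```ruby
# Toy suffix rules (hypothetical, NOT the vendored rule set).
# Each rule strips an inflected ending and restores a dictionary-style ending.
TOY_RULES = [
  { name: "progressive (contracted)", inflected: "てる", deinflected: "る" },
  { name: "past",                     inflected: "た",   deinflected: "る" }
].freeze

# Returns [candidate, rule name] pairs for every rule whose ending matches.
# Pure string work: no tokenizer and no dictionary are involved.
def toy_deinflect_once(text)
  TOY_RULES.filter_map do |rule|
    next unless text.end_with?(rule[:inflected])

    [text.delete_suffix(rule[:inflected]) + rule[:deinflected], rule[:name]]
  end
end

toy_deinflect_once("食べてる") # => [["食べる", "progressive (contracted)"]]
```

A toy like this over-generates (it would happily rewrite any word ending in た), which is presumably why the real rules carry conditions and why candidates are meant to be checked against a dictionary.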
Defined Under Namespace
Classes: Rule, Transform, TransformedText
Constant Summary
- DATA_FILE =
File.("resources/japanese-transforms.json", __dir__)
Class Method Summary
- .deinflect(text) ⇒ Object
  Every deinflection candidate for `text`, faithful to the transformer: each term the rules can reach, with its named inflection chain.
- .reload! ⇒ Object
- .transform(source_text) ⇒ Object
  The raw transformer output (a TransformedText per reachable form, including the identity).
Class Method Details
.deinflect(text) ⇒ Object
Every deinflection candidate for `text`, faithful to the transformer: each term the rules can reach, with its named inflection chain. Excludes the trivial zero-transform identity. Callers with a dictionary look up each `term`; callers without one can keep only `dictionary_form?` candidates.
    # File 'lib/daidai/deinflector.rb', line 51
    def deinflect(text)
      transform(text)
        .reject { |t| t.trace.empty? }
        .map { |t| to_deinflection(t) }
        .uniq { |d| [ d.term, d.inflections ] }
    end
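The two caller strategies described above can be sketched as follows. The `Candidate` struct is a hypothetical stand-in that only mirrors the names used in this entry (`term`, `inflections`, `dictionary_form?`), and the sample candidates and inflection labels are made up for illustration; they are not actual output from `Daidai::Deinflector.deinflect`.

```ruby
require "set"

# Hypothetical stand-in for the candidate objects described above.
Candidate = Struct.new(:term, :inflections, :dictionary_form, keyword_init: true) do
  def dictionary_form?
    dictionary_form
  end
end

# Made-up candidates, shaped like what a call for "食べてる" might yield.
candidates = [
  Candidate.new(term: "食べる", inflections: ["progressive"], dictionary_form: true),
  Candidate.new(term: "食べて", inflections: ["progressive (partial)"], dictionary_form: false)
]

# With a dictionary: keep candidates whose term is a known headword.
headwords = Set["食べる"]
known = candidates.select { |c| headwords.include?(c.term) }

# Without a dictionary: keep only candidates flagged as dictionary forms.
plausible = candidates.select(&:dictionary_form?)
```

The dictionary lookup is the stricter filter, since a rule-only check can accept strings that merely look like dictionary forms.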
.reload! ⇒ Object
    # File 'lib/daidai/deinflector.rb', line 85
    def reload!
      @data = @condition_flags = @dictionary_mask = @transforms = @transforms_by_id = nil
    end
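Clearing instance variables like this is the usual counterpart to memoized accessors. The sketch below shows that pattern in isolation; `ToyStore` and its methods are hypothetical, and it is an assumption (not shown in the source above) that Daidai's accessors are memoized with `||=`.

```ruby
# Hypothetical module illustrating memoize-then-reset; not Daidai code.
module ToyStore
  class << self
    # Counts how often the expensive load actually ran.
    def load_count
      @load_count ||= 0
    end

    # Memoized accessor: loads once, then reuses the cached value.
    def data
      @data ||= begin
        @load_count = load_count + 1
        { "loaded" => true } # stands in for parsing a JSON resource
      end
    end

    # Drops the cache so the next call to `data` loads again.
    def reload!
      @data = nil
    end
  end
end

ToyStore.data
ToyStore.data      # served from the cache, no second load
ToyStore.reload!
ToyStore.data      # loads again after the reset
```

Under that assumption, a reset like this is mainly useful in tests or after the data file changes on disk.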
.transform(source_text) ⇒ Object
The raw transformer output (a TransformedText per reachable form, including the identity). Mirrors Yomitan’s LanguageTransformer#transform.
    # File 'lib/daidai/deinflector.rb', line 60
    def transform(source_text)
      results = [ TransformedText.new(text: source_text, conditions: 0, trace: []) ]
      i = 0
      while i < results.length
        current = results[i]
        transforms.each do |transform|
          next unless transform.heuristic.match?(current.text)
          transform.rules.each_with_index do |rule, j|
            next unless conditions_match?(current.conditions, rule.conditions_in)
            next unless rule.is_inflected.match?(current.text)
            next if cycle?(current.trace, transform.id, j, current.text)
            results << TransformedText.new(
              text: rule.deinflect.call(current.text),
              conditions: rule.conditions_out,
              trace: [ { transform: transform.id, rule_index: j, text: current.text } ] + current.trace
            )
          end
        end
        i += 1
      end
      results
    end
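The loop above is a worklist search: each result is appended to `results` and later visited itself, so chains of rules are followed until nothing new applies. A self-contained miniature of the same shape is sketched below. Everything prefixed `Toy`/`toy_` is hypothetical, including the two rules, the bit flags, and the bodies of the condition and cycle checks, which are assumptions about what helpers like `conditions_match?` and `cycle?` might do rather than Daidai's actual implementation.

```ruby
ToyText = Struct.new(:text, :conditions, :trace, keyword_init: true)
ToyRule = Struct.new(:suffix_in, :suffix_out, :conditions_in, :conditions_out,
                     keyword_init: true)

# Hypothetical condition bit flags.
V1 = 0b01 # ichidan verb in dictionary form
TE = 0b10 # te-form

# Two toy transforms, keyed by id, each with a single suffix rule.
TOY_TRANSFORMS = {
  "-てる" => [ToyRule.new(suffix_in: "てる", suffix_out: "て",
                         conditions_in: V1, conditions_out: TE)],
  "-て" => [ToyRule.new(suffix_in: "て", suffix_out: "る",
                        conditions_in: TE, conditions_out: V1)]
}.freeze

# Assumed semantics: 0 means "unknown, anything may apply";
# otherwise at least one flag must overlap.
def toy_conditions_match?(current, required)
  current.zero? || (current & required) != 0
end

def toy_transform(source_text)
  results = [ToyText.new(text: source_text, conditions: 0, trace: [])]
  i = 0
  while i < results.length # results grows while we iterate
    current = results[i]
    TOY_TRANSFORMS.each do |id, rules|
      rules.each_with_index do |rule, j|
        next unless toy_conditions_match?(current.conditions, rule.conditions_in)
        next unless current.text.end_with?(rule.suffix_in)

        step = { transform: id, rule_index: j, text: current.text }
        next if current.trace.include?(step) # simple cycle guard

        results << ToyText.new(
          text: current.text.delete_suffix(rule.suffix_in) + rule.suffix_out,
          conditions: rule.conditions_out,
          trace: [step] + current.trace # newest step first
        )
      end
    end
    i += 1
  end
  results
end

toy_transform("食べてる").map(&:text) # => ["食べてる", "食べて", "食べる"]
```

The first element is the identity (empty trace), which is the entry `deinflect` filters out. The conditions keep the toy "-てる" rule from firing on a form already tagged as te-form, so the search follows only grammatically compatible chains.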