Class: Kotoshu::Components::Synthesizer Abstract
- Inherits: Object
- Defined in: lib/kotoshu/components/synthesizer.rb
Overview
This class is abstract: subclasses must implement #synthesize.
Base class for word form synthesizers.
Synthesizers generate inflected forms from a lemma (base form). This is the inverse of lemmatization:
- Lemmatization: “runs” → “run”
- Synthesis: “run” → [“run”, “runs”, “running”, “ran”]
Different languages use different synthesis strategies:
- Latin scripts: Hunspell affix rules
- CJK: Not applicable (no inflection)
- German: Compound word + affix synthesis
- Finnish: Complex agglutinative patterns
Instance Method Summary
- #synthesize(lemma, pos_tag) ⇒ Array<String> (abstract)
  Generate inflected forms of a word.
- #synthesize_all(lemma) ⇒ Hash
  Generate all inflected forms (all POS tags).
Instance Method Details
#synthesize(lemma, pos_tag) ⇒ Array<String>
This method is abstract: subclasses must implement it.
Generate inflected forms of a word.
Given a lemma (base form) and a POS tag, returns all possible inflected forms of that word.
    # File 'lib/kotoshu/components/synthesizer.rb', line 39
    def synthesize(lemma, pos_tag)
      raise NotImplementedError, "#{self.class} must implement #synthesize"
    end
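A concrete synthesizer only has to override #synthesize. The following is a minimal sketch, not code from Kotoshu: `ToyEnglishSynthesizer` and its suffix table are hypothetical names invented for illustration, the suffixing is deliberately naive, and the base class is restated from the source listing so the snippet runs without the gem installed.

```ruby
# Stand-in for the abstract base class, restated so the snippet is standalone.
module Kotoshu
  module Components
    class Synthesizer
      def synthesize(lemma, pos_tag)
        raise NotImplementedError, "#{self.class} must implement #synthesize"
      end
    end
  end
end

# Hypothetical subclass (not part of Kotoshu): appends fixed suffixes per POS tag.
class ToyEnglishSynthesizer < Kotoshu::Components::Synthesizer
  SUFFIXES = {
    'NOUN' => ['', 's'],
    'VERB' => ['', 's', 'ed', 'ing']
  }.freeze

  # Returns the lemma with each suffix for the tag; unknown tags yield the lemma only.
  def synthesize(lemma, pos_tag)
    SUFFIXES.fetch(pos_tag, ['']).map { |suffix| lemma + suffix }
  end
end

toy = ToyEnglishSynthesizer.new
p toy.synthesize('walk', 'VERB') # => ["walk", "walks", "walked", "walking"]

# Calling the base class directly still raises, as documented.
base_error = nil
begin
  Kotoshu::Components::Synthesizer.new.synthesize('walk', 'VERB')
rescue NotImplementedError => e
  base_error = e
  puts e.message # => Kotoshu::Components::Synthesizer must implement #synthesize
end
```

Note that NotImplementedError descends from ScriptError rather than StandardError, so a bare `rescue` would not catch it; the sketch names the class explicitly for that reason.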
#synthesize_all(lemma) ⇒ Hash
Generate all inflected forms (all POS tags).
Convenience method that calls #synthesize once for each of the four POS tags NOUN, VERB, ADJ and ADV, and returns the results in a Hash keyed by tag.
    # File 'lib/kotoshu/components/synthesizer.rb', line 49
    def synthesize_all(lemma)
      # Default implementation - subclasses can optimize
      {
        'NOUN' => synthesize(lemma, 'NOUN'),
        'VERB' => synthesize(lemma, 'VERB'),
        'ADJ' => synthesize(lemma, 'ADJ'),
        'ADV' => synthesize(lemma, 'ADV')
      }
    end
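Because the default #synthesize_all calls #synthesize with all four tags, a subclass's #synthesize will be asked about tags it may not handle. The sketch below shows one way that could look, under stated assumptions: `TableSynthesizer` and its lookup table are hypothetical and not part of Kotoshu, returning an empty array for unknown pairs is a choice made here rather than documented behavior, and the base class is restated from the source listing so the snippet is standalone.

```ruby
# Stand-in for the base class, restated so the snippet is standalone.
module Kotoshu
  module Components
    class Synthesizer
      def synthesize(lemma, pos_tag)
        raise NotImplementedError, "#{self.class} must implement #synthesize"
      end

      def synthesize_all(lemma)
        {
          'NOUN' => synthesize(lemma, 'NOUN'),
          'VERB' => synthesize(lemma, 'VERB'),
          'ADJ' => synthesize(lemma, 'ADJ'),
          'ADV' => synthesize(lemma, 'ADV')
        }
      end
    end
  end
end

# Hypothetical subclass (not part of Kotoshu) backed by a hand-written table.
class TableSynthesizer < Kotoshu::Components::Synthesizer
  FORMS = {
    %w[run VERB] => %w[run runs running ran],
    %w[run NOUN] => %w[run runs]
  }.freeze

  # Assumption: unknown lemma/tag pairs return an empty array instead of raising.
  def synthesize(lemma, pos_tag)
    FORMS.fetch([lemma, pos_tag], [])
  end
end

forms = TableSynthesizer.new.synthesize_all('run')
p forms.keys    # => ["NOUN", "VERB", "ADJ", "ADV"]
p forms['VERB'] # => ["run", "runs", "running", "ran"]
p forms['ADJ']  # => []
```

A subclass whose backend can produce every form in one pass could override #synthesize_all instead of relying on four separate calls, which is what the "subclasses can optimize" comment in the source suggests.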