Class: Kotoshu::Components::Synthesizer Abstract

Inherits:
Object
Defined in:
lib/kotoshu/components/synthesizer.rb

Overview

This class is abstract.

Subclasses must implement #synthesize

Base class for word form synthesizers.

Synthesizers generate inflected forms from a lemma (base form). This is the inverse of lemmatization:

  • Lemmatization: “runs” → “run”

  • Synthesis: “run” → [“run”, “runs”, “running”, “ran”]
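The inverse relationship can be sketched with a toy lookup table. This is only an illustration: the constants `FORMS_BY_LEMMA` and `LEMMA_BY_FORM` and the two top-level helper methods are hypothetical and are not part of Kotoshu.

```ruby
# Toy data, assumed for illustration only.
FORMS_BY_LEMMA = { "run" => %w[run runs running ran] }.freeze

# Invert the table so that every inflected form points back to its lemma.
LEMMA_BY_FORM = FORMS_BY_LEMMA.each_with_object({}) do |(lemma, forms), index|
  forms.each { |form| index[form] = lemma }
end.freeze

# Lemmatization: inflected form -> lemma (nil if the form is unknown).
def lemmatize(form)
  LEMMA_BY_FORM[form]
end

# Synthesis: lemma -> all inflected forms (empty array if the lemma is unknown).
def synthesize(lemma)
  FORMS_BY_LEMMA.fetch(lemma, [])
end

# Every synthesized form lemmatizes back to the lemma it came from.
round_trip_ok = synthesize("run").all? { |form| lemmatize(form) == "run" }
puts round_trip_ok
```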

Different languages use different synthesis strategies:

  • Latin scripts: Hunspell affix rules

  • CJK: Not applicable (no inflection)

  • German: Compound word + affix synthesis

  • Finnish: Complex agglutinative patterns

Examples:

Synthesizing English verb forms

synthesizer = EnglishSynthesizer.new(aff_path: "en_US.aff", dic_path: "en_US.dic")
forms = synthesizer.synthesize("run", "VERB")
# => ["run", "runs", "running", "ran"]

Synthesizing with POS constraint

forms = synthesizer.synthesize("happy", "ADJ")
# => ["happy", "happier", "happiest"]

Instance Method Details

#synthesize(lemma, pos_tag) ⇒ Array<String>

This method is abstract.

Subclasses must implement this method.

Generate inflected forms of a word.

Given a lemma (base form) and a POS tag, returns all possible inflected forms of that word.

Parameters:

  • lemma (String)

    The base form (lemma)

  • pos_tag (String)

    The POS tag to constrain generation

Returns:

  • (Array<String>)

    Array of inflected forms

Raises:

  • (NotImplementedError)

    if not implemented by subclass



# File 'lib/kotoshu/components/synthesizer.rb', line 39

def synthesize(lemma, pos_tag)
  raise NotImplementedError, "#{self.class} must implement #synthesize"
end
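A minimal sketch of how a subclass might satisfy this contract. To keep the example runnable without the gem, `SynthesizerBase` is a stand-in that copies the abstract method shown above; with Kotoshu loaded you would presumably subclass `Kotoshu::Components::Synthesizer` instead. `TableSynthesizer` and its hard-coded `FORMS` table are hypothetical, and returning an empty array for unknown input is an assumption, not documented behavior.

```ruby
# Stand-in for Kotoshu::Components::Synthesizer (assumption: same abstract
# method as in the source listing above).
class SynthesizerBase
  def synthesize(lemma, pos_tag)
    raise NotImplementedError, "#{self.class} must implement #synthesize"
  end
end

# Hypothetical subclass that looks forms up in a hard-coded table.
class TableSynthesizer < SynthesizerBase
  FORMS = {
    %w[run VERB]  => %w[run runs running ran],
    %w[happy ADJ] => %w[happy happier happiest]
  }.freeze

  # Returns the known forms, or an empty array for unknown lemma/POS pairs.
  def synthesize(lemma, pos_tag)
    FORMS.fetch([lemma, pos_tag], [])
  end
end

p TableSynthesizer.new.synthesize("run", "VERB")

# Calling the abstract method directly raises. NotImplementedError descends
# from ScriptError, not StandardError, so a bare `rescue` would not catch it;
# the exception class has to be named explicitly.
begin
  SynthesizerBase.new.synthesize("run", "VERB")
rescue NotImplementedError => e
  puts e.message
end
```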

#synthesize_all(lemma) ⇒ Hash

Generate all inflected forms (all POS tags).

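Convenience method that calls #synthesize once for each of the four POS tags NOUN, VERB, ADJ, and ADV and collects the results. Subclasses can override it with a more efficient implementation.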

Parameters:

  • lemma (String)

    The base form (lemma)

Returns:

  • (Hash)

    Hash mapping POS tags to arrays of forms



# File 'lib/kotoshu/components/synthesizer.rb', line 49

def synthesize_all(lemma)
  # Default implementation - subclasses can optimize
  {
    'NOUN' => synthesize(lemma, 'NOUN'),
    'VERB' => synthesize(lemma, 'VERB'),
    'ADJ' => synthesize(lemma, 'ADJ'),
    'ADV' => synthesize(lemma, 'ADV')
  }
end
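The default implementation can be exercised with a small sketch. `SketchSynthesizer` is a stand-in that copies the two methods from the source listings above so the example runs without the gem. `ToySynthesizer` and its data are hypothetical, and the choice to return an empty array for tags a lemma does not support is an assumption.

```ruby
# Stand-in for the documented base class, copied from the listings above.
class SketchSynthesizer
  def synthesize(lemma, pos_tag)
    raise NotImplementedError, "#{self.class} must implement #synthesize"
  end

  def synthesize_all(lemma)
    {
      'NOUN' => synthesize(lemma, 'NOUN'),
      'VERB' => synthesize(lemma, 'VERB'),
      'ADJ' => synthesize(lemma, 'ADJ'),
      'ADV' => synthesize(lemma, 'ADV')
    }
  end
end

# Hypothetical subclass with toy data.
class ToySynthesizer < SketchSynthesizer
  FORMS = {
    %w[run NOUN] => %w[run runs],
    %w[run VERB] => %w[run runs running ran]
  }.freeze

  def synthesize(lemma, pos_tag)
    FORMS.fetch([lemma, pos_tag], [])
  end
end

by_pos = ToySynthesizer.new.synthesize_all("run")

# The result always has all four keys, so callers may want to drop the tags
# for which nothing was generated.
non_empty = by_pos.reject { |_pos, forms| forms.empty? }

# Flatten to a single de-duplicated list when the POS tag does not matter.
all_forms = by_pos.values.flatten.uniq

p non_empty
p all_forms
```

Because the default makes four separate #synthesize calls, a subclass whose lookup is expensive may prefer to override #synthesize_all and gather all tags in one pass.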