Class: Kotoshu::Language::Base

Inherits:
Object
Defined in:
lib/kotoshu/language/languages/base.rb

Overview

Abstract base class for language implementations.

Uses the Template Method pattern to define the interface that all language implementations must follow.

Each language implementation should:

  1. Inherit from this class

  2. Implement the required template methods

  3. Register itself with Language::Registry

Examples:

Implement a language

class English < Kotoshu::Language::Base
  register "en"

  def initialize
    super(code: "en", name: "English")
  end

  def tokenizer
    @tokenizer ||= Tokenizer::LatinTokenizer.new
  end

  def normalizer
    @normalizer ||= Normalizer::Base.new
  end

  def dictionary_class
    Dictionary::UnixWords
  end
end

Constructor Details

#initialize(code:, name:, variant: nil) ⇒ Base

Initialize language.

Parameters:

  • code (String)

    Language code (e.g., “en”, “en-US”, “de-DE”)

  • name (String)

    Human-readable name

  • variant (String, nil) (defaults to: nil)

    Variant name (e.g., “American”, “British”)



# File 'lib/kotoshu/language/languages/base.rb', line 43

def initialize(code:, name:, variant: nil)
  @code = code
  @name = name
  @variant = variant
  @region = extract_region(code)
end

Instance Attribute Details

#code ⇒ Object (readonly)

Returns the value of attribute code.



# File 'lib/kotoshu/language/languages/base.rb', line 36

def code
  @code
end

#name ⇒ Object (readonly)

Returns the value of attribute name.



# File 'lib/kotoshu/language/languages/base.rb', line 36

def name
  @name
end

#region ⇒ Object (readonly)

Returns the value of attribute region.



# File 'lib/kotoshu/language/languages/base.rb', line 36

def region
  @region
end

#variant ⇒ Object (readonly)

Returns the value of attribute variant.



# File 'lib/kotoshu/language/languages/base.rb', line 36

def variant
  @variant
end

Class Method Details

.instance ⇒ Base

Get or create singleton instance.

Returns:

  • (Base)

    Language instance



# File 'lib/kotoshu/language/languages/base.rb', line 238

def instance
  @instance ||= new
end
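
Because `@instance` is assigned inside a class method, it is a class-level instance variable, so each subclass should end up memoizing its own object. The following is a minimal sketch of that behavior. `SketchBase`, `SketchEnglish`, and `SketchGerman` are hypothetical stand-ins rather than Kotoshu classes; only the body of `instance` is copied from the listing above.

```ruby
# Hypothetical stand-in for Kotoshu::Language::Base, reduced to the
# class-level memoization shown in the listing above.
class SketchBase
  class << self
    def instance
      # @instance lives on the receiving class object, so every
      # subclass keeps its own memoized instance.
      @instance ||= new
    end
  end
end

class SketchEnglish < SketchBase; end
class SketchGerman < SketchBase; end

english = SketchEnglish.instance

puts english.equal?(SketchEnglish.instance) # same object on repeated calls
puts english.equal?(SketchGerman.instance)  # a different subclass gets its own object
```

Since `instance` calls `new` without arguments, it fits subclasses that define a zero-argument `initialize`, as the `English` example does. The `||=` assignment is not synchronized, so callers that first touch `instance` from several threads may want to guard it.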

.register(code) ⇒ void

This method returns an undefined value.

Register this language with the registry.

Parameters:

  • code (String)

    Language code



# File 'lib/kotoshu/language/languages/base.rb', line 231

def register(code)
  Kotoshu::Language::Registry.register(code, self)
end
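
The source shows only that `register` forwards the code and the class to `Kotoshu::Language::Registry.register`. The sketch below illustrates that hand-off with a hypothetical, hash-backed `SketchRegistry`; the real registry's storage and lookup API are not documented on this page, so `languages` is an assumption made purely for illustration.

```ruby
# Hypothetical stand-in for Kotoshu::Language::Registry. Only the
# register(code, klass) call comes from the source; the Hash storage
# and the `languages` reader are assumptions.
module SketchRegistry
  @languages = {}

  class << self
    attr_reader :languages

    def register(code, klass)
      @languages[code] = klass
    end
  end
end

class RegisteringBase
  # Mirrors Base.register: `self` is whichever class invoked the macro.
  def self.register(code)
    SketchRegistry.register(code, self)
  end
end

class RegisteredEnglish < RegisteringBase
  register "en" # runs while the class body is evaluated
end

p SketchRegistry.languages
```

Calling `register` in the class body means the language is registered as soon as its file is loaded, with no separate setup step.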

Instance Method Details

#base_code ⇒ String

Get base language code.

Returns:

  • (String)

    Base language code (e.g., “en” from “en-US”)



# File 'lib/kotoshu/language/languages/base.rb', line 201

def base_code
  code.split("-").first
end

#base_language? ⇒ Boolean

Check if this is a base language (no region).

Returns:

  • (Boolean)

    True if base language



# File 'lib/kotoshu/language/languages/base.rb', line 194

def base_language?
  !code.include?("-")
end

#compatible_with?(other) ⇒ Boolean

Check if another language is compatible.

Languages are compatible if they share the same base code.

Parameters:

  • other (Base)

    Other language

Returns:

  • (Boolean)

    True if compatible



# File 'lib/kotoshu/language/languages/base.rb', line 220

def compatible_with?(other)
  return false unless other.is_a?(Base)

  base_code == other.base_code
end
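
As a rough illustration of the compatibility rule, the sketch below copies `base_code`, `base_language?`, and `compatible_with?` into a hypothetical `CodeSketch` class that takes the code as a positional argument. That constructor is an assumption made to keep the example self-contained.

```ruby
# Hypothetical stand-in carrying only the code-related methods shown above.
class CodeSketch
  attr_reader :code

  def initialize(code)
    @code = code
  end

  def base_code
    code.split("-").first
  end

  def base_language?
    !code.include?("-")
  end

  def compatible_with?(other)
    # Anything that is not a language object is never compatible.
    return false unless other.is_a?(CodeSketch)

    base_code == other.base_code
  end
end

us = CodeSketch.new("en-US")
gb = CodeSketch.new("en-GB")
de = CodeSketch.new("de-DE")

puts us.compatible_with?(gb)      # true: both reduce to "en"
puts us.compatible_with?(de)      # false: "en" vs "de"
puts us.compatible_with?("en-GB") # false: a bare String is rejected
```

To compare against a plain code string rather than a language object, `#matches_code?` is the method to reach for.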

#default_dictionary_paths ⇒ Array<String>

Get default dictionary paths for this language.

Subclasses can override to provide language-specific paths.

Returns:

  • (Array<String>)

    List of dictionary paths to search



# File 'lib/kotoshu/language/languages/base.rb', line 85

def default_dictionary_paths
  []
end

#dictionary_class ⇒ Class

Get dictionary class for this language.

Subclasses must implement.

Returns:

  • (Class)

    Dictionary backend class

Raises:

  • (NotImplementedError)

    If not implemented



# File 'lib/kotoshu/language/languages/base.rb', line 76

def dictionary_class
  raise NotImplementedError, "#{self.class} must implement #dictionary_class"
end
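
The sketch below shows what a caller might see when a subclass omits a required template method. `AbstractSketch` and `IncompleteLanguage` are hypothetical stand-ins; only the `raise` line is copied from the listing above.

```ruby
# Hypothetical stand-in for the abstract base, with one template method.
class AbstractSketch
  def dictionary_class
    raise NotImplementedError, "#{self.class} must implement #dictionary_class"
  end
end

# A subclass that does not override the template method.
class IncompleteLanguage < AbstractSketch; end

def describe_failure(language)
  language.dictionary_class
  "implemented"
rescue NotImplementedError => e
  # NotImplementedError descends from ScriptError, not StandardError,
  # so it must be named explicitly; a bare `rescue` would not catch it.
  e.message
end

message = describe_failure(IncompleteLanguage.new)
puts message
```

The same applies to `#tokenizer` and `#normalizer`, which raise in the same way until a subclass overrides them.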

#encoding ⇒ String

Get character encoding for this language.

Default is UTF-8 for all languages.

Returns:

  • (String)

    Character encoding name



# File 'lib/kotoshu/language/languages/base.rb', line 94

def encoding
  "UTF-8"
end

#full_name ⇒ String

Get full language name with variant.

Returns:

  • (String)

    Full name



# File 'lib/kotoshu/language/languages/base.rb', line 185

def full_name
  return name unless variant

  "#{name} (#{variant})"
end

#info ⇒ Hash

Get language info hash.

Returns:

  • (Hash)

    Language information



# File 'lib/kotoshu/language/languages/base.rb', line 156

def info
  {
    code: code,
    name: name,
    variant: variant,
    region: region,
    encoding: encoding,
    rtl?: rtl?,
    script_type: script_type,
    dictionary_class: dictionary_class.name
  }
end

#matches_code?(other_code) ⇒ Boolean

Check if this language matches given code.

Matches on an exact code or on a shared base language, so “en” matches “en-US”, and “en-GB” also matches “en-US”.

Parameters:

  • other_code (String)

    Code to compare

Returns:

  • (Boolean)

    True if matches



# File 'lib/kotoshu/language/languages/base.rb', line 175

def matches_code?(other_code)
  return false if other_code.nil?

  code == other_code ||
    code.split("-").first == other_code.split("-").first
end
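
To make the matching rule concrete, here is the same comparison as a standalone method. `codes_match?` is a hypothetical helper that takes the language's own code as its first argument, so the sketch runs without the Kotoshu classes.

```ruby
# Same comparison as #matches_code?, with the receiver's code passed in
# explicitly (an assumption made for this standalone sketch).
def codes_match?(code, other_code)
  return false if other_code.nil?

  code == other_code ||
    code.split("-").first == other_code.split("-").first
end

puts codes_match?("en-US", "en-US") # exact match
puts codes_match?("en-US", "en")    # base code matches a regional code
puts codes_match?("en-US", "en-GB") # two regions of one base also match
puts codes_match?("en-US", "de")    # different base
puts codes_match?("en-US", nil)     # nil never matches
```

Because two regional variants of one base language match each other, callers that need an exact match would compare `code` directly.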

#normalize(text, options = {}) ⇒ String

Normalize text using language-specific normalizer.

Parameters:

  • text (String)

    Text to normalize

  • options (Hash) (defaults to: {})

    Normalization options

Returns:

  • (String)

    Normalized text



# File 'lib/kotoshu/language/languages/base.rb', line 129

def normalize(text, options = {})
  normalizer.normalize(text, options)
end

#normalize_word(word) ⇒ String

Normalize a word for checking.

Parameters:

  • word (String)

    Word to normalize

Returns:

  • (String)

    Normalized word



# File 'lib/kotoshu/language/languages/base.rb', line 149

def normalize_word(word)
  normalizer.normalize_word(word)
end

#normalizer ⇒ Normalizer::Base

Get normalizer for this language.

Subclasses must implement.

Returns:

  • (Normalizer::Base)

Raises:

  • (NotImplementedError)

    If not implemented



# File 'lib/kotoshu/language/languages/base.rb', line 66

def normalizer
  raise NotImplementedError, "#{self.class} must implement #normalizer"
end

#region_code ⇒ String?

Get region code.

Returns:

  • (String, nil)

    Region code (the part of the code after the first hyphen), or nil for a base language



# File 'lib/kotoshu/language/languages/base.rb', line 208

def region_code
  return nil unless code.include?("-")

  code.split("-", 2).last
end
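
The sketch below exercises the same split on a few codes. `region_of` is a hypothetical standalone copy of the method body. Note that the `region` attribute is set by `extract_region`, which is not shown on this page, so its value may differ from what `#region_code` returns.

```ruby
# Standalone copy of the #region_code logic (hypothetical helper name).
def region_of(code)
  return nil unless code.include?("-")

  # A limit of 2 splits on the first hyphen only, so every later
  # subtag stays in the returned string.
  code.split("-", 2).last
end

p region_of("en")         # nil
p region_of("en-US")      # "US"
p region_of("zh-Hant-TW") # "Hant-TW"
```

For codes with more than two subtags, the result therefore includes the script subtag as well as the region.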

#rtl? ⇒ Boolean

Check if language uses right-to-left script.

Default is false. Override for Arabic, Hebrew, etc.

Returns:

  • (Boolean)

    True if RTL



# File 'lib/kotoshu/language/languages/base.rb', line 103

def rtl?
  false
end

#script_type ⇒ Symbol

Get script type for this language.

Possible values: :latin, :cyrillic, :arabic, :cjk, :mixed. Default is :latin.

Returns:

  • (Symbol)

    Script type



# File 'lib/kotoshu/language/languages/base.rb', line 112

def script_type
  :latin
end
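
A right-to-left language would presumably override both `rtl?` and `script_type` while inheriting the remaining defaults. The sketch below shows that shape with hypothetical `DefaultsSketch` and `SketchArabic` classes; it is not taken from an actual Kotoshu language implementation.

```ruby
# Hypothetical stand-in holding the three defaults documented above.
class DefaultsSketch
  def encoding
    "UTF-8"
  end

  def rtl?
    false
  end

  def script_type
    :latin
  end
end

# Overrides only what differs from the defaults.
class SketchArabic < DefaultsSketch
  def rtl?
    true
  end

  def script_type
    :arabic
  end
end

arabic = SketchArabic.new
p [arabic.rtl?, arabic.script_type, arabic.encoding]
```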

#tokenize(text) ⇒ Array<String>

Tokenize text using language-specific tokenizer.

Parameters:

  • text (String)

    Text to tokenize

Returns:

  • (Array<String>)

    Array of tokens



# File 'lib/kotoshu/language/languages/base.rb', line 120

def tokenize(text)
  tokenizer.tokenize(text)
end

#tokenizer ⇒ Tokenizer::Base

Get tokenizer for this language.

Subclasses must implement.

Returns:

  • (Tokenizer::Base)

Raises:

  • (NotImplementedError)

    If not implemented



# File 'lib/kotoshu/language/languages/base.rb', line 56

def tokenizer
  raise NotImplementedError, "#{self.class} must implement #tokenizer"
end

#valid_word?(word, dictionary:) ⇒ Boolean

Check if a word is valid in this language.

Normalizes the word with #normalize_word, then looks it up in the given dictionary.

Parameters:

  • word (String)

    Word to check

  • dictionary (Dictionary::Base)

    Dictionary to use

Returns:

  • (Boolean)

    True if word is valid



# File 'lib/kotoshu/language/languages/base.rb', line 140

def valid_word?(word, dictionary:)
  normalized = normalize_word(word)
  dictionary.lookup(normalized)
end
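
The following sketch walks through the normalize-then-lookup flow. `SketchNormalizer`, `SketchDictionary`, and `ValidatingSketch` are hypothetical stand-ins. The real `Normalizer::Base` and `Dictionary::Base` are not documented on this page, so the lowercasing rule and the Set-backed `lookup` are assumptions.

```ruby
require "set"

# Hypothetical normalizer: trims whitespace and lowercases.
class SketchNormalizer
  def normalize_word(word)
    word.strip.downcase
  end
end

# Hypothetical dictionary backend exposing the #lookup call used above.
class SketchDictionary
  def initialize(words)
    @words = Set.new(words)
  end

  def lookup(word)
    @words.include?(word)
  end
end

# Carries the three methods involved in validation, copied from above.
class ValidatingSketch
  def normalizer
    @normalizer ||= SketchNormalizer.new
  end

  def normalize_word(word)
    normalizer.normalize_word(word)
  end

  def valid_word?(word, dictionary:)
    normalized = normalize_word(word)
    dictionary.lookup(normalized)
  end
end

language = ValidatingSketch.new
dictionary = SketchDictionary.new(%w[hello world])

puts language.valid_word?(" Hello ", dictionary: dictionary) # found after normalization
puts language.valid_word?("helo", dictionary: dictionary)    # not in the word list
```

`valid_word?` returns whatever `lookup` returns, so the result is strictly Boolean only if the dictionary backend's `lookup` is.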