Class: Kotoshu::Language::Base
- Inherits: Object
- Defined in: lib/kotoshu/language/languages/base.rb
Overview
Abstract base class for language implementations.
Uses the Template Method pattern to define the interface that every language implementation must follow.
Each language implementation should:
- Inherit from this class
- Implement the required template methods
- Register itself with Language::Registry
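The three steps above might look roughly like the sketch below. Kotoshu itself is not loaded here, so `TemplateSketchBase` is a reduced stand-in for this class that copies only the documented constructor and template methods. `SKETCH_REGISTRY`, `WhitespaceTokenizer`, `DowncaseNormalizer`, and `SketchEnglish` are hypothetical names, and the plain Hash only approximates registration, since the real `Language::Registry` API is not shown on this page.

```ruby
# Stand-in for Kotoshu::Language::Base, reduced to what this page documents.
class TemplateSketchBase
  attr_reader :code, :name, :variant

  def initialize(code:, name:, variant: nil)
    @code = code
    @name = name
    @variant = variant
  end

  # Template methods: subclasses are expected to override these.
  def tokenizer
    raise NotImplementedError, "#{self.class} must implement #tokenizer"
  end

  def normalizer
    raise NotImplementedError, "#{self.class} must implement #normalizer"
  end

  # Concrete methods that delegate to the template methods.
  def tokenize(text)
    tokenizer.tokenize(text)
  end

  def normalize_word(word)
    normalizer.normalize_word(word)
  end
end

# Hypothetical collaborators; the real tokenizer and normalizer classes differ.
class WhitespaceTokenizer
  def tokenize(text)
    text.split
  end
end

class DowncaseNormalizer
  def normalize_word(word)
    word.strip.downcase
  end
end

# Step 1: inherit. Step 2: implement the template methods.
class SketchEnglish < TemplateSketchBase
  def initialize
    super(code: "en-US", name: "English", variant: "American")
  end

  def tokenizer
    @tokenizer ||= WhitespaceTokenizer.new
  end

  def normalizer
    @normalizer ||= DowncaseNormalizer.new
  end
end

# Step 3: register. A Hash keyed by code stands in for Language::Registry.
SKETCH_REGISTRY = {}
SKETCH_REGISTRY["en-US"] = SketchEnglish.new

p SKETCH_REGISTRY["en-US"].tokenize("Hello  World")   # ["Hello", "World"]
p SKETCH_REGISTRY["en-US"].normalize_word(" Hello ")  # "hello"
```

Note that `NotImplementedError` descends from `ScriptError`, not `StandardError`, so a bare `rescue` will not catch a missing template method.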
Direct Known Subclasses
Kotoshu::Languages::English, Kotoshu::Languages::French, Kotoshu::Languages::German, Kotoshu::Languages::Japanese, Kotoshu::Languages::Portuguese, Kotoshu::Languages::Russian, Kotoshu::Languages::Spanish
Instance Attribute Summary
- #code ⇒ Object (readonly): Returns the value of attribute code.
- #name ⇒ Object (readonly): Returns the value of attribute name.
- #region ⇒ Object (readonly): Returns the value of attribute region.
- #variant ⇒ Object (readonly): Returns the value of attribute variant.
Class Method Summary
- .instance ⇒ Base: Get or create singleton instance.
- .register(code) ⇒ void: Register this language with the registry.
Instance Method Summary
- #base_code ⇒ String: Get base language code.
- #base_language? ⇒ Boolean: Check if this is a base language (no region).
- #compatible_with?(other) ⇒ Boolean: Check if another language is compatible.
- #default_dictionary_paths ⇒ Array<String>: Get default dictionary paths for this language.
- #dictionary_class ⇒ Class: Get dictionary class for this language.
- #encoding ⇒ String: Get character encoding for this language.
- #full_name ⇒ String: Get full language name with variant.
- #info ⇒ Hash: Get language info hash.
- #initialize(code:, name:, variant: nil) ⇒ Base (constructor): Initialize language.
- #matches_code?(other_code) ⇒ Boolean: Check if this language matches given code.
- #normalize(text, options = {}) ⇒ String: Normalize text using language-specific normalizer.
- #normalize_word(word) ⇒ String: Normalize a word for checking.
- #normalizer ⇒ Normalizer::Base: Get normalizer for this language.
- #region_code ⇒ String?: Get region code.
- #rtl? ⇒ Boolean: Check if language uses right-to-left script.
- #script_type ⇒ Symbol: Get script type for this language.
- #tokenize(text) ⇒ Array<String>: Tokenize text using language-specific tokenizer.
- #tokenizer ⇒ Tokenizer::Base: Get tokenizer for this language.
- #valid_word?(word, dictionary:) ⇒ Boolean: Check if a word is valid in this language.
Constructor Details
#initialize(code:, name:, variant: nil) ⇒ Base
Initialize language.
# File 'lib/kotoshu/language/languages/base.rb', line 43
def initialize(code:, name:, variant: nil)
  @code = code
  @name = name
  @variant = variant
  @region = extract_region(code)
end
Instance Attribute Details
#code ⇒ Object (readonly)
Returns the value of attribute code.
# File 'lib/kotoshu/language/languages/base.rb', line 36
def code
  @code
end
#name ⇒ Object (readonly)
Returns the value of attribute name.
# File 'lib/kotoshu/language/languages/base.rb', line 36
def name
  @name
end
#region ⇒ Object (readonly)
Returns the value of attribute region.
# File 'lib/kotoshu/language/languages/base.rb', line 36
def region
  @region
end
#variant ⇒ Object (readonly)
Returns the value of attribute variant.
# File 'lib/kotoshu/language/languages/base.rb', line 36
def variant
  @variant
end
Class Method Details
.instance ⇒ Base
Get or create singleton instance.
# File 'lib/kotoshu/language/languages/base.rb', line 238
def instance
  @instance ||= new
end
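Because `instance` calls `new` with no arguments while the base constructor requires `code:` and `name:`, the memoization presumably only works on subclasses that define their own zero-argument `initialize`. The sketch below illustrates that under this assumption; `SingletonSketchBase`, `SketchFrench`, and `SketchGerman` are hypothetical stand-ins, not the library's classes.

```ruby
# Stand-in base with the documented keyword constructor and class-level memoization.
class SingletonSketchBase
  attr_reader :code, :name

  def initialize(code:, name:)
    @code = code
    @name = name
  end

  class << self
    # @instance is a class-level instance variable, so each subclass
    # memoizes its own object independently.
    def instance
      @instance ||= new
    end
  end
end

# Subclasses supply the constructor arguments themselves.
class SketchFrench < SingletonSketchBase
  def initialize
    super(code: "fr", name: "French")
  end
end

class SketchGerman < SingletonSketchBase
  def initialize
    super(code: "de", name: "German")
  end
end

puts SketchFrench.instance.equal?(SketchFrench.instance)  # true
puts SketchFrench.instance.equal?(SketchGerman.instance)  # false
puts SketchGerman.instance.name                           # German
```

In this sketch, calling `SingletonSketchBase.instance` directly raises `ArgumentError` because the required keywords are missing. The `||=` idiom is also not synchronized, so concurrent first calls could construct more than one object.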
Instance Method Details
#base_code ⇒ String
Get base language code.
# File 'lib/kotoshu/language/languages/base.rb', line 201
def base_code
  code.split("-").first
end
#base_language? ⇒ Boolean
Check if this is a base language (no region).
# File 'lib/kotoshu/language/languages/base.rb', line 194
def base_language?
  !code.include?("-")
end
#compatible_with?(other) ⇒ Boolean
Check if another language is compatible.
Languages are compatible if they share the same base code.
# File 'lib/kotoshu/language/languages/base.rb', line 220
def compatible_with?(other)
  return false unless other.is_a?(Base)

  base_code == other.base_code
end
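A minimal sketch of the compatibility rule, using a hypothetical `CompatSketch` class that copies only `base_code` and `compatible_with?` from the listings on this page. It shows that regional variants of one language compare as compatible, and that a plain String is rejected by the type guard.

```ruby
# Hypothetical stand-in reduced to the two methods involved.
class CompatSketch
  attr_reader :code

  def initialize(code)
    @code = code
  end

  def base_code
    code.split("-").first
  end

  def compatible_with?(other)
    # Anything that is not a language object is never compatible.
    return false unless other.is_a?(CompatSketch)

    base_code == other.base_code
  end
end

us = CompatSketch.new("en-US")
gb = CompatSketch.new("en-GB")
fr = CompatSketch.new("fr")

puts us.compatible_with?(gb)       # true
puts us.compatible_with?(fr)       # false
puts us.compatible_with?("en-GB")  # false
```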
#default_dictionary_paths ⇒ Array<String>
Get default dictionary paths for this language.
Subclasses can override to provide language-specific paths.
# File 'lib/kotoshu/language/languages/base.rb', line 85
def default_dictionary_paths
  []
end
#dictionary_class ⇒ Class
Get dictionary class for this language.
Subclasses must implement.
# File 'lib/kotoshu/language/languages/base.rb', line 76
def dictionary_class
  raise NotImplementedError, "#{self.class} must implement #dictionary_class"
end
#encoding ⇒ String
Get character encoding for this language.
Default is UTF-8 for all languages.
# File 'lib/kotoshu/language/languages/base.rb', line 94
def encoding
  "UTF-8"
end
#full_name ⇒ String
Get full language name with variant.
# File 'lib/kotoshu/language/languages/base.rb', line 185
def full_name
  return name unless variant

  "#{name} (#{variant})"
end
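A short sketch of the formatting rule, using a hypothetical `NameSketch` struct in place of a real language object. The last call illustrates a Ruby detail worth keeping in mind: an empty string is truthy, so an empty variant still produces parentheses.

```ruby
# Hypothetical struct carrying only the two attributes full_name reads.
NameSketch = Struct.new(:name, :variant) do
  def full_name
    return name unless variant

    "#{name} (#{variant})"
  end
end

puts NameSketch.new("English", "American").full_name  # English (American)
puts NameSketch.new("Japanese", nil).full_name        # Japanese
puts NameSketch.new("Spanish", "").full_name          # Spanish ()
```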
#info ⇒ Hash
Get language info hash.
# File 'lib/kotoshu/language/languages/base.rb', line 156
def info
  {
    code: code,
    name: name,
    variant: variant,
    region: region,
    encoding: encoding,
    rtl?: rtl?,
    script_type: script_type,
    dictionary_class: dictionary_class.name
  }
end
#matches_code?(other_code) ⇒ Boolean
Check if this language matches given code.
Supports base language matching (e.g., “en” matches “en-US”).
# File 'lib/kotoshu/language/languages/base.rb', line 175
def matches_code?(other_code)
  return false if other_code.nil?

  code == other_code ||
    code.split("-").first == other_code.split("-").first
end
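The matching rule can be exercised with a hypothetical `MatchSketch` class that copies the method body from the listing. Because both sides are reduced to their base code, two different regional codes of one language also match, and the comparison is case-sensitive.

```ruby
# Hypothetical stand-in holding only a code string.
class MatchSketch
  attr_reader :code

  def initialize(code)
    @code = code
  end

  def matches_code?(other_code)
    return false if other_code.nil?

    # Exact match, or equal base codes on both sides.
    code == other_code ||
      code.split("-").first == other_code.split("-").first
  end
end

lang = MatchSketch.new("en-US")

puts lang.matches_code?("en-US")  # true
puts lang.matches_code?("en")     # true
puts lang.matches_code?("en-GB")  # true
puts lang.matches_code?("fr")     # false
puts lang.matches_code?("EN")     # false
puts lang.matches_code?(nil)      # false
```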
#normalize(text, options = {}) ⇒ String
Normalize text using language-specific normalizer.
# File 'lib/kotoshu/language/languages/base.rb', line 129
def normalize(text, options = {})
  normalizer.normalize(text, options)
end
#normalize_word(word) ⇒ String
Normalize a word for checking.
# File 'lib/kotoshu/language/languages/base.rb', line 149
def normalize_word(word)
  normalizer.normalize_word(word)
end
#normalizer ⇒ Normalizer::Base
Get normalizer for this language.
Subclasses must implement.
# File 'lib/kotoshu/language/languages/base.rb', line 66
def normalizer
  raise NotImplementedError, "#{self.class} must implement #normalizer"
end
#region_code ⇒ String?
Get region code.
# File 'lib/kotoshu/language/languages/base.rb', line 208
def region_code
  return nil unless code.include?("-")

  code.split("-", 2).last
end
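The three code-parsing helpers (`base_code`, `base_language?`, `region_code`) can be tried together with a hypothetical `CodeSketch` struct that copies their bodies from the listings. The `zh-Hant-TW` case is an illustrative input, not one this page mentions: since the split is limited to two parts, everything after the first hyphen comes back as the region code.

```ruby
# Hypothetical struct wrapping a language code string.
CodeSketch = Struct.new(:code) do
  def base_code
    code.split("-").first
  end

  def base_language?
    !code.include?("-")
  end

  def region_code
    return nil unless code.include?("-")

    # Limit of 2 keeps everything after the first hyphen together.
    code.split("-", 2).last
  end
end

p CodeSketch.new("en").base_language?      # true
p CodeSketch.new("en").region_code         # nil
p CodeSketch.new("pt-BR").base_code        # "pt"
p CodeSketch.new("pt-BR").region_code      # "BR"
p CodeSketch.new("zh-Hant-TW").region_code # "Hant-TW"
```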
#rtl? ⇒ Boolean
Check if language uses right-to-left script.
Default is false. Override for Arabic, Hebrew, etc.
# File 'lib/kotoshu/language/languages/base.rb', line 103
def rtl?
  false
end
#script_type ⇒ Symbol
Get script type for this language.
Possible values: :latin, :cyrillic, :arabic, :cjk, :mixed
# File 'lib/kotoshu/language/languages/base.rb', line 112
def script_type
  :latin
end
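Overriding the `rtl?` and `script_type` defaults could look like the sketch below. `ScriptSketchBase`, `ArabicSketch`, and `CyrillicSketch` are hypothetical; whether the shipped subclasses override these methods this way is not shown on this page.

```ruby
# Stand-in base carrying the two documented defaults.
class ScriptSketchBase
  def rtl?
    false
  end

  def script_type
    :latin
  end
end

# A right-to-left language overrides both defaults.
class ArabicSketch < ScriptSketchBase
  def rtl?
    true
  end

  def script_type
    :arabic
  end
end

# A left-to-right, non-Latin language only needs to change the script type.
class CyrillicSketch < ScriptSketchBase
  def script_type
    :cyrillic
  end
end

p ArabicSketch.new.rtl?           # true
p ArabicSketch.new.script_type    # :arabic
p CyrillicSketch.new.rtl?         # false
p CyrillicSketch.new.script_type  # :cyrillic
```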
#tokenize(text) ⇒ Array<String>
Tokenize text using language-specific tokenizer.
# File 'lib/kotoshu/language/languages/base.rb', line 120
def tokenize(text)
  tokenizer.tokenize(text)
end
#tokenizer ⇒ Tokenizer::Base
Get tokenizer for this language.
Subclasses must implement.
# File 'lib/kotoshu/language/languages/base.rb', line 56
def tokenizer
  raise NotImplementedError, "#{self.class} must implement #tokenizer"
end
#valid_word?(word, dictionary:) ⇒ Boolean
Check if a word is valid in this language.
Uses dictionary lookup.
# File 'lib/kotoshu/language/languages/base.rb', line 140
def valid_word?(word, dictionary:)
  normalized = normalize_word(word)
  dictionary.lookup(normalized)
end
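The listing normalizes the word first and then returns whatever `dictionary.lookup` returns, so the result is only a strict Boolean if the dictionary's `lookup` is. A minimal sketch with a hypothetical Set-backed dictionary follows; `SetDictionarySketch`, `ValidWordSketch`, and the strip-and-downcase normalization are assumptions for illustration, and the only interface taken from the listing is a `lookup` method accepting one word.

```ruby
require "set"

# Hypothetical dictionary exposing the lookup(word) interface used above.
class SetDictionarySketch
  def initialize(words)
    @words = Set.new(words)
  end

  def lookup(word)
    @words.include?(word)
  end
end

# Hypothetical language stand-in with an assumed normalization rule.
class ValidWordSketch
  def normalize_word(word)
    word.strip.downcase
  end

  def valid_word?(word, dictionary:)
    normalized = normalize_word(word)
    dictionary.lookup(normalized)
  end
end

dict = SetDictionarySketch.new(%w[tokyo kyoto])
lang = ValidWordSketch.new

p lang.valid_word?("  Tokyo ", dictionary: dict)  # true
p lang.valid_word?("Osaka", dictionary: dict)     # false
```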