Module: Kotoshu::Language

Defined in:
lib/kotoshu/language.rb,
lib/kotoshu/language/detector.rb,
lib/kotoshu/language/registry.rb,
lib/kotoshu/language/identifier.rb,
lib/kotoshu/language/languages/base.rb,
lib/kotoshu/language/tokenizer/base.rb,
lib/kotoshu/language/normalizer/base.rb,
lib/kotoshu/language/tokenizer/latin_tokenizer.rb,
lib/kotoshu/language/tokenizer/french_tokenizer.rb,
lib/kotoshu/language/tokenizer/german_tokenizer.rb,
lib/kotoshu/language/tokenizer/russian_tokenizer.rb,
lib/kotoshu/language/tokenizer/spanish_tokenizer.rb,
lib/kotoshu/language/tokenizer/japanese_tokenizer.rb,
lib/kotoshu/language/tokenizer/portuguese_tokenizer.rb

Overview

Language module for multi-language support.

Provides language detection, tokenization, and normalization for multiple languages, organized into the classes and modules listed under this namespace.

Examples:

Detect language

Kotoshu::Language.detect("Hello world")  # => "en"

Get language class

lang_class = Kotoshu::Language.get("en-US")

List supported languages

Kotoshu::Language.supported_codes  # => ["de-DE", "en-US", ...]

Defined Under Namespace

Modules: Normalizer, Tokenizer

Classes: Base, Detector, LanguageIdentifier, Registry

Class Method Details

.detect(text) ⇒ String?

Detect language from text.

Delegates to Detector.

Parameters:

  • text (String)

    Text to analyze

Returns:

  • (String, nil)

    Detected language code



# File 'lib/kotoshu/language.rb', line 44

def detect(text)
  Detector.detect(text)
end
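
Because `.detect` is documented as returning `String?`, callers generally need to handle `nil`. The following is a minimal sketch of that handling. `StubDetector`, `DemoLanguage`, and `detect_or` are hypothetical names invented for illustration; the real `Detector` heuristics are not shown in this page and are certainly more involved than a script check.

```ruby
# Hypothetical stand-in for a detector; NOT Kotoshu's implementation.
# It only distinguishes Cyrillic from Latin script, for illustration.
module StubDetector
  def self.detect(text)
    return nil if text.nil? || text.strip.empty?
    return "ru" if text.match?(/\p{Cyrillic}/)
    return "en" if text.match?(/\p{Latin}/)

    nil
  end
end

# Facade that forwards to the detector, mirroring the delegation shown above.
module DemoLanguage
  class << self
    def detect(text)
      StubDetector.detect(text)
    end

    # Assumed helper (not part of the documented API): substitute a default
    # code when detection yields nil.
    def detect_or(text, default)
      detect(text) || default
    end
  end
end

DemoLanguage.detect("Hello world")      # a code string when a script is recognized
DemoLanguage.detect_or("12345", "en-US") # falls back to the default
```

Keeping the fallback in the caller rather than in the detector leaves `nil` available as an explicit "could not tell" signal.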

.detect_with_confidence(text) ⇒ Array<String, Float>

Detect language from text, returning the detected code together with a confidence score.

Delegates to Detector.

Parameters:

  • text (String)

    Text to analyze

Returns:

  • (Array<String, Float>)

    Language code and confidence



# File 'lib/kotoshu/language.rb', line 52

def detect_with_confidence(text)
  Detector.detect_with_confidence(text)
end
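
The `[code, confidence]` pair can be destructured and filtered by a threshold. This is a sketch only: `confident_code` is a hypothetical helper, and the 0.0 to 1.0 confidence scale and the 0.8 default are assumptions, since this page does not state the range of the score.

```ruby
# Assumed helper (hypothetical, not part of Kotoshu): accept a detection only
# when its confidence reaches a threshold. `result` is a [code, confidence]
# pair shaped like the documented return value.
def confident_code(result, threshold: 0.8)
  code, confidence = result
  return nil if code.nil? || confidence.nil?

  confidence >= threshold ? code : nil
end

confident_code(["en", 0.93])                  # accepted at the default threshold
confident_code(["de", 0.41])                  # rejected: below 0.8
confident_code(["de", 0.41], threshold: 0.3)  # accepted with a looser threshold
```

In real use, the pair would come from `Kotoshu::Language.detect_with_confidence(text)` instead of a literal array.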

.get(code) ⇒ Class?

Get language class by code.

Delegates to Registry.

Parameters:

  • code (String)

    Language code

Returns:

  • (Class, nil)

    Language class or nil



# File 'lib/kotoshu/language.rb', line 62

def get(code)
  Registry.get(code)
end

.info(code) ⇒ Hash?

Get language info by code.

Delegates to Registry.

Parameters:

  • code (String)

    Language code

Returns:

  • (Hash, nil)

    Language info or nil



# File 'lib/kotoshu/language.rb', line 85

def info(code)
  Registry.info(code)
end

.register(code, klass) ⇒ void

This method returns an undefined value.

Register a language class under a language code.

Delegates to Registry.

Parameters:

  • code (String)

    Language code

  • klass (Class)

    Language class



# File 'lib/kotoshu/language.rb', line 94

def register(code, klass)
  Registry.register(code, klass)
end
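
The registry methods on this page (`register`, `get`, `registered?`, `supported_codes`) fit a simple code-to-class mapping. The sketch below shows one way such a registry could work. `DemoRegistry`, `DemoEnglish`, and `DemoGerman` are invented for this example; the actual `Kotoshu::Language::Registry` may store or normalize codes differently.

```ruby
# Hypothetical stand-in for a registry; the real Kotoshu::Language::Registry
# may store and normalize codes differently.
class DemoRegistry
  def initialize
    @languages = {}
  end

  # Store a class under a language code; returns nil to mirror a void method.
  def register(code, klass)
    @languages[code.to_s] = klass
    nil
  end

  # Look up a class by code; nil when the code is unknown.
  def get(code)
    @languages[code.to_s]
  end

  def registered?(code)
    @languages.key?(code.to_s)
  end

  # Sorted so the result does not depend on registration order.
  def supported_codes
    @languages.keys.sort
  end
end

# Placeholder language classes, invented for this example.
class DemoEnglish; end
class DemoGerman; end

registry = DemoRegistry.new
registry.register("en-US", DemoEnglish)
registry.register("de-DE", DemoGerman)

registry.get("en-US")          # => DemoEnglish
registry.registered?("fr-FR")  # => false
registry.supported_codes       # => ["de-DE", "en-US"]
```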

.registered?(code) ⇒ Boolean

Check whether a language is registered under the given code.

Delegates to Registry.

Parameters:

  • code (String)

    Language code

Returns:

  • (Boolean)

    True if registered



# File 'lib/kotoshu/language.rb', line 70

def registered?(code)
  Registry.registered?(code)
end

.supported_codes ⇒ Array<String>

Get all supported language codes.

Delegates to Registry.

Returns:

  • (Array<String>)

    List of codes



# File 'lib/kotoshu/language.rb', line 77

def supported_codes
  Registry.supported_codes
end
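
The examples in the overview show `.detect` returning a bare code such as `"en"`, while `.supported_codes` lists regional codes such as `"en-US"`. This page does not say whether Kotoshu maps between the two forms, so the sketch below shows one possible way a caller might do it. `resolve_code` is a hypothetical helper, not part of the library.

```ruby
# Assumed helper (hypothetical): map a requested code onto a list of
# supported codes. Tries an exact match first, then the first supported
# code that shares the same primary subtag (the part before the hyphen).
def resolve_code(requested, supported)
  return nil if requested.nil? || requested.empty?
  return requested if supported.include?(requested)

  primary = requested.split("-").first.downcase
  supported.find { |code| code.split("-").first.downcase == primary }
end

supported = ["de-DE", "en-US"]
resolve_code("en-US", supported)  # => "en-US"
resolve_code("en", supported)     # => "en-US"
resolve_code("ja", supported)     # => nil
```

In real use, `supported` would be the array returned by `Kotoshu::Language.supported_codes`.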