Class: Kotoshu::Languages::Russian::Tokenizer

Inherits:
Kotoshu::Language::Tokenizer::RussianTokenizer
Defined in:
lib/kotoshu/languages/ru/language.rb

Overview

Russian tokenizer with abbreviation handling.
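
The page does not show how abbreviations are handled. The inherited constant names (ABBREVIATION_PLACEHOLDERS, PLACEHOLDER_RESTORE, WORD_SEPARATORS) suggest a protect, split, restore approach, so the following is a minimal standalone sketch of that general technique, not Kotoshu's actual code. The module name, the abbreviation list, the placeholder strings, and the separator pattern are all assumptions made for illustration.

```ruby
# Hypothetical sketch of placeholder-based abbreviation handling.
# Nothing here is taken from the Kotoshu source.
module AbbreviationSketch
  # Assumed sample data: abbreviation => placeholder without separator characters.
  PLACEHOLDERS = {
    "т.е." => "__ABBR_TE__",
    "т.д." => "__ABBR_TD__",
    "т.п." => "__ABBR_TP__"
  }.freeze

  # Reverse mapping used to restore the original abbreviations.
  RESTORE = PLACEHOLDERS.invert.freeze

  # Assumed separators: whitespace and common punctuation, including the period.
  SEPARATORS = /[\s.,;:!?()«»"]+/

  def self.tokenize(text)
    # 1. Protect: swap each abbreviation for its placeholder so that its
    #    periods are not treated as word separators.
    protected_text = PLACEHOLDERS.reduce(text) do |acc, (abbr, placeholder)|
      acc.gsub(abbr, placeholder)
    end

    # 2. Split on separators, dropping empty strings.
    # 3. Restore: map placeholders back to the original abbreviations.
    protected_text.split(SEPARATORS).reject(&:empty?).map do |token|
      RESTORE.fetch(token, token)
    end
  end
end

tokens = AbbreviationSketch.tokenize("Кошки, собаки и т.д. живут дома, т.е. в квартире.")
# tokens == ["Кошки", "собаки", "и", "т.д.", "живут", "дома", "т.е.", "в", "квартире"]
```

Without the protect step, splitting on the period would break "т.д." into "т" and "д". The literal substring replacement used here is deliberately naive: it would also match an abbreviation embedded in a longer string, which a real tokenizer would likely guard against.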

Constant Summary

Constants inherited from Kotoshu::Language::Tokenizer::RussianTokenizer

Kotoshu::Language::Tokenizer::RussianTokenizer::ABBREVIATION_PLACEHOLDERS, Kotoshu::Language::Tokenizer::RussianTokenizer::PLACEHOLDER_RESTORE, Kotoshu::Language::Tokenizer::RussianTokenizer::WORD_SEPARATORS

Method Summary

Methods inherited from Kotoshu::Language::Tokenizer::RussianTokenizer

#tokenize

Methods inherited from Kotoshu::Language::Tokenizer::Base

#normalize, #skip_token?, #tokenize, #tokenize_with_positions, #word_boundary_regex, #word_char?