Class: Kotoshu::Languages::Russian::Tokenizer
- Inherits: Kotoshu::Language::Tokenizer::RussianTokenizer
- Object
- Kotoshu::Language::Tokenizer::Base
- Kotoshu::Language::Tokenizer::RussianTokenizer
- Kotoshu::Languages::Russian::Tokenizer
- Defined in: lib/kotoshu/languages/ru/language.rb
Overview
Russian tokenizer with abbreviation handling.
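This page does not show how the abbreviation handling works. The inherited constant names (ABBREVIATION_PLACEHOLDERS, PLACEHOLDER_RESTORE, WORD_SEPARATORS) suggest a placeholder approach: swap each known abbreviation for a dot-free placeholder, split on separators, then restore the abbreviations. The following is a minimal, self-contained sketch of that general technique only. It is not Kotoshu's implementation; the module AbbreviationSketch, its constants, the abbreviation list, and the separator pattern are all hypothetical.

```ruby
# Hypothetical sketch of placeholder-based abbreviation handling.
# Nothing here is taken from Kotoshu; all names and values are assumptions.
module AbbreviationSketch
  # Assumed sample of Russian abbreviations that contain dots.
  ABBREVIATIONS = ["т.е.", "т.д.", "т.п."].freeze

  # Map each abbreviation to a placeholder built from private-use
  # characters, which the separator pattern below never matches.
  PLACEHOLDERS = ABBREVIATIONS.each_with_index.to_h do |abbr, i|
    [abbr, "\uE000#{i}\uE001"]
  end.freeze

  # Reverse map, used to put the abbreviations back after splitting.
  RESTORE = PLACEHOLDERS.invert.freeze

  # Assumed separators: whitespace plus common punctuation.
  SEPARATORS = /[\s.,;:!?()"«»]+/

  def self.tokenize(text)
    # 1. Protect abbreviations so their dots are not treated as separators.
    protected_text = PLACEHOLDERS.reduce(text) do |acc, (abbr, placeholder)|
      acc.gsub(abbr, placeholder)
    end

    # 2. Split on separators, drop empty pieces, 3. restore abbreviations.
    protected_text
      .split(SEPARATORS)
      .reject(&:empty?)
      .map { |token| RESTORE.fetch(token, token) }
  end
end

tokens = AbbreviationSketch.tokenize("Это кошки, собаки и т.д. Мы рады.")
puts tokens.inspect
```

In this sketch "т.д." survives as a single token, whereas splitting on the dot directly would break it into "т" and "д".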
Constant Summary
Constants inherited from Kotoshu::Language::Tokenizer::RussianTokenizer
Kotoshu::Language::Tokenizer::RussianTokenizer::ABBREVIATION_PLACEHOLDERS, Kotoshu::Language::Tokenizer::RussianTokenizer::PLACEHOLDER_RESTORE, Kotoshu::Language::Tokenizer::RussianTokenizer::WORD_SEPARATORS
Method Summary
Methods inherited from Kotoshu::Language::Tokenizer::RussianTokenizer
Methods inherited from Kotoshu::Language::Tokenizer::Base
#normalize, #skip_token?, #tokenize, #tokenize_with_positions, #word_boundary_regex, #word_char?