Class: Kotoshu::Languages::Spanish::Tokenizer

Inherits:
Kotoshu::Language::Tokenizer::SpanishTokenizer
Defined in:
lib/kotoshu/languages/es/language.rb

Overview

Spanish tokenizer with ordinal and decimal handling.
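
The inherited constant names (ORDINAL, DECIMAL_COMMA, DECIMAL_POINT and their *_PLACEHOLDER counterparts) suggest a protect-then-split approach: punctuation that belongs to a number or ordinal is swapped for a placeholder, the text is split on word separators, and the placeholders are restored. The following is a minimal sketch of that idea, not Kotoshu's actual implementation. The module name SpanishTokenizerSketch, the regular expressions and the placeholder values are all assumptions made for illustration.

```ruby
# Hypothetical sketch: names, regexes and placeholder values are assumptions,
# not taken from Kotoshu's source.
module SpanishTokenizerSketch
  # Private-use code points stand in for punctuation that must not split a token.
  DECIMAL_COMMA_PLACEHOLDER = "\uE001"
  DECIMAL_POINT_PLACEHOLDER = "\uE002"
  ORDINAL_PLACEHOLDER       = "\uE003"

  # A comma or point sitting between two digits, e.g. "3,14" or "2.5".
  DECIMAL_COMMA = /(?<=\d),(?=\d)/
  DECIMAL_POINT = /(?<=\d)\.(?=\d)/
  # A point between a digit and an ordinal indicator, e.g. "1.º" or "2.ª".
  ORDINAL = /(?<=\d)\.(?=[ºª])/

  # Anything that is not a letter, mark, digit, ordinal indicator or
  # placeholder is treated as a separator.
  WORD_SEPARATORS = /[^\p{L}\p{M}\p{N}ºª\uE001-\uE003]+/

  module_function

  def tokenize(text)
    # Step 1: protect punctuation that is part of a number or ordinal.
    protected_text = text
      .gsub(ORDINAL, ORDINAL_PLACEHOLDER)
      .gsub(DECIMAL_COMMA, DECIMAL_COMMA_PLACEHOLDER)
      .gsub(DECIMAL_POINT, DECIMAL_POINT_PLACEHOLDER)

    # Step 2: split on separators, then restore the original punctuation.
    protected_text.split(WORD_SEPARATORS).reject(&:empty?).map do |token|
      token
        .gsub(ORDINAL_PLACEHOLDER, ".")
        .gsub(DECIMAL_COMMA_PLACEHOLDER, ",")
        .gsub(DECIMAL_POINT_PLACEHOLDER, ".")
    end
  end
end
```

Under these assumptions, SpanishTokenizerSketch.tokenize("El 1.º puesto costó 3,14 euros y 2.5 puntos.") returns ["El", "1.º", "puesto", "costó", "3,14", "euros", "y", "2.5", "puntos"]: the ordinal and both decimals survive as single tokens, while the sentence-final period is dropped as a separator.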

Constant Summary

Constants inherited from Kotoshu::Language::Tokenizer::SpanishTokenizer

Kotoshu::Language::Tokenizer::SpanishTokenizer::DECIMAL_COMMA, Kotoshu::Language::Tokenizer::SpanishTokenizer::DECIMAL_COMMA_PLACEHOLDER, Kotoshu::Language::Tokenizer::SpanishTokenizer::DECIMAL_POINT, Kotoshu::Language::Tokenizer::SpanishTokenizer::DECIMAL_POINT_PLACEHOLDER, Kotoshu::Language::Tokenizer::SpanishTokenizer::DO_NOT_SPLIT, Kotoshu::Language::Tokenizer::SpanishTokenizer::ORDINAL, Kotoshu::Language::Tokenizer::SpanishTokenizer::ORDINAL_PLACEHOLDER, Kotoshu::Language::Tokenizer::SpanishTokenizer::SOFT_HYPHEN, Kotoshu::Language::Tokenizer::SpanishTokenizer::WORD_SEPARATORS

Method Summary

Methods inherited from Kotoshu::Language::Tokenizer::SpanishTokenizer

#tokenize

Methods inherited from Kotoshu::Language::Tokenizer::Base

#normalize, #skip_token?, #tokenize, #tokenize_with_positions, #word_boundary_regex, #word_char?