Class: Kotoshu::Languages::Portuguese::Tokenizer
- Inherits: Kotoshu::Language::Tokenizer::PortugueseTokenizer
- Inheritance hierarchy (root first):
  - Object
  - Kotoshu::Language::Tokenizer::Base
  - Kotoshu::Language::Tokenizer::PortugueseTokenizer
  - Kotoshu::Languages::Portuguese::Tokenizer
- Defined in: lib/kotoshu/languages/pt/language.rb
Overview
Portuguese tokenizer with number and date handling.
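This page does not show how the number and date handling is implemented. The inherited constant names (DECIMAL_COMMA_PATTERN, DECIMAL_COMMA_SUBST, DOTTED_NUMBERS_PATTERN, NON_BREAKING_DOT_SUBST) suggest a protect-split-restore approach: separators inside numbers and dates are swapped for placeholders, the text is split, and the placeholders are swapped back. The following is a minimal standalone sketch of that idea. The module name, the regular expressions, and the placeholder values are all assumptions made for illustration, not Kotoshu's actual definitions.

```ruby
# Hypothetical stand-in for illustration only; NOT Kotoshu's implementation.
module SketchPortugueseTokenizer
  # Placeholders from the Unicode private-use area (assumed values).
  DECIMAL_COMMA_SUBST    = "\uE001"
  NON_BREAKING_DOT_SUBST = "\uE002"

  # A comma with a digit on both sides, as in "3,14" (assumed pattern).
  DECIMAL_COMMA_PATTERN  = /(?<=\d),(?=\d)/
  # A dot with a digit on both sides, as in "1.000" or "25.12.2023" (assumed pattern).
  DOTTED_NUMBERS_PATTERN = /(?<=\d)\.(?=\d)/
  # Whitespace and common punctuation act as token separators (assumed set).
  WORD_SEPARATORS        = /[\s.,;:!?()"]+/

  def self.tokenize(text)
    # 1. Protect separators that belong to a number or date.
    protected_text = text
      .gsub(DECIMAL_COMMA_PATTERN, DECIMAL_COMMA_SUBST)
      .gsub(DOTTED_NUMBERS_PATTERN, NON_BREAKING_DOT_SUBST)

    # 2. Split on the remaining separators, then 3. restore the originals.
    protected_text.split(WORD_SEPARATORS).reject(&:empty?).map do |token|
      token.gsub(DECIMAL_COMMA_SUBST, ",").gsub(NON_BREAKING_DOT_SUBST, ".")
    end
  end
end

p SketchPortugueseTokenizer.tokenize("O preço é 3,14 euros em 25.12.2023.")
# => ["O", "preço", "é", "3,14", "euros", "em", "25.12.2023"]
```

In this sketch the sentence-final period is still treated as a separator because no digit follows it, while "3,14" and "25.12.2023" each survive as a single token.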
Constant Summary
Constants inherited from Kotoshu::Language::Tokenizer::PortugueseTokenizer
- COLON_NUMBERS_PATTERN
- DATE_PATTERN
- DECIMAL_COMMA_PATTERN
- DECIMAL_COMMA_SUBST
- DOTTED_NUMBERS_PATTERN
- DO_NOT_SPLIT
- NON_BREAKING_COLON_SUBST
- NON_BREAKING_DOT_SUBST
- NON_BREAKING_SPACE_SUBST
- SPACED_DECIMAL_PATTERN
- WORD_SEPARATORS
Method Summary
Methods inherited from Kotoshu::Language::Tokenizer::PortugueseTokenizer
Methods inherited from Kotoshu::Language::Tokenizer::Base
#normalize, #skip_token?, #tokenize, #tokenize_with_positions, #word_boundary_regex, #word_char?