Class: Kotoshu::Languages::French::POSTagger
- Inherits:
- Components::PosTagger
- Inheritance chain:
- Kotoshu::Languages::French::POSTagger < Components::PosTagger < Object
- Defined in:
- lib/kotoshu/languages/fr/language.rb
Overview
French POS tagger.
Derives POS tags from Hunspell flags using French-specific mappings.
Constant Summary
- FLAG_TO_POS =
  French POS flag mappings based on Hunspell French dictionaries.
  {
    # Nouns
    'N' => 'NOUN', 'NN' => 'NOUN', 'NNS' => 'NOUN', 'NNP' => 'NOUN_PROPER',
    # Verbs
    'V' => 'VERB', 'VB' => 'VERB', 'VBD' => 'VERB', 'VBG' => 'VERB',
    'VBN' => 'VERB', 'VBP' => 'VERB', 'VBZ' => 'VERB',
    # Adjectives
    'A' => 'ADJ', 'JJ' => 'ADJ', 'JJR' => 'ADJ', 'JJS' => 'ADJ',
    # Adverbs
    'R' => 'ADV', 'RB' => 'ADV', 'RBR' => 'ADV', 'RBS' => 'ADV',
    # Determiners
    'D' => 'DET', 'DT' => 'DET', 'PDT' => 'DET',
    # Pronouns
    'P' => 'PRON', 'PP' => 'PRON', 'PRP' => 'PRON', 'PRP$' => 'PRON_POSS',
    'WP' => 'PRON', 'WP$' => 'PRON_POSS',
    # Prepositions
    'I' => 'PREP', 'IN' => 'PREP',
    # Conjunctions
    'C' => 'CONJ', 'CC' => 'CONJ',
    # Particles
    'U' => 'PART', 'RP' => 'PART',
    # Interjections
    'INTJ' => 'INTJ', 'UH' => 'INTJ',
    # Numbers
    'CD' => 'NUM',
    # Foreign words
    'FW' => 'X',
    # Punctuation
    'PUNCT' => 'PUNCT', '.' => 'PUNCT', ',' => 'PUNCT', '!' => 'PUNCT',
    '?' => 'PUNCT', ';' => 'PUNCT', ':' => 'PUNCT'
  }.freeze
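As a rough illustration of how a flag-to-POS mapping like this can be applied, the sketch below resolves the first recognized flag in a list. The `pos_for` helper and the abbreviated `SAMPLE_FLAG_TO_POS` table are hypothetical and exist only for this example; the class's actual lookup logic is not shown on this page and may differ.

```ruby
# Abbreviated stand-in for FLAG_TO_POS (illustration only).
SAMPLE_FLAG_TO_POS = {
  'N' => 'NOUN',
  'NNP' => 'NOUN_PROPER',
  'V' => 'VERB',
  'A' => 'ADJ',
  'PRP$' => 'PRON_POSS'
}.freeze

# Hypothetical helper: returns the POS tag for the first flag present in the
# mapping, or nil when none of the flags is recognized.
def pos_for(flags, mapping = SAMPLE_FLAG_TO_POS)
  flags.each do |flag|
    tag = mapping[flag]
    return tag if tag
  end
  nil
end

p pos_for(%w[Z V]) # => "VERB"
```

Returning `nil` for unrecognized flags is an assumption of this sketch, chosen so that callers can distinguish "no tag" from a real tag.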
Instance Attribute Summary
- #aff_path ⇒ Object (readonly)
  Returns the value of attribute aff_path.
- #dic_path ⇒ Object (readonly)
  Returns the value of attribute dic_path.
- #script ⇒ Object (readonly)
  Returns the value of attribute script.
Instance Method Summary
- #clear_cache ⇒ Object
- #flag_mapping ⇒ Object
- #flag_mapping=(mapping) ⇒ Object
- #initialize(aff_path:, dic_path:, script: :latin, encoding: 'UTF-8', flag_mapping: FLAG_TO_POS) ⇒ POSTagger (constructor)
  A new instance of POSTagger.
- #tag(tokens) ⇒ Object
Methods inherited from Components::PosTagger
Constructor Details
#initialize(aff_path:, dic_path:, script: :latin, encoding: 'UTF-8', flag_mapping: FLAG_TO_POS) ⇒ POSTagger
Returns a new instance of POSTagger.
# File 'lib/kotoshu/languages/fr/language.rb', line 197

def initialize(aff_path:, dic_path:, script: :latin, encoding: 'UTF-8', flag_mapping: FLAG_TO_POS)
  @aff_path = aff_path
  @dic_path = dic_path
  @script = script
  @encoding = encoding
  @flag_mapping = flag_mapping
  @lookuper = Readers::LookupBuilder.new(aff_path, dic_path, encoding: encoding, script: script).build
  @lookup_cache = {}
end
Instance Attribute Details
#aff_path ⇒ Object (readonly)
Returns the value of attribute aff_path.
# File 'lib/kotoshu/languages/fr/language.rb', line 195

def aff_path
  @aff_path
end
#dic_path ⇒ Object (readonly)
Returns the value of attribute dic_path.
# File 'lib/kotoshu/languages/fr/language.rb', line 195

def dic_path
  @dic_path
end
#script ⇒ Object (readonly)
Returns the value of attribute script.
# File 'lib/kotoshu/languages/fr/language.rb', line 195

def script
  @script
end
Instance Method Details
#clear_cache ⇒ Object
# File 'lib/kotoshu/languages/fr/language.rb', line 228

def clear_cache
  @lookup_cache.clear
end
#flag_mapping ⇒ Object
# File 'lib/kotoshu/languages/fr/language.rb', line 220

def flag_mapping
  @flag_mapping
end
#flag_mapping=(mapping) ⇒ Object
# File 'lib/kotoshu/languages/fr/language.rb', line 224

def flag_mapping=(mapping)
  @flag_mapping = mapping
end
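Because FLAG_TO_POS is frozen, a custom mapping has to be built as a new hash instead of being modified in place. The following is a minimal sketch of that pattern using a stand-in constant; `BASE_MAPPING`, `extend_mapping`, and the `'Xq'` flag are hypothetical names used only here. Whether the lookup cache needs to be cleared after assigning a new mapping depends on code not shown on this page.

```ruby
# Stand-in for the frozen FLAG_TO_POS constant (illustration only).
BASE_MAPPING = { 'N' => 'NOUN', 'V' => 'VERB' }.freeze

# Hypothetical helper: Hash#merge returns a new hash and leaves the frozen
# base untouched.
def extend_mapping(base, extra)
  base.merge(extra)
end

custom_mapping = extend_mapping(BASE_MAPPING, 'Xq' => 'ADJ')
p custom_mapping['Xq']

# Writing to the frozen base directly raises FrozenError.
begin
  BASE_MAPPING['Xq'] = 'ADJ'
rescue FrozenError => e
  puts "in-place update rejected: #{e.class}"
end
```

A hash built this way could then be passed as the `flag_mapping:` keyword to the constructor or assigned through `flag_mapping=`.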
#tag(tokens) ⇒ Object
# File 'lib/kotoshu/languages/fr/language.rb', line 207

def tag(tokens)
  # Nil or empty input yields an empty result.
  return [] if tokens.nil? || tokens.empty?

  tokens.map do |token|
    word = token[:token]
    if word.nil? || word.empty?
      # Tokens without text get a nil tag and a nil lemma.
      token.merge(pos_tag: nil, lemma: nil)
    else
      lookup_result = lookup_with_pos(word)
      # Fall back to the surface form when the lookup provides no lemma.
      token.merge(pos_tag: lookup_result[:pos_tag], lemma: lookup_result[:lemma] || word)
    end
  end
end
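The input and output shapes of `tag` can be illustrated with a stand-in class that mirrors the logic above. `SketchTagger` and its in-memory `LEXICON` are hypothetical replacements for the Hunspell-backed `lookup_with_pos`, whose implementation is not shown on this page; in particular, returning a nil tag for unknown words is an assumption of this sketch.

```ruby
# Hypothetical stand-in that copies the control flow of POSTagger#tag.
class SketchTagger
  # Tiny in-memory lexicon replacing the dictionary lookup (illustration only).
  LEXICON = {
    'chats' => { pos_tag: 'NOUN', lemma: 'chat' },
    'dorment' => { pos_tag: 'VERB', lemma: 'dormir' }
  }.freeze

  def tag(tokens)
    return [] if tokens.nil? || tokens.empty?

    tokens.map do |token|
      word = token[:token]
      if word.nil? || word.empty?
        token.merge(pos_tag: nil, lemma: nil)
      else
        result = lookup_with_pos(word)
        token.merge(pos_tag: result[:pos_tag], lemma: result[:lemma] || word)
      end
    end
  end

  private

  # Assumption: unknown words yield a nil tag and nil lemma, so `tag` falls
  # back to the word itself as the lemma.
  def lookup_with_pos(word)
    LEXICON.fetch(word.downcase) { { pos_tag: nil, lemma: nil } }
  end
end

tokens = [{ token: 'chats' }, { token: 'Zzz' }, { token: '' }]
SketchTagger.new.tag(tokens).each { |t| p t }
```

Each token is a hash with a `:token` key; the returned hashes keep the original keys and add `:pos_tag` and `:lemma`.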