Class: TokenKit::Tokenizer
- Inherits:
- Object (Object > TokenKit::Tokenizer)
- Defined in:
- lib/tokenkit.rb
Overview
Instance-based tokenizer for thread-safe tokenization with a per-instance configuration.
Instance Attribute Summary
- #config ⇒ Configuration (readonly)
  The tokenizer's configuration.
Instance Method Summary
- #initialize(config = {}) ⇒ Tokenizer (constructor)
  Creates a new tokenizer instance with the specified configuration.
- #tokenize(text) ⇒ Array<String>
  Tokenizes the given text using this tokenizer's configuration.
Constructor Details
#initialize(config = {}) ⇒ Tokenizer
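Creates a new tokenizer instance with the specified configuration. The config argument may be a Configuration (used as-is), a ConfigBuilder (built into a Configuration), or a Hash of overrides applied on top of TokenKit.config_hash. Hash keys with no matching setter on the builder are ignored. Any other value falls back to TokenKit.config_hash.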
    # File 'lib/tokenkit.rb', line 72
    def initialize(config = {})
      @config = if config.is_a?(Configuration)
        config
      elsif config.is_a?(ConfigBuilder)
        config.build
      elsif config.is_a?(Hash)
        builder = TokenKit.config_hash.to_builder
        config.each do |key, value|
          builder.send("#{key}=", value) if builder.respond_to?("#{key}=")
        end
        builder.build
      else
        TokenKit.config_hash
      end
    end
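The Hash branch of the constructor can be illustrated in isolation. The sketch below is not TokenKit code: SketchConfig, SketchBuilder, resolve_config, and the option names strategy and lowercase are hypothetical stand-ins, assumed only for illustration. It shows the same pattern of applying each override through a setter and skipping keys the builder does not recognize.

```ruby
# Hypothetical stand-ins for TokenKit's configuration and builder objects.
# The real classes and their option names are not shown on this page.
SketchConfig = Struct.new(:strategy, :lowercase)

class SketchBuilder
  attr_accessor :strategy, :lowercase

  def initialize(strategy: :unicode, lowercase: true)
    @strategy = strategy
    @lowercase = lowercase
  end

  # Produce an immutable configuration snapshot.
  def build
    SketchConfig.new(strategy, lowercase).freeze
  end
end

# Mirrors the Hash branch: call the setter for each key when the builder
# has one, and silently skip keys without a matching setter.
def resolve_config(overrides)
  builder = SketchBuilder.new
  overrides.each do |key, value|
    setter = "#{key}="
    builder.public_send(setter, value) if builder.respond_to?(setter)
  end
  builder.build
end

resolved = resolve_config(strategy: :whitespace, unknown_option: 42)
puts resolved.strategy   # the override is applied
puts resolved.lowercase  # the default is kept
```

One consequence of this pattern is that a misspelled option name is dropped without an error, so it may be worth checking the resulting config after construction.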
Instance Attribute Details
#config ⇒ Configuration (readonly)
Returns the tokenizer's configuration.
    # File 'lib/tokenkit.rb', line 55
    def config
      @config
    end
Instance Method Details
#tokenize(text) ⇒ Array<String>
Tokenizes the given text using this tokenizer's configuration.
    # File 'lib/tokenkit.rb', line 98
    def tokenize(text)
      TokenKit._tokenize_with_config(text, @config.to_rust_config)
    end
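Because each instance reads only its own @config, separately configured tokenizers can be used from different threads without touching shared settings. The sketch below illustrates that usage pattern with a hypothetical SketchTokenizer that splits on whitespace. It stands in for TokenKit::Tokenizer, whose native tokenization is not reproduced here, and the lowercase option is an assumption.

```ruby
# Hypothetical stand-in for a per-instance tokenizer. It only splits on
# whitespace and optionally downcases; it is not TokenKit's implementation.
class SketchTokenizer
  attr_reader :config

  def initialize(config = {})
    # Freeze the merged settings so threads can only read them.
    @config = { lowercase: true }.merge(config).freeze
  end

  def tokenize(text)
    tokens = text.split
    config[:lowercase] ? tokens.map(&:downcase) : tokens
  end
end

lower = SketchTokenizer.new
exact = SketchTokenizer.new(lowercase: false)

# Each thread uses its own tokenizer instance; Thread#value joins the
# thread and returns the block's result.
results = [lower, exact].map do |tokenizer|
  Thread.new { tokenizer.tokenize("Hello World") }
end.map(&:value)

p results
```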