Class: S3arch::Tokenizer
- Inherits: Object
  - Object
  - S3arch::Tokenizer
- Defined in: lib/s3arch/tokenizer.rb
Overview
Pure tokenization — generates the token string stored in DynamoDB. At query time, FTS5 tokenizes the same way internally, so prefix matches work.
Instance Method Summary
- #initialize(fields: S3arch.configuration.searchable_fields) ⇒ Tokenizer (constructor)
  A new instance of Tokenizer.
- #tokenize(record) ⇒ Object
  Accepts a record hash and returns a hash of { field => tokenized_string }. This is what gets stored in DynamoDB and fed directly into FTS5.
- #tokenize_flat(record) ⇒ Object
  Flattened single-string version for simple storage (all fields concatenated).
Constructor Details
#initialize(fields: S3arch.configuration.searchable_fields) ⇒ Tokenizer
Returns a new instance of Tokenizer.
# File 'lib/s3arch/tokenizer.rb', line 7
def initialize(fields: S3arch.configuration.searchable_fields)
  @fields = fields
end
Instance Method Details
#tokenize(record) ⇒ Object
Accepts a record hash and returns a hash of { field => tokenized_string }. This is what gets stored in DynamoDB and fed directly into FTS5.
# File 'lib/s3arch/tokenizer.rb', line 13
def tokenize(record)
  @fields.to_h do |field|
    [field, normalize(record[field])]
  end
end
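The per-field output shape can be illustrated with a small stand-in class. This is a sketch only: `DemoTokenizer` is a hypothetical name, and its `normalize` (downcase, split on non-alphanumerics, rejoin with spaces) is an assumption, since the real private `normalize` is not shown on this page and may behave differently.

```ruby
# Hypothetical stand-in for S3arch::Tokenizer so the example runs without the gem.
class DemoTokenizer
  def initialize(fields:)
    @fields = fields
  end

  # Same structure as the documented #tokenize: { field => tokenized_string }.
  def tokenize(record)
    @fields.to_h do |field|
      [field, normalize(record[field])]
    end
  end

  private

  # ASSUMED behavior, not the library's: stringify, downcase, keep
  # alphanumeric runs, rejoin with single spaces. nil becomes "".
  def normalize(value)
    value.to_s.downcase.scan(/[a-z0-9]+/).join(' ')
  end
end

tokenizer = DemoTokenizer.new(fields: %i[title body])
tokens = tokenizer.tokenize({ title: 'Hello, World!', body: nil, author: 'ignored' })
# Only configured fields appear; a missing value yields an empty string.
# tokens == { title: "hello world", body: "" }
```

Under this assumed `normalize`, keys outside the configured fields are dropped and every configured field is always present in the result, even when the record has no value for it.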
#tokenize_flat(record) ⇒ Object
Flattened single-string version for simple storage (all fields concatenated).
# File 'lib/s3arch/tokenizer.rb', line 20
def tokenize_flat(record)
  @fields.map { |f| normalize(record[f]) }.reject(&:empty?).join(' ')
end
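A minimal sketch of the flattening step, showing how empty fields are skipped before joining. `DemoFlatTokenizer` is a hypothetical stand-in, and its `normalize` is an assumed implementation; the library's private `normalize` is not documented here.

```ruby
# Hypothetical stand-in mirroring the documented #tokenize_flat body.
class DemoFlatTokenizer
  def initialize(fields:)
    @fields = fields
  end

  # Normalizes each configured field, drops empty results, joins with a space.
  def tokenize_flat(record)
    @fields.map { |f| normalize(record[f]) }.reject(&:empty?).join(' ')
  end

  private

  # ASSUMED behavior, not the library's: downcase and keep alphanumeric runs.
  def normalize(value)
    value.to_s.downcase.scan(/[a-z0-9]+/).join(' ')
  end
end

flat = DemoFlatTokenizer
       .new(fields: %i[title body tags])
       .tokenize_flat({ title: 'Ruby FTS5', body: '', tags: 'search, sqlite' })
# The empty body contributes nothing, so no double space appears.
# flat == "ruby fts5 search sqlite"
```

The `reject(&:empty?)` call is what keeps blank or missing fields from introducing stray separators in the stored string.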