Class: S3arch::Tokenizer

Inherits:
Object
Defined in:
lib/s3arch/tokenizer.rb

Overview

Pure tokenization: generates the token string stored in DynamoDB. At query time, FTS5 tokenizes the same way internally, so prefix matches work.

Instance Method Summary

Constructor Details

#initialize(fields: S3arch.configuration.searchable_fields) ⇒ Tokenizer

Returns a new instance of Tokenizer.



# File 'lib/s3arch/tokenizer.rb', line 7

def initialize(fields: S3arch.configuration.searchable_fields)
  @fields = fields
end

Instance Method Details

#tokenize(record) ⇒ Object

Accepts a record hash and returns a hash of { field => tokenized_string }. This is what gets stored in DynamoDB and fed directly into FTS5.



# File 'lib/s3arch/tokenizer.rb', line 13

def tokenize(record)
  @fields.to_h do |field|
    [field, normalize(record[field])]
  end
end
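
A minimal usage sketch of the per-field output, under stated assumptions. SketchTokenizer is a hypothetical stand-in so the example runs without the gem, and its normalize method is an assumption: the real normalize is not shown on this page and may behave differently.

```ruby
# Hypothetical stand-in for S3arch::Tokenizer, defined here only so the
# example is self-contained.
class SketchTokenizer
  def initialize(fields:)
    @fields = fields
  end

  # Same shape as the documented method: one normalized string per field.
  def tokenize(record)
    @fields.to_h { |field| [field, normalize(record[field])] }
  end

  private

  # ASSUMPTION: lowercase, collapse runs of non-alphanumerics to one space,
  # trim the ends. nil becomes an empty string.
  def normalize(value)
    value.to_s.downcase.gsub(/[^a-z0-9]+/, ' ').strip
  end
end

tokenizer = SketchTokenizer.new(fields: %i[title body])

# :id is not a configured field, so it does not appear in the result.
tokens = tokenizer.tokenize({ title: 'Hello, World!', body: nil, id: 42 })
```

Every configured field appears as a key in the result, even when the record has no value for it. Keys outside the configured fields are ignored.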

#tokenize_flat(record) ⇒ Object

Flattened single-string version for simple storage: the normalized values of all fields, with empty ones dropped, joined by single spaces.



# File 'lib/s3arch/tokenizer.rb', line 20

def tokenize_flat(record)
  @fields.map { |f| normalize(record[f]) }.reject(&:empty?).join(' ')
end
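
A short sketch of the flattened form, again under stated assumptions. FlatSketchTokenizer is a hypothetical stand-in class, and its normalize method is an assumption, since the real one is not shown on this page.

```ruby
# Hypothetical stand-in for S3arch::Tokenizer, defined here only so the
# example is self-contained.
class FlatSketchTokenizer
  def initialize(fields:)
    @fields = fields
  end

  # Same body as the documented method: normalize each field, drop the
  # empty results, join the rest with a single space.
  def tokenize_flat(record)
    @fields.map { |f| normalize(record[f]) }.reject(&:empty?).join(' ')
  end

  private

  # ASSUMPTION: lowercase, collapse runs of non-alphanumerics to one space,
  # trim the ends. nil becomes an empty string.
  def normalize(value)
    value.to_s.downcase.gsub(/[^a-z0-9]+/, ' ').strip
  end
end

tokenizer = FlatSketchTokenizer.new(fields: %i[title body tags])

# body is nil, so it normalizes to "" and is rejected before the join.
flat = tokenizer.tokenize_flat({ title: 'Hello, World!', body: nil, tags: 'Ruby  FTS5' })
```

Because empty fields are rejected before the join, a missing field does not leave a doubled space in the stored string.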