Class: CipherStash::Analysis::TextProcessor

Inherits:
Object
Defined in:
lib/cipherstash/analysis/text_processor.rb

Overview

A general-purpose (but very simple) string processor, configured from the field settings in a CipherStash collection schema.

Instance Method Summary

Constructor Details

#initialize(settings) ⇒ TextProcessor

Creates a new string processor for the given field settings.

Example

TextProcessor.new({
  "tokenFilters" => [
    { "kind" => "downcase", "minLength" => 3, "maxLength" => 8 }
  ],
  "tokenizer" => { "kind" => "standard" }
})

Parameters:

  • settings (Hash)

    the field settings



# File 'lib/cipherstash/analysis/text_processor.rb', line 24

def initialize(settings)
  @token_filters = build_token_filters(settings["tokenFilters"])
  @tokenizer = build_tokenizer(settings["tokenizer"])
end
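The private helpers `build_token_filters` and `build_tokenizer` are not shown on this page. As a rough sketch of how a `"kind"` entry in the settings could be mapped to a processing stage, the following uses a lookup table. Every name here (`SketchStandardTokenizer`, `SketchDowncaseFilter`, `sketch_build`, and the two registries) is hypothetical and is not part of the CipherStash API; the whitespace split is likewise only an assumption about what a "standard" tokenizer might do.

```ruby
# Hypothetical stand-ins: none of these names belong to the CipherStash API.
class SketchStandardTokenizer
  def initialize(_opts = {}); end

  # Assumption: split on runs of whitespace. The real tokenizer may differ.
  def perform(str)
    str.split
  end
end

class SketchDowncaseFilter
  def initialize(_opts = {}); end

  def perform(tokens)
    tokens.map(&:downcase)
  end
end

# Registries mapping a settings "kind" to the class that implements it.
SKETCH_TOKENIZERS = { "standard" => SketchStandardTokenizer }.freeze
SKETCH_TOKEN_FILTERS = { "downcase" => SketchDowncaseFilter }.freeze

# Looks up opts["kind"] in the registry and instantiates the matching class.
def sketch_build(registry, opts)
  kind = opts.fetch("kind")
  klass = registry.fetch(kind) { raise ArgumentError, "unknown kind: #{kind.inspect}" }
  klass.new(opts)
end

def sketch_build_tokenizer(opts)
  sketch_build(SKETCH_TOKENIZERS, opts)
end

# Array(nil) is [], so a missing "tokenFilters" key yields no filters.
def sketch_build_token_filters(list)
  Array(list).map { |opts| sketch_build(SKETCH_TOKEN_FILTERS, opts) }
end

settings = {
  "tokenFilters" => [{ "kind" => "downcase" }],
  "tokenizer" => { "kind" => "standard" }
}
tokenizer = sketch_build_tokenizer(settings["tokenizer"])
filters = sketch_build_token_filters(settings["tokenFilters"])
```

Raising on an unknown `"kind"` is a design choice made for this sketch: it surfaces schema typos at construction time rather than when the first string is processed.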

Instance Method Details

#perform(str) ⇒ Array<String>

Tokenizes the given str, passes the tokens through each token filter in order, and returns the resulting array of tokens (the "Vector").

Parameters:

  • str (String)

    the string to process

Returns:

  • (Array<String>)


# File 'lib/cipherstash/analysis/text_processor.rb', line 34

def perform(str)
  tokens = @tokenizer.perform(str)
  @token_filters.inject(tokens) do |result, stage|
    stage.perform(result)
  end
end
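To illustrate the fold that `perform` carries out, here is a minimal, self-contained sketch of the same tokenizer-then-filters pattern. The classes `DemoWhitespaceTokenizer`, `DemoDowncaseFilter`, `DemoMinLengthFilter`, and `DemoPipeline` are hypothetical stand-ins written for this example; they only assume that each stage responds to `perform`, as the stages in the source above do.

```ruby
# Hypothetical stages for illustration; not the CipherStash implementations.
class DemoWhitespaceTokenizer
  def perform(str)
    str.split
  end
end

class DemoDowncaseFilter
  def perform(tokens)
    tokens.map(&:downcase)
  end
end

class DemoMinLengthFilter
  def initialize(min)
    @min = min
  end

  # Keeps only tokens with at least @min characters.
  def perform(tokens)
    tokens.select { |token| token.length >= @min }
  end
end

class DemoPipeline
  def initialize(tokenizer, token_filters)
    @tokenizer = tokenizer
    @token_filters = token_filters
  end

  # Same shape as TextProcessor#perform: the tokenizer output seeds the fold,
  # and each filter receives the previous stage's result.
  def perform(str)
    tokens = @tokenizer.perform(str)
    @token_filters.inject(tokens) { |result, stage| stage.perform(result) }
  end
end

pipeline = DemoPipeline.new(
  DemoWhitespaceTokenizer.new,
  [DemoDowncaseFilter.new, DemoMinLengthFilter.new(4)]
)
demo_tokens = pipeline.perform("Hello Big World")
# demo_tokens is ["hello", "world"]
```

Because `inject` is given an initial value, an empty filter list returns the tokenizer output unchanged, and filters run strictly in the order they are listed.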