Class: TopSecret::Text

Inherits:
Object
  • Object
Defined in:
lib/top_secret/text.rb,
lib/top_secret/text/result.rb,
lib/top_secret/text/scan_result.rb,
lib/top_secret/text/batch_result.rb,
lib/top_secret/text/global_mapping.rb,
lib/top_secret/text/label_sequence.rb

Overview

Processes text to identify and redact sensitive information using configured filters.

Defined Under Namespace

Classes: BatchResult, GlobalMapping, LabelSequence, Result, ScanResult

Constructor Details

#initialize(input, custom_filters: [], filters: {}, model: nil) ⇒ Text

Returns a new instance of Text.

Parameters:

  • input (String)

    The original text to be filtered

  • filters (Hash, nil) (defaults to: {})

    Optional set of filters to override the defaults

  • custom_filters (Array) (defaults to: [])

    Additional custom filters to apply

  • model (Mitie::NER, nil) (defaults to: nil)

    Optional pre-loaded MITIE model for performance



# File 'lib/top_secret/text.rb', line 49

def initialize(input, custom_filters: [], filters: {}, model: nil)
  @input = input
  @output = input.dup
  @mapping = {}

  @model = model || default_model

  @filters = filters
  @custom_filters = custom_filters
end

Class Method Details

.clear_model_cache! ⇒ void

This method returns an undefined value.

Clears the cached model, forcing reinitialization on next access



# File 'lib/top_secret/text.rb', line 38

def clear_model_cache!
  @mutex.synchronize do
    @shared_model = nil
  end
end

.filter(input, custom_filters: [], **filters) ⇒ Result

Convenience method to create an instance and filter input

Parameters:

  • input (String)

    The text to filter

  • filters (Hash)

    Optional filters to override defaults (only valid filter keys accepted)

  • custom_filters (Array) (defaults to: [])

    Additional custom filters to apply

Returns:

  • (Result)

    The filtered result

Raises:

  • (ArgumentError)

    If invalid filter keys are provided



# File 'lib/top_secret/text.rb', line 67

def self.filter(input, custom_filters: [], **filters)
  new(input, filters:, custom_filters:).filter
end
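The keyword splat in the signature above collects every extra keyword into a hash, which is why invalid keys can only be rejected at runtime with an ArgumentError. A minimal standalone sketch of that kind of check follows. It is not TopSecret's code: `VALID_FILTER_KEYS` and `check_filter_keys!` are hypothetical names, and the key list contains only `email_filter` because that is the one key shown on this page.

```ruby
# Hypothetical allow-list; the real set of keys is defined by the gem.
VALID_FILTER_KEYS = %i[email_filter].freeze

# Accepts arbitrary keywords and rejects any that are not on the allow-list.
def check_filter_keys!(**filters)
  unknown = filters.keys - VALID_FILTER_KEYS
  raise ArgumentError, "Invalid filter keys: #{unknown.join(', ')}" unless unknown.empty?

  filters
end

accepted = check_filter_keys!(email_filter: nil)

# A misspelled key is caught instead of being silently ignored.
error = begin
  check_filter_keys!(emial_filter: nil)
  nil
rescue ArgumentError => e
  e.message
end
```

Checking keys up front means a typo in an override fails loudly rather than leaving the default filter in place unnoticed.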

.filter_all(messages, custom_filters: [], **filters) ⇒ BatchResult

Filters multiple messages with globally consistent redaction labels

Processes a collection of messages and ensures that identical sensitive values receive the same redaction labels across all messages. This is useful when processing conversation threads or document collections where consistency matters.

Examples:

Basic usage

messages = ["Contact john@test.com", "Email john@test.com again"]
result = TopSecret::Text.filter_all(messages)
result.items[0].output # => "Contact [EMAIL_1]"
result.items[1].output # => "Email [EMAIL_1] again"
result.items[0].mapping # => { EMAIL_1: "john@test.com" }
result.mapping # => { EMAIL_1: "john@test.com" }

With custom filters

ip_filter = TopSecret::Filters::Regex.new(label: "IP", regex: /\d+\.\d+\.\d+\.\d+/)
result = TopSecret::Text.filter_all(messages, custom_filters: [ip_filter])

Parameters:

  • messages (Array<String>)

    Array of text messages to filter

  • custom_filters (Array) (defaults to: [])

    Additional custom filters to apply

  • filters (Hash)

    Optional filters to override defaults (only valid filter keys accepted)

Returns:

  • (BatchResult)

    Contains global mapping and array of Result objects with individual mappings

Raises:

  • (ArgumentError)

    If invalid filter keys are provided



# File 'lib/top_secret/text.rb', line 94

def self.filter_all(messages, custom_filters: [], **filters)
  Text::BatchResult.from_messages(messages, custom_filters:, **filters)
end
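The consistency guarantee described for this method can be illustrated with a small standalone sketch. It does not use TopSecret's internals: `EMAIL_PATTERN` and `label_all` are hypothetical names, and the email regex is deliberately simplified.

```ruby
# Simplified, hypothetical email pattern used only for this illustration.
EMAIL_PATTERN = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/

# One value => label table is shared across every message, so a value that
# repeats in later messages is always replaced by the label it got first.
def label_all(messages)
  labels = {}

  outputs = messages.map do |message|
    message.gsub(EMAIL_PATTERN) do |value|
      labels[value] ||= "EMAIL_#{labels.size + 1}"
      "[#{labels[value]}]"
    end
  end

  # Invert to label => value, the shape of the mappings shown in the examples.
  mapping = labels.to_h { |value, label| [label.to_sym, value] }
  [outputs, mapping]
end

outputs, mapping = label_all(["Contact john@test.com", "Email john@test.com again"])
# outputs => ["Contact [EMAIL_1]", "Email [EMAIL_1] again"]
# mapping => { EMAIL_1: "john@test.com" }
```

Filtering each message independently would restart numbering per message, so two different addresses in two messages could both become EMAIL_1. Sharing one table avoids that collision.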

.scan(input, custom_filters: [], **filters) ⇒ ScanResult

Convenience method to scan input text for sensitive information without redacting it

This method detects sensitive information using configured filters but does not modify the original text. Use this when you only need to check if sensitive data exists or get a mapping of what was found.

Examples:

Basic scanning

result = TopSecret::Text.scan("Contact john@example.com")
result.sensitive? # => true
result.mapping    # => {:EMAIL_1=>"john@example.com"}

With custom filters

ip_filter = TopSecret::Filters::Regex.new(label: "IP", regex: /\d+\.\d+\.\d+\.\d+/)
result = TopSecret::Text.scan("Server IP: 192.168.1.1", custom_filters: [ip_filter])
result.mapping # => {:IP_1=>"192.168.1.1"}

Overriding default filters

custom_email = TopSecret::Filters::Regex.new(label: "EMAIL_ADDR", regex: /\w+@\w+/)
result = TopSecret::Text.scan("user@test.com", email_filter: custom_email)
result.mapping # => {:EMAIL_ADDR_1=>"user@test.com"}

Parameters:

  • input (String)

    The text to scan for sensitive information

  • filters (Hash)

    Optional filters to override defaults (only valid filter keys accepted)

  • custom_filters (Array) (defaults to: [])

    Additional custom filters to apply

Returns:

  • (ScanResult)

    Contains mapping of found sensitive information and sensitive? flag

Raises:

  • (ArgumentError)

    If invalid filter keys are provided



# File 'lib/top_secret/text.rb', line 124

def self.scan(input, custom_filters: [], **filters)
  new(input, filters:, custom_filters:).scan
end

.shared_model ⇒ Mitie::NER, NullModel

Returns a cached MITIE model instance to avoid expensive reinitialization

Returns:

  • (Mitie::NER, NullModel)

    The cached model instance



# File 'lib/top_secret/text.rb', line 21

def shared_model
  return @shared_model if @shared_model

  @mutex.synchronize do
    return @shared_model if @shared_model

    @shared_model = if TopSecret.model_path
      Mitie::NER.new(TopSecret.model_path)
    else
      NullModel.new
    end
  end
end
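The method above checks the cache once without the lock and once more inside it, so the expensive load runs at most once even when several threads ask at the same time. The standalone sketch below shows the same pattern in isolation. `ModelCache` and its loader block are hypothetical stand-ins, not part of TopSecret, and a plain `Object` takes the place of a real model.

```ruby
# Hypothetical cache wrapping any expensive loader block.
class ModelCache
  def initialize(&loader)
    @loader = loader
    @mutex = Mutex.new
    @model = nil
  end

  def fetch
    return @model if @model # fast path: already loaded, no lock needed

    @mutex.synchronize do
      # Re-check inside the lock: another thread may have loaded it meanwhile.
      @model ||= @loader.call
    end
  end

  # Drops the cached object so the next fetch runs the loader again.
  def clear!
    @mutex.synchronize { @model = nil }
  end
end

loads = 0
cache = ModelCache.new do
  loads += 1
  Object.new # stands in for an expensive model load
end

# Several threads race to fetch; the loader still runs only once.
models = 4.times.map { Thread.new { cache.fetch } }.map(&:value)
loads_after_race = loads

cache.clear!
reloaded = cache.fetch
```

After `clear!`, the next `fetch` runs the loader a second time and returns a new object, which mirrors what clearing the model cache does for the shared model.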

Instance Method Details

#filter ⇒ Result

Applies configured filters to the input, redacting matches and building a mapping.

Returns:

  • (Result)

    Contains original input, redacted output, and mapping of labels to values

Raises:

  • (Error)

    If an unsupported filter is encountered

  • (ArgumentError)

    If invalid filter keys are provided



# File 'lib/top_secret/text.rb', line 165

def filter
  scan_result = scan

  substitute_text if scan_result.sensitive?

  Text::Result.new(input, output, scan_result.mapping)
end
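As the source shows, filtering is a scan followed by a substitution step that runs only when something sensitive was found. The sketch below illustrates that second step on its own. `redact` is a hypothetical helper written for this page, not the gem's `substitute_text`, and it assumes a label => value mapping like the one a scan returns.

```ruby
# Replaces each mapped value in the text with its bracketed label.
def redact(input, mapping)
  return input.dup if mapping.empty? # nothing sensitive: return an unchanged copy

  # Longest values first, so a short value cannot clobber part of a longer one.
  ordered = mapping.sort_by { |_label, value| -value.length }

  ordered.reduce(input.dup) do |text, (label, value)|
    text.gsub(value, "[#{label}]")
  end
end

input = "Contact john@test.com or jane@test.com"
mapping = { EMAIL_1: "john@test.com", EMAIL_2: "jane@test.com" }
output = redact(input, mapping)
# output => "Contact [EMAIL_1] or [EMAIL_2]"
```

The original string is left untouched, which matches the way the result keeps both the input and the redacted output.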

#scan ⇒ ScanResult

Scans the input text for sensitive information using configured filters

This method applies all active filters to detect sensitive information but does not redact the original text. It builds a mapping of found values and returns whether any sensitive information was detected.

Returns:

  • (ScanResult)

    Contains mapping of found sensitive information and sensitive? flag

Raises:

  • (Error)

    If an unsupported filter is encountered

  • (ArgumentError)

    If invalid filter keys are provided



# File 'lib/top_secret/text.rb', line 137

def scan
  @doc ||= model.doc(@output) if model
  @entities ||= doc.entities if model

  validate_filters!

  all_filters.each do |filter|
    next if filter.nil?

    values = case filter
    when TopSecret::Filters::Regex
      filter.call(input)
    when TopSecret::Filters::NER
      filter.call(entities)
    else
      raise Error, "Unsupported filter. Expected TopSecret::Filters::Regex or TopSecret::Filters::NER, but got #{filter.class}"
    end
    build_mapping(values, label: filter.label)
  end

  ScanResult.new(mapping)
end
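The loop above skips nil filters, dispatches on the filter's class, and turns each filter's matches into labelled mapping entries. The standalone sketch below reproduces that flow with stand-ins. `RegexFilter` and `scan_text` are hypothetical, only a regex-style filter is modelled, and the per-label numbering (`IP_1`, `IP_2`) is an assumption based on the examples on this page rather than the gem's `build_mapping`.

```ruby
# Hypothetical filter: responds to #label and #call, like the filters above.
RegexFilter = Struct.new(:label, :regex) do
  def call(text)
    text.scan(regex)
  end
end

def scan_text(input, filters)
  mapping = {}

  filters.compact.each do |filter|
    unless filter.is_a?(RegexFilter)
      raise ArgumentError, "Unsupported filter: #{filter.class}"
    end

    # Number each distinct value per label: IP_1, IP_2, ...
    filter.call(input).uniq.each_with_index do |value, index|
      mapping[:"#{filter.label}_#{index + 1}"] = value
    end
  end

  mapping
end

ip_filter = RegexFilter.new("IP", /\d+\.\d+\.\d+\.\d+/)
text = "Server IP: 192.168.1.1, backup 10.0.0.2, primary 192.168.1.1"
mapping = scan_text(text, [ip_filter, nil])
# mapping => { IP_1: "192.168.1.1", IP_2: "10.0.0.2" }

sensitive = !mapping.empty?
```

The input text is never modified here. Only the mapping is built, which is the difference between scanning and filtering.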