Module: Philiprehberger::Mask

Defined in:
lib/philiprehberger/mask.rb,
lib/philiprehberger/mask/version.rb,
lib/philiprehberger/mask/detector.rb,
lib/philiprehberger/mask/scrubber.rb,
lib/philiprehberger/mask/configuration.rb,
lib/philiprehberger/mask/deep_scrubber.rb

Defined Under Namespace

Modules: DeepScrubber, Detector, Scrubber
Classes: Configuration, Error

Constant Summary

VERSION = '0.6.0'

Class Method Details

.add_locale(locale, patterns) ⇒ Object

Register locale-specific patterns

Parameters:

  • locale (Symbol)

    locale identifier

  • patterns (Hash<Symbol, Regexp>)

    detector name to regex mapping

# File 'lib/philiprehberger/mask.rb', line 129

def self.add_locale(locale, patterns)
  Configuration.instance.add_locale(locale, patterns)
end
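`add_locale` only forwards to the configuration singleton, so how locale patterns combine with the defaults is not visible on this page. A minimal sketch of one plausible design, assuming locale entries are layered over a default set and a same-named locale detector replaces the default; `LocaleRegistry` and its patterns are hypothetical:

```ruby
# Hypothetical registry: default detectors plus per-locale additions.
class LocaleRegistry
  DEFAULTS = {
    email: /[\w.+-]+@[\w-]+\.[\w.]+/,
    ssn: /\b\d{3}-\d{2}-\d{4}\b/
  }.freeze

  def initialize
    @locales = {}
  end

  # Store (or extend) the detector-name => Regexp map for a locale.
  def add_locale(locale, patterns)
    raise ArgumentError, 'patterns must be a Hash' unless patterns.is_a?(Hash)

    (@locales[locale.to_sym] ||= {}).merge!(patterns)
  end

  # Defaults first; locale entries are appended, or replace a default of the same name.
  def patterns(locale: nil)
    return DEFAULTS if locale.nil?

    DEFAULTS.merge(@locales.fetch(locale.to_sym, {}))
  end
end

registry = LocaleRegistry.new
registry.add_locale(:de, { iban: /\bDE\d{20}\b/ })
registry.patterns(locale: :de).keys # => [:email, :ssn, :iban]
registry.patterns.keys              # => [:email, :ssn]
```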

.batch_scrub(strings, mode: :full, locale: nil) ⇒ Array<String>

Process an array of strings in one call with shared compiled patterns

Raises ArgumentError when strings is not an Array. An empty Array returns [].

Examples:

Scrub several strings at once

Philiprehberger::Mask.batch_scrub(['user@example.com', 'SSN: 123-45-6789'])
# => ["u***@e******.com", "SSN: ***-**-6789"]

Parameters:

  • strings (Array<String>)

    input strings

  • mode (Symbol) (defaults to: :full)

    masking mode (:full, :partial, :format_preserving)

  • locale (Symbol, nil) (defaults to: nil)

    optional locale for locale-specific patterns

Returns:

  • (Array<String>)

    scrubbed strings (empty Array for empty input)

Raises:

  • (ArgumentError)

    if strings is not an Array

# File 'lib/philiprehberger/mask.rb', line 110

def self.batch_scrub(strings, mode: :full, locale: nil)
  raise ArgumentError, 'strings must be an Array' unless strings.is_a?(Array)

  patterns = Configuration.instance.patterns(locale: locale)
  compiled = patterns.map { |pat| pat.merge(pattern: Regexp.new(pat[:pattern].source, pat[:pattern].options)) }
  strings.map { |s| Scrubber.call(s, patterns: compiled, mode: mode) }
end
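The Array guard and the shared-pattern loop can be seen in isolation. A minimal sketch, with a single hypothetical SSN pattern standing in for the configured detectors and a simplified replacement in place of `Scrubber.call`; it reproduces the SSN half of the example above but is not the gem's implementation:

```ruby
# Hypothetical pattern list in the { name:, pattern: } shape the method maps over.
BATCH_PATTERNS = [
  { name: :ssn, pattern: /\b\d{3}-\d{2}-(\d{4})\b/ }
].freeze

# Simplified stand-in for Scrubber.call: keep only the last four SSN digits.
def scrub_one(str, patterns)
  patterns.reduce(str) do |acc, pat|
    acc.gsub(pat[:pattern]) { "***-**-#{Regexp.last_match(1)}" }
  end
end

def batch_scrub(strings, patterns: BATCH_PATTERNS)
  raise ArgumentError, 'strings must be an Array' unless strings.is_a?(Array)

  # The pattern list is resolved once and shared by every string in the batch.
  strings.map { |s| scrub_one(s, patterns) }
end

batch_scrub(['SSN: 123-45-6789', 'no pii']) # => ["SSN: ***-**-6789", "no pii"]
batch_scrub([])                             # => []
```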

.configure {|Configuration| ... } ⇒ Object

Configure custom patterns

Yields:

  • (Configuration)

    the shared Configuration.instance singleton

# File 'lib/philiprehberger/mask.rb', line 195

def self.configure(&block)
  block.call(Configuration.instance)
end
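`configure` hands the block the process-wide `Configuration.instance`, and `reset_configuration!` delegates to `Configuration.reset!`. A minimal sketch of that singleton-plus-reset pattern, assuming `reset!` simply discards the cached instance; `MaskConfig` and its `sensitive_keys` default are hypothetical:

```ruby
# Hypothetical stand-in for a process-wide configuration singleton.
class MaskConfig
  attr_reader :sensitive_keys

  def initialize
    @sensitive_keys = %i[password token]
  end

  # Lazily build and cache one instance per process.
  def self.instance
    @instance ||= new
  end

  # Drop the cached instance so the next .instance call rebuilds defaults.
  def self.reset!
    @instance = nil
  end
end

# `yield` here is equivalent to the method's `block.call(...)`.
def configure
  yield MaskConfig.instance
end

configure { |config| config.sensitive_keys << :api_key }
MaskConfig.instance.sensitive_keys # => [:password, :token, :api_key]
MaskConfig.reset!
MaskConfig.instance.sensitive_keys # => [:password, :token]
```

Because the configuration is global, a change made in one test or request stays visible everywhere until it is reset; resetting in test teardown keeps cases isolated.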

.configure_priority(detector_order) ⇒ Object

Set detector evaluation priority

Parameters:

  • detector_order (Array<Symbol>)

    detector names in desired order

# File 'lib/philiprehberger/mask.rb', line 121

def self.configure_priority(detector_order)
  Configuration.instance.set_priority(detector_order)
end
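Evaluation order presumably matters when two detectors can match the same text (an SSN and a loosely written phone number, for example). A minimal sketch of reordering a detector list by a priority array; the detector list is hypothetical, and placing unlisted detectors after the listed ones in their original order is an assumption:

```ruby
# Hypothetical detector list; each entry is { name:, pattern: }.
DETECTORS = [
  { name: :email, pattern: /\S+@\S+/ },
  { name: :phone, pattern: /\d{3}-\d{3}-\d{4}/ },
  { name: :ssn,   pattern: /\d{3}-\d{2}-\d{4}/ }
].freeze

# Listed names come first, in the given order; unlisted detectors follow in
# their original relative order (the index breaks ties).
def prioritize(detectors, order)
  rank = order.each_with_index.to_h
  detectors.each_with_index
           .sort_by { |det, i| [rank.fetch(det[:name], order.size), i] }
           .map(&:first)
end

prioritize(DETECTORS, %i[ssn]).map { |d| d[:name] } # => [:ssn, :email, :phone]
```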

.detect(string, locale: nil) ⇒ Array<Hash>

Scan a string for PII without modifying it

Returns the list of detector matches in detection order. Each entry has :detector, :match, and :position. Useful for “should this be redacted?” checks before paying the cost of substitution. The input string is not mutated.

Examples:

Detect PII without redacting

Philiprehberger::Mask.detect('Email user@example.com or call 555-123-4567')
# => [{ detector: :email, match: "user@example.com", position: 6 },
#     { detector: :phone, match: "555-123-4567", position: 31 }]

Parameters:

  • string (String)

    the input string

  • locale (Symbol, nil) (defaults to: nil)

    optional locale for locale-specific patterns

Returns:

  • (Array<Hash>)

    one { detector:, match:, position: } Hash per match, or an empty Array when no PII is found

# File 'lib/philiprehberger/mask.rb', line 40

def self.detect(string, locale: nil)
  Scrubber.scan(string, patterns: Configuration.instance.patterns(locale: locale))
end
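A minimal sketch of a non-mutating scan that yields the documented `{ detector:, match:, position: }` entries. The two regexes are hypothetical stand-ins for the configured detectors, and grouping entries by detector in declaration order is an assumption about what “detection order” means:

```ruby
# Hypothetical detectors; the real ones live in Configuration.
DETECT_PATTERNS = {
  email: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/,
  phone: /\b\d{3}-\d{3}-\d{4}\b/
}.freeze

# Collect every match with its character offset; the input is only read.
def detect_pii(string, patterns: DETECT_PATTERNS)
  hits = []
  patterns.each do |name, regex|
    string.scan(regex) do
      m = Regexp.last_match
      hits << { detector: name, match: m[0], position: m.begin(0) }
    end
  end
  hits
end

detect_pii('Email user@example.com or call 555-123-4567')
# => [{ detector: :email, match: "user@example.com", position: 6 },
#     { detector: :phone, match: "555-123-4567", position: 31 }]
```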

.detokenize(string, tokens:) ⇒ String

Reverse tokenization using a token lookup table

Parameters:

  • string (String)

    the tokenized string

  • tokens (Hash)

    token-to-original mapping

Returns:

  • (String)

    the restored string

# File 'lib/philiprehberger/mask.rb', line 92

def self.detokenize(string, tokens:)
  result = string.dup
  tokens.each { |token, original| result = result.gsub(token, original) }
  result
end
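The method above passes each original value to `gsub` as a String replacement. Ruby scans such a replacement for back-references (`\0`, `\1`, `\\`, and so on), so an original containing a backslash sequence may not round-trip exactly. A minimal sketch of the same lookup using the block form of `gsub`, which inserts the returned text literally; the token and value are hypothetical:

```ruby
# Restore originals from a token => original lookup table. The block form of
# gsub inserts `original` verbatim instead of interpreting backslash sequences.
def detokenize(string, tokens:)
  tokens.reduce(string.dup) do |acc, (token, original)|
    acc.gsub(token) { original }
  end
end

tokens = { '<TOKEN_SECRET_1>' => 'pa\0ss' } # single quotes: a literal backslash-zero
masked = 'password=<TOKEN_SECRET_1>'
detokenize(masked, tokens: tokens)          # => "password=pa\\0ss" (restored exactly)
masked.gsub('<TOKEN_SECRET_1>', 'pa\0ss')   # => "password=pa<TOKEN_SECRET_1>ss" (\0 expanded)
```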

.reset_configuration! ⇒ Object

Reset configuration to defaults

# File 'lib/philiprehberger/mask.rb', line 200

def self.reset_configuration!
  Configuration.reset!
end

.scrub(string, mode: :full) ⇒ String

Detect and redact PII patterns in a string

Examples:

Mask an email in a string

Philiprehberger::Mask.scrub('Contact user@example.com')
# => "Contact u***@e******.com"

Parameters:

  • string (String)

    the input string

  • mode (Symbol) (defaults to: :full)

    masking mode (:full, :partial, :format_preserving)

Returns:

  • (String)

    the scrubbed string

# File 'lib/philiprehberger/mask.rb', line 22

def self.scrub(string, mode: :full)
  Scrubber.call(string, patterns: Configuration.instance.patterns, mode: mode)
end
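This page does not state the masking rule, but both email examples (`user@example.com` → `u***@e******.com`, `a@b.com` → `a***@b.com`) are consistent with: keep the first character of the local part followed by `***`, keep the first character of the domain label and star the rest, keep the TLD. A minimal sketch of that inferred rule; the regex and the rule are assumptions, not the gem's code:

```ruby
# Inferred only from this page's two email examples; the gem's rule may differ.
EMAIL = /([\w.+-]+)@([\w-]+)((?:\.[\w-]+)+)/

# Local part: first char + fixed "***"; domain label: first char + one star per
# remaining char; TLD (everything from the first dot) kept as-is.
def mask_email(match)
  local, domain, tld = match.captures
  "#{local[0]}***@#{domain[0]}#{'*' * (domain.length - 1)}#{tld}"
end

def scrub(string)
  string.gsub(EMAIL) { mask_email(Regexp.last_match) }
end

scrub('Contact user@example.com') # => "Contact u***@e******.com"
```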

.scrub_hash(data, keys: nil, mode: :full) ⇒ Hash, Array

Deep-walk a hash/array and redact sensitive values

Examples:

Redact sensitive keys and PII inside nested structures

Philiprehberger::Mask.scrub_hash(user: { email: 'a@b.com', password: 'secret' })
# => { user: { email: "a***@b.com", password: "[FILTERED]" } }

Parameters:

  • data (Hash, Array)

    the input structure

  • keys (Array<Symbol, String>, nil) (defaults to: nil)

    specific keys to scrub

  • mode (Symbol) (defaults to: :full)

    masking mode (:full, :partial, :format_preserving)

Returns:

  • (Hash, Array)

    the scrubbed structure

# File 'lib/philiprehberger/mask.rb', line 53

def self.scrub_hash(data, keys: nil, mode: :full)
  config = Configuration.instance
  DeepScrubber.call(data, patterns: config.patterns, sensitive_keys: keys || config.sensitive_keys, mode: mode)
end
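A minimal sketch of the deep walk described above, assuming key matching ignores the Symbol/String distinction and that non-String leaves pass through untouched. `deep_scrub` and the digit-starring scrubber are hypothetical, and the `mode:` option is left out:

```ruby
# Values under sensitive keys become "[FILTERED]"; every other String goes
# through the caller-supplied scrubber. Hashes and Arrays are walked recursively.
def deep_scrub(data, sensitive_keys:, scrub:)
  keys = sensitive_keys.map(&:to_s)
  case data
  when Hash
    data.to_h do |k, v|
      [k, keys.include?(k.to_s) ? '[FILTERED]' : deep_scrub(v, sensitive_keys: sensitive_keys, scrub: scrub)]
    end
  when Array
    data.map { |v| deep_scrub(v, sensitive_keys: sensitive_keys, scrub: scrub) }
  when String
    scrub.call(data)
  else
    data # numbers, nil, booleans pass through unchanged
  end
end

star_digits = ->(s) { s.gsub(/\d/, '*') }
deep_scrub({ user: { password: 'secret', pin: '1234', tags: ['id 42', 7] } },
           sensitive_keys: [:password], scrub: star_digits)
# => { user: { password: "[FILTERED]", pin: "****", tags: ["id **", 7] } }
```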

.scrub_hash_with_audit(data, keys: nil) ⇒ Hash

Deep-walk a hash/array and redact sensitive values with audit trail

Parameters:

  • data (Hash, Array)

    the input structure

  • keys (Array<Symbol, String>, nil) (defaults to: nil)

    specific keys to scrub

Returns:

  • (Hash)

    { result:, audit: […] }

# File 'lib/philiprehberger/mask.rb', line 63

def self.scrub_hash_with_audit(data, keys: nil)
  config = Configuration.instance
  DeepScrubber.call_with_audit(data, patterns: config.patterns, sensitive_keys: keys || config.sensitive_keys)
end
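The audit entries are only described as `audit: […]`. A minimal sketch that threads a single audit array through the recursion; the `{ path:, reason: }` entry shape is invented for illustration, keys are matched exactly, and string-level PII scanning is omitted:

```ruby
# Hypothetical audit-producing walk; the { path:, reason: } entries are not
# the gem's documented format.
def deep_scrub_with_audit(data, sensitive_keys:, path: [], audit: [])
  result =
    case data
    when Hash
      data.to_h do |k, v|
        if sensitive_keys.include?(k)
          audit << { path: path + [k], reason: :sensitive_key }
          [k, '[FILTERED]']
        else
          [k, deep_scrub_with_audit(v, sensitive_keys: sensitive_keys, path: path + [k], audit: audit)[:result]]
        end
      end
    when Array
      data.each_with_index.map do |v, i|
        deep_scrub_with_audit(v, sensitive_keys: sensitive_keys, path: path + [i], audit: audit)[:result]
      end
    else
      data
    end
  { result: result, audit: audit }
end

deep_scrub_with_audit({ user: { password: 's', tokens: [{ token: 't' }] } },
                      sensitive_keys: %i[password token])
# => { result: { user: { password: "[FILTERED]", tokens: [{ token: "[FILTERED]" }] } },
#      audit: [{ path: [:user, :password], reason: :sensitive_key },
#              { path: [:user, :tokens, 0, :token], reason: :sensitive_key }] }
```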

.scrub_io(io, mode: :full, locale: nil) ⇒ Array<String>

Read from IO line by line, scrub each line

Raises ArgumentError when io is nil. An IO that is already at EOF (or empty) returns an empty Array rather than raising.

Examples:

Scrub lines from an in-memory IO

Philiprehberger::Mask.scrub_io(StringIO.new("user@example.com\n"))
# => ["u***@e******.com\n"]

Parameters:

  • io (IO, StringIO)

    readable IO object

  • mode (Symbol) (defaults to: :full)

    masking mode (:full, :partial, :format_preserving)

  • locale (Symbol, nil) (defaults to: nil)

    optional locale for locale-specific patterns

Returns:

  • (Array<String>)

    scrubbed lines (empty Array when the IO is at EOF)

Raises:

  • (ArgumentError)

    if io is nil

# File 'lib/philiprehberger/mask.rb', line 146

def self.scrub_io(io, mode: :full, locale: nil)
  raise ArgumentError, 'io is required' if io.nil?
  return [] if io.respond_to?(:eof?) && io.eof?

  patterns = Configuration.instance.patterns(locale: locale)
  io.each_line.map { |line| Scrubber.call(line, patterns: patterns, mode: mode) }
end
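A minimal sketch of the same control flow (nil guard, EOF short-circuit, one scrub per line) against a `StringIO`; digit starring stands in for the configured patterns, and `scrub_io_sketch` is a hypothetical name:

```ruby
require 'stringio'

# Same guard order as the method above; line endings are preserved.
def scrub_io_sketch(io)
  raise ArgumentError, 'io is required' if io.nil?
  return [] if io.respond_to?(:eof?) && io.eof?

  io.each_line.map { |line| line.gsub(/\d/, '*') }
end

scrub_io_sketch(StringIO.new("pin 1234\nok\n")) # => ["pin ****\n", "ok\n"]
scrub_io_sketch(StringIO.new(''))               # => []
```

Because `each_line` reads the IO to the end, a second call on the same object hits the EOF check and returns `[]`; call `rewind` first to scrub it again.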

.scrub_log(path, output: nil, mode: :full, locale: nil) ⇒ Hash

Read a file line by line, scrub each line, and write the result

Parameters:

  • path (String)

    path to the file to scrub

  • output (String, nil) (defaults to: nil)

    destination path; overwrites in-place if nil

  • mode (Symbol) (defaults to: :full)

    masking mode (:full, :partial, :format_preserving)

  • locale (Symbol, nil) (defaults to: nil)

    optional locale for locale-specific patterns

Returns:

  • (Hash)

    { lines_processed:, lines_modified:, detections: }

# File 'lib/philiprehberger/mask.rb', line 161

def self.scrub_log(path, output: nil, mode: :full, locale: nil)
  patterns = Configuration.instance.patterns(locale: locale)
  lines_processed = 0
  lines_modified = 0
  detections = 0

  scrubbed_lines = File.open(path, 'r') do |f|
    f.each_line.map do |line|
      lines_processed += 1
      scrubbed = Scrubber.call(line, patterns: patterns, mode: mode)
      if scrubbed != line
        lines_modified += 1
        detections += Scrubber.call_with_audit(line, patterns: patterns)[:audit].length
      end
      scrubbed
    end
  end

  if output.nil?
    Tempfile.open([File.basename(path), '.tmp'], File.dirname(path)) do |tmp|
      tmp.write(scrubbed_lines.join)
      tmp.flush
      File.rename(tmp.path, path)
    end
  else
    File.write(output, scrubbed_lines.join)
  end

  { lines_processed: lines_processed, lines_modified: lines_modified, detections: detections }
end
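The in-place branch above writes through `Tempfile` (from the `tempfile` standard library) and renames the temp file over the log. `Tempfile` creates files with 0600 permissions, so the rewritten log may end up with narrower permissions than the original. A minimal sketch of the same write-then-rename approach that copies the original mode and removes the temp file on failure; the temp-file naming, `scrub_file_in_place`, and the digit-starring scrubber are hypothetical:

```ruby
require 'tmpdir'

def scrub_file_in_place(path)
  processed = 0
  modified = 0
  lines = File.foreach(path).map do |line|
    processed += 1
    scrubbed = line.gsub(/\d/, '*') # stand-in for Scrubber.call
    modified += 1 unless scrubbed == line
    scrubbed
  end

  tmp_path = "#{path}.#{Process.pid}.tmp" # hypothetical naming scheme, same directory
  begin
    File.write(tmp_path, lines.join)
    File.chmod(File.stat(path).mode & 0o7777, tmp_path) # keep the log's permissions
    File.rename(tmp_path, path) # atomic on POSIX when both paths share a filesystem
  ensure
    File.delete(tmp_path) if File.exist?(tmp_path) # only left behind on failure
  end
  { lines_processed: processed, lines_modified: modified }
end

Dir.mktmpdir do |dir|
  log = File.join(dir, 'app.log')
  File.write(log, "pin 1234\nok\n")
  scrub_file_in_place(log) # => { lines_processed: 2, lines_modified: 1 }
  File.read(log)           # => "pin ****\nok\n"
end
```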

.scrub_with_audit(string) ⇒ Hash

Scrub a string and return an audit trail of what was masked

Parameters:

  • string (String)

    the input string

Returns:

  • (Hash)

    { result:, audit: [{ original:, masked:, position: }, …] }

# File 'lib/philiprehberger/mask.rb', line 72

def self.scrub_with_audit(string)
  Scrubber.call_with_audit(string, patterns: Configuration.instance.patterns)
end
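A minimal sketch of producing the `{ result:, audit: }` shape with a single hypothetical detector (runs of eight or more digits, last four kept). Recording `position` as the offset in the original string is an assumption; replacements can change length, so offsets in the result may differ:

```ruby
# Mask all but the last four digits of long digit runs and log each replacement.
def scrub_with_audit_sketch(string)
  audit = []
  result = string.gsub(/\d{8,}/) do |found|
    masked = '*' * (found.length - 4) + found[-4..]
    audit << { original: found, masked: masked, position: Regexp.last_match.begin(0) }
    masked
  end
  { result: result, audit: audit }
end

scrub_with_audit_sketch('acct 12345678 and 987654321')
# => { result: "acct ****5678 and *****4321",
#      audit: [{ original: "12345678", masked: "****5678", position: 5 },
#              { original: "987654321", masked: "*****4321", position: 18 }] }
```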

.tokenize(string) ⇒ Hash

Replace PII with reversible tokens

Examples:

Replace PII with reversible tokens

result = Philiprehberger::Mask.tokenize('Contact user@example.com')
# => { masked: "Contact <TOKEN_EMAIL_1>", tokens: { "<TOKEN_EMAIL_1>" => "user@example.com" } }

Parameters:

  • string (String)

    the input string

Returns:

  • (Hash)

    { masked:, tokens: {} }

# File 'lib/philiprehberger/mask.rb', line 83

def self.tokenize(string)
  Scrubber.call_with_tokens(string, patterns: Configuration.instance.patterns)
end
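A minimal sketch that produces the documented `{ masked:, tokens: }` shape with the token format from the example above. The email regex and the per-detector counter are assumptions, and a repeated value gets a fresh token rather than reusing one. The returned `tokens` table holds the original PII, so it needs the same protection as the unmasked data:

```ruby
# Hypothetical detector set; token names follow the <TOKEN_NAME_N> example format.
TOKEN_PATTERNS = { email: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/ }.freeze

def tokenize_sketch(string, patterns: TOKEN_PATTERNS)
  tokens = {}
  masked = patterns.reduce(string) do |acc, (name, regex)|
    count = 0 # numbering restarts for each detector
    acc.gsub(regex) do |found|
      count += 1
      token = "<TOKEN_#{name.upcase}_#{count}>"
      tokens[token] = found
      token
    end
  end
  { masked: masked, tokens: tokens }
end

tokenize_sketch('Contact user@example.com')
# => { masked: "Contact <TOKEN_EMAIL_1>", tokens: { "<TOKEN_EMAIL_1>" => "user@example.com" } }
```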