Module: Philiprehberger::Mask

Defined in:
lib/philiprehberger/mask.rb,
lib/philiprehberger/mask/version.rb,
lib/philiprehberger/mask/detector.rb,
lib/philiprehberger/mask/scrubber.rb,
lib/philiprehberger/mask/configuration.rb,
lib/philiprehberger/mask/deep_scrubber.rb

Defined Under Namespace

Modules: DeepScrubber, Detector, Scrubber

Classes: Configuration, Error

Constant Summary

VERSION = '0.5.1'

Class Method Details

.add_locale(locale, patterns) ⇒ Object

Register locale-specific patterns

Parameters:

  • locale (Symbol)

    locale identifier

  • patterns (Hash<Symbol, Regexp>)

    detector name to regex mapping



# File 'lib/philiprehberger/mask.rb', line 111

def self.add_locale(locale, patterns)
  Configuration.instance.add_locale(locale, patterns)
end
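
A minimal sketch of the patterns argument, assuming the Hash<Symbol, Regexp> shape documented above. The :de locale, the :postal_code detector name, and the regex itself are hypothetical and chosen only for illustration. The call into the gem is guarded so the snippet also runs where the gem is not installed.

```ruby
# Hypothetical locale patterns: detector name => Regexp.
# The :postal_code name and the five-digit rule are illustrative assumptions.
DE_PATTERNS = {
  postal_code: /\b\d{5}\b/
}

begin
  require 'philiprehberger/mask'
  Philiprehberger::Mask.add_locale(:de, DE_PATTERNS)
rescue LoadError
  # Gem not available here; the hash above still shows the expected shape.
end
```

Once registered, the locale can presumably be selected through the locale: keyword accepted by .batch_scrub, .scrub_io, and .scrub_log.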

.batch_scrub(strings, mode: :full, locale: nil) ⇒ Array<String>

Process an array of strings in one call with shared compiled patterns

Raises ArgumentError when strings is not an Array. An empty Array returns [].

Examples:

Scrub several strings at once

Philiprehberger::Mask.batch_scrub(['user@example.com', 'SSN: 123-45-6789'])
# => ["u***@e******.com", "SSN: ***-**-6789"]

Parameters:

  • strings (Array<String>)

    input strings

  • mode (Symbol) (defaults to: :full)

    masking mode (:full, :partial, :format_preserving)

  • locale (Symbol, nil) (defaults to: nil)

    optional locale for locale-specific patterns

Returns:

  • (Array<String>)

    scrubbed strings (empty Array for empty input)

Raises:

  • (ArgumentError)

    if strings is not an Array



# File 'lib/philiprehberger/mask.rb', line 92

def self.batch_scrub(strings, mode: :full, locale: nil)
  raise ArgumentError, 'strings must be an Array' unless strings.is_a?(Array)

  patterns = Configuration.instance.patterns(locale: locale)
  compiled = patterns.map { |pat| pat.merge(pattern: Regexp.new(pat[:pattern].source, pat[:pattern].options)) }
  strings.map { |s| Scrubber.call(s, patterns: compiled, mode: mode) }
end
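
The recompilation step in the listing above can be tried in isolation. This is a sketch, not the gem's code: SAMPLE_PATTERNS and the recompile helper are hypothetical stand-ins, and the :name key is an assumption (the listing only shows that each entry is a Hash with a :pattern key).

```ruby
# Stand-in pattern list; the real entries come from Configuration#patterns.
# The :name key and the SSN-like regex are assumptions for illustration.
SAMPLE_PATTERNS = [
  { name: :ssn, pattern: /\d{3}-\d{2}-\d{4}/i }
].freeze

# Rebuild each Regexp once from its source and options, so the same
# compiled objects can be reused for every string in the batch.
def recompile(patterns)
  patterns.map do |pat|
    pat.merge(pattern: Regexp.new(pat[:pattern].source, pat[:pattern].options))
  end
end

compiled = recompile(SAMPLE_PATTERNS)
puts compiled.first[:pattern] == SAMPLE_PATTERNS.first[:pattern]      # prints true
puts compiled.first[:pattern].equal?(SAMPLE_PATTERNS.first[:pattern]) # prints false
```

The rebuilt pattern compares equal to the original (same source and options) but is a separate object, so the configured patterns themselves are left untouched.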

.configure {|Configuration| ... } ⇒ Object

Configure custom patterns

Yields:

  • (Configuration)

    the shared configuration instance



# File 'lib/philiprehberger/mask.rb', line 177

def self.configure(&block)
  block.call(Configuration.instance)
end

.configure_priority(detector_order) ⇒ Object

Set detector evaluation priority

Parameters:

  • detector_order (Array<Symbol>)

    detector names in desired order



# File 'lib/philiprehberger/mask.rb', line 103

def self.configure_priority(detector_order)
  Configuration.instance.set_priority(detector_order)
end

.detokenize(string, tokens:) ⇒ String

Reverse tokenization using a token lookup table

Parameters:

  • string (String)

    the tokenized string

  • tokens (Hash)

    token-to-original mapping

Returns:

  • (String)

    the restored string



# File 'lib/philiprehberger/mask.rb', line 74

def self.detokenize(string, tokens:)
  result = string.dup
  tokens.each { |token, original| result = result.gsub(token, original) }
  result
end
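
The substitution loop above is plain Ruby, so the round trip can be sketched without the gem. The restore helper below is a hypothetical stand-alone copy of that loop; the token format is taken from the .tokenize example on this page.

```ruby
# Stand-alone copy of the substitution loop shown above.
# Each token is replaced by its original value, one entry at a time.
def restore(string, tokens)
  tokens.reduce(string.dup) { |acc, (token, original)| acc.gsub(token, original) }
end

tokens = { '<TOKEN_EMAIL_1>' => 'user@example.com' }
puts restore('Contact <TOKEN_EMAIL_1>', tokens)
# prints: Contact user@example.com
```

String#gsub interprets backslash sequences such as \0 or \1 in a replacement string, so originals that contain backslashes may not come back verbatim with this form; the block form, gsub(token) { original }, avoids that interpretation.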

.reset_configuration! ⇒ Object

Reset configuration to defaults



# File 'lib/philiprehberger/mask.rb', line 182

def self.reset_configuration!
  Configuration.reset!
end

.scrub(string, mode: :full) ⇒ String

Detect and redact PII patterns in a string

Examples:

Mask an email in a string

Philiprehberger::Mask.scrub('Contact user@example.com')
# => "Contact u***@e******.com"

Parameters:

  • string (String)

    the input string

  • mode (Symbol) (defaults to: :full)

    masking mode (:full, :partial, :format_preserving)

Returns:

  • (String)

    the scrubbed string



# File 'lib/philiprehberger/mask.rb', line 22

def self.scrub(string, mode: :full)
  Scrubber.call(string, patterns: Configuration.instance.patterns, mode: mode)
end

.scrub_hash(data, keys: nil, mode: :full) ⇒ Hash, Array

Deep-walk a hash/array and redact sensitive values

Examples:

Redact sensitive keys and PII inside nested structures

Philiprehberger::Mask.scrub_hash(user: { email: 'a@b.com', password: 'secret' })
# => { user: { email: "a***@b.com", password: "[FILTERED]" } }

Parameters:

  • data (Hash, Array)

    the input structure

  • keys (Array<Symbol, String>, nil) (defaults to: nil)

    keys whose values are redacted; falls back to the configured sensitive keys when nil

  • mode (Symbol) (defaults to: :full)

    masking mode (:full, :partial, :format_preserving)

Returns:

  • (Hash, Array)

    the scrubbed structure



# File 'lib/philiprehberger/mask.rb', line 35

def self.scrub_hash(data, keys: nil, mode: :full)
  config = Configuration.instance
  DeepScrubber.call(data, patterns: config.patterns, sensitive_keys: keys || config.sensitive_keys, mode: mode)
end
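
DeepScrubber's implementation is not shown on this page, so the following is only a rough sketch of what a deep walk of this kind might look like. The deep_scrub helper, the block-based scrubber, and the email regex are all hypothetical; the "[FILTERED]" placeholder is taken from the example above.

```ruby
# Hypothetical deep walk: values under sensitive keys become "[FILTERED]",
# other strings go through the supplied block, and containers are recursed.
def deep_scrub(data, sensitive_keys:, &scrub)
  names = sensitive_keys.map(&:to_s)
  case data
  when Hash
    data.to_h do |key, value|
      redacted = names.include?(key.to_s)
      [key, redacted ? '[FILTERED]' : deep_scrub(value, sensitive_keys: sensitive_keys, &scrub)]
    end
  when Array
    data.map { |item| deep_scrub(item, sensitive_keys: sensitive_keys, &scrub) }
  when String
    scrub.call(data)
  else
    data # numbers, nil, booleans and so on pass through unchanged
  end
end

# Illustrative scrubber: replaces anything that looks like an email address.
result = deep_scrub({ user: { email: 'a@b.com', password: 'secret' } },
                    sensitive_keys: [:password]) do |s|
  s.gsub(/\b[\w.+-]+@[\w.-]+\.\w+\b/, '[EMAIL]')
end
# result is { user: { email: "[EMAIL]", password: "[FILTERED]" } }
```

Comparing keys as strings lets Symbol and String keys match the same entry, which fits the Array<Symbol, String> type documented for keys; whether the gem does the same is an assumption.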

.scrub_hash_with_audit(data, keys: nil) ⇒ Hash

Deep-walk a hash/array and redact sensitive values with audit trail

Parameters:

  • data (Hash, Array)

    the input structure

  • keys (Array<Symbol, String>, nil) (defaults to: nil)

    keys whose values are redacted; falls back to the configured sensitive keys when nil

Returns:

  • (Hash)

    { result:, audit: […] }



# File 'lib/philiprehberger/mask.rb', line 45

def self.scrub_hash_with_audit(data, keys: nil)
  config = Configuration.instance
  DeepScrubber.call_with_audit(data, patterns: config.patterns, sensitive_keys: keys || config.sensitive_keys)
end

.scrub_io(io, mode: :full, locale: nil) ⇒ Array<String>

Read from IO line by line, scrub each line

Raises ArgumentError when io is nil. An IO that is already at EOF (or empty) returns an empty Array rather than raising.

Examples:

Scrub lines from an in-memory IO

Philiprehberger::Mask.scrub_io(StringIO.new("user@example.com\n"))
# => ["u***@e******.com\n"]

Parameters:

  • io (IO, StringIO)

    readable IO object

  • mode (Symbol) (defaults to: :full)

    masking mode (:full, :partial, :format_preserving)

  • locale (Symbol, nil) (defaults to: nil)

    optional locale for locale-specific patterns

Returns:

  • (Array<String>)

    scrubbed lines (empty Array when the IO is at EOF)

Raises:

  • (ArgumentError)

    if io is nil



# File 'lib/philiprehberger/mask.rb', line 128

def self.scrub_io(io, mode: :full, locale: nil)
  raise ArgumentError, 'io is required' if io.nil?
  return [] if io.respond_to?(:eof?) && io.eof?

  patterns = Configuration.instance.patterns(locale: locale)
  io.each_line.map { |line| Scrubber.call(line, patterns: patterns, mode: mode) }
end
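
The nil and EOF guards described above can be exercised with a small stand-alone sketch. MASK_LINE is a hypothetical stand-in for Scrubber.call, and scrub_lines mirrors only the control flow of the listing, not the gem's pattern handling.

```ruby
require 'stringio'

# Hypothetical stand-in for Scrubber.call: masks SSN-like digit groups.
MASK_LINE = ->(line) { line.gsub(/\d{3}-\d{2}-(\d{4})/, '***-**-\1') }

# Mirrors the guards in the listing above: nil raises, EOF returns [].
def scrub_lines(io)
  raise ArgumentError, 'io is required' if io.nil?
  return [] if io.respond_to?(:eof?) && io.eof?

  io.each_line.map { |line| MASK_LINE.call(line) }
end

p scrub_lines(StringIO.new(''))                   # prints []
p scrub_lines(StringIO.new("SSN: 123-45-6789\n")) # prints ["SSN: ***-**-6789\n"]
```

Line terminators are kept, so joining the returned Array reproduces the scrubbed input. An IO that has already been read to the end also returns [] unless it is rewound first.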

.scrub_log(path, output: nil, mode: :full, locale: nil) ⇒ Hash

Read a file line by line, scrub each line, and write the result

Parameters:

  • path (String)

    path to the file to scrub

  • output (String, nil) (defaults to: nil)

    destination path; when nil, the input file is overwritten in place

  • mode (Symbol) (defaults to: :full)

    masking mode (:full, :partial, :format_preserving)

  • locale (Symbol, nil) (defaults to: nil)

    optional locale for locale-specific patterns

Returns:

  • (Hash)

    { lines_processed:, lines_modified:, detections: }



# File 'lib/philiprehberger/mask.rb', line 143

def self.scrub_log(path, output: nil, mode: :full, locale: nil)
  patterns = Configuration.instance.patterns(locale: locale)
  lines_processed = 0
  lines_modified = 0
  detections = 0

  scrubbed_lines = File.open(path, 'r') do |f|
    f.each_line.map do |line|
      lines_processed += 1
      scrubbed = Scrubber.call(line, patterns: patterns, mode: mode)
      if scrubbed != line
        lines_modified += 1
        detections += Scrubber.call_with_audit(line, patterns: patterns)[:audit].length
      end
      scrubbed
    end
  end

  if output.nil?
    Tempfile.open([File.basename(path), '.tmp'], File.dirname(path)) do |tmp|
      tmp.write(scrubbed_lines.join)
      tmp.flush
      File.rename(tmp.path, path)
    end
  else
    File.write(output, scrubbed_lines.join)
  end

  { lines_processed: lines_processed, lines_modified: lines_modified, detections: detections }
end
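
The in-place branch above writes to a temporary file in the same directory and then renames it over the original, so readers do not see a half-written file. A stand-alone sketch of that step follows; rewrite_in_place is a hypothetical helper, and its block stands in for the per-line scrubber.

```ruby
require 'tempfile'
require 'tmpdir'

# Rewrite +path+ in place: transform every line with the given block,
# write the result to a temp file next to the original, then rename it
# over the original. Returns the number of lines processed.
def rewrite_in_place(path)
  lines = File.foreach(path).map { |line| yield(line) }
  Tempfile.open([File.basename(path), '.tmp'], File.dirname(path)) do |tmp|
    tmp.write(lines.join)
    tmp.flush
    File.rename(tmp.path, path)
  end
  lines.length
end

Dir.mktmpdir do |dir|
  log = File.join(dir, 'app.log')
  File.write(log, "ssn=123-45-6789\nok\n")
  rewrite_in_place(log) { |line| line.gsub(/\d{3}-\d{2}-(\d{4})/, '***-**-\1') }
  print File.read(log)
end
# prints:
# ssn=***-**-6789
# ok
```

Creating the temp file in the target's own directory keeps the rename on one filesystem. Tempfile creates files with mode 0600, so a file rewritten this way may end up with different permissions than the original.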

.scrub_with_audit(string) ⇒ Hash

Scrub a string and return an audit trail of what was masked

Parameters:

  • string (String)

    the input string

Returns:

  • (Hash)

    { result:, audit: [original:, masked:, position:] }



# File 'lib/philiprehberger/mask.rb', line 54

def self.scrub_with_audit(string)
  Scrubber.call_with_audit(string, patterns: Configuration.instance.patterns)
end

.tokenize(string) ⇒ Hash

Replace PII with reversible tokens

Examples:

Replace PII with reversible tokens

result = Philiprehberger::Mask.tokenize('Contact user@example.com')
# => { masked: "Contact <TOKEN_EMAIL_1>", tokens: { "<TOKEN_EMAIL_1>" => "user@example.com" } }

Parameters:

  • string (String)

    the input string

Returns:

  • (Hash)

    { masked:, tokens: {} }



# File 'lib/philiprehberger/mask.rb', line 65

def self.tokenize(string)
  Scrubber.call_with_tokens(string, patterns: Configuration.instance.patterns)
end