Module: AllStak::Sanitizer

Defined in:
lib/allstak/sanitizer.rb

Constant Summary

REDACTED =
"[REDACTED]"
DEFAULT_DENYLIST =
%w[
  authorization
  proxy-authorization
  cookie
  set-cookie
  password
  passwd
  pwd
  api_key
  apikey
  x-api-key
  x-allstak-key
  x-auth-token
  x-access-token
  token
  bearer
  jwt
  session
  sessionid
  session_id
  secret
  credit_card
  card_number
  cvv
  ssn
  csrf
].freeze
ALLOWLIST =

Exact, CASE-SENSITIVE keys that look sensitive by substring but are NOT: they are first-class SDK telemetry fields that must survive scrubbing. The release-health `sessionId` (camelCase) carries the SDK's own per-process session id (a random UUID, not a user/auth session token). The backend error consumer needs it to attribute crashes, so it must never be redacted. Because the match is exact and case-sensitive, genuine cookie/auth keys such as `session`, `session_id`, or `sessionid` (all lower-case denylist terms) are still scrubbed.

%w[
  sessionId
].freeze
MAX_SCAN_LENGTH =

Longest single string (in characters) that is scanned for value patterns. Longer strings pass through untouched, so a pathological multi-MB blob never stalls the wire path. Key-name redaction still applies: if such a string sits under a denylisted key, it is replaced with REDACTED.

16_384
VALUE_SCRUB_SKIP_KEYS =

Keys whose scalar string value is exempt from value-pattern scrubbing. Matching is case-insensitive: every entry is downcased when the table is built, and the lookup key is downcased too, so `sessionid` and `sessionId` occupy a single entry. These keys carry structured identifiers and locations that the patterns would otherwise corrupt: stack-frame fields, release/SDK/build metadata, span and trace ids, and URLs/paths (which are handled by their own URL redactor).

%w[
  filename
  function
  abspath
  abs_path
  lineno
  colno
  release
  version
  dist
  platform
  environment
  sdkname
  sdk_name
  sdkversion
  sdk_version
  sdk.name
  sdk.version
  commit.sha
  commit.branch
  commit_sha
  url
  path
  host
  hostname
  route
  operation
  op
  spanid
  span_id
  parentspanid
  parent_span_id
  traceid
  trace_id
  requestid
  request_id
  sessionid
  sessionId
  timestamp
].each_with_object({}) { |k, h| h[k.downcase] = true }.freeze
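To show how the downcased table behaves, here is a standalone sketch that builds a small subset of the entries above and queries it the same way `.skip_value_scrub_key?` does. `SKIP_KEYS` and `skip?` are hypothetical stand-ins, not SDK names.

```ruby
# Hypothetical stand-in for VALUE_SCRUB_SKIP_KEYS (a subset of the real entries).
SKIP_KEYS = %w[
  release
  traceid
  trace_id
  sessionid
  sessionId
].each_with_object({}) { |k, h| h[k.downcase] = true }.freeze

# Mirrors .skip_value_scrub_key?: only String/Symbol keys qualify, and the
# lookup key is downcased before the hash lookup.
def skip?(key)
  return false unless key.is_a?(String) || key.is_a?(Symbol)
  SKIP_KEYS.key?(key.to_s.downcase)
end

puts SKIP_KEYS.size     # 4: "sessionid" and "sessionId" share one entry
puts skip?(:traceId)    # true
puts skip?("RELEASE")   # true
puts skip?(42)          # false
```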
VALUE_SCRUB_SKIP_SUBTREES =

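Subtrees that are never value-scrubbed. They are matched case-insensitively by key name at any nesting depth, not only at the top level. `user` holds data the caller explicitly set via setUser (intentional identification, which ships as before). `frames`/`stackTrace` hold structured stack frames whose filenames/functions must not be corrupted. Key-name redaction still applies inside these subtrees.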

%w[
  user
  frames
  stackTrace
  stacktrace
].each_with_object({}) { |k, h| h[k.downcase] = true }.freeze
SSN_REGEX =

US SSN — REQUIRE the hyphens so bare 9-digit numbers (order ids, etc.) are not nuked. Compiled once.

/\b\d{3}-\d{2}-\d{4}\b/.freeze
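A quick check of the hyphen requirement, using a standalone copy of the pattern above: a hyphenated SSN-shaped run is replaced, while a bare 9-digit number is left alone.

```ruby
# Same pattern as SSN_REGEX above.
SSN_REGEX = /\b\d{3}-\d{2}-\d{4}\b/

with_hyphens = "ssn 123-45-6789".gsub(SSN_REGEX, "[REDACTED]")
bare_digits  = "order 123456789".gsub(SSN_REGEX, "[REDACTED]")

puts with_hyphens  # ssn [REDACTED]
puts bare_digits   # order 123456789
```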
CC_CANDIDATE_REGEX =

Candidate credit-card runs: 13–19 digits with optional single space/hyphen separators between groups. Candidates are Luhn-validated before redaction (see .luhn?), so digit runs that fail the checksum (timestamps, order ids) survive. Negative lookarounds (no digit or hyphen immediately before or after the run) keep the pattern from matching the middle of a longer digit string.

/(?<![\d-])(?:\d[ -]?){12,18}\d(?![\d-])/.freeze
EMAIL_REGEX =

Common email-address shape (a pragmatic pattern, not full RFC 5322 syntax). Compiled once.

/\b[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}\b/.freeze
IPV4_OCTET =

Regex source for one IPv4 octet, restricted to 0–255. It is interpolated four times into IPV4_REGEX, which is compiled once.

'(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)'
IPV4_REGEX =
/\b#{IPV4_OCTET}\.#{IPV4_OCTET}\.#{IPV4_OCTET}\.#{IPV4_OCTET}\b/.freeze
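The octet fragment is what keeps out-of-range dotted quads from matching. A standalone copy of the two constants shows the difference:

```ruby
# Same octet fragment and composed pattern as above. Single quotes keep
# `\d` as a literal backslash-d for the regex engine.
IPV4_OCTET = '(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)'
IPV4_REGEX = /\b#{IPV4_OCTET}\.#{IPV4_OCTET}\.#{IPV4_OCTET}\.#{IPV4_OCTET}\b/

puts IPV4_REGEX.match?("client 192.168.0.10")  # true
puts IPV4_REGEX.match?("client 999.1.1.1")     # false: 999 is not a valid octet
```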
IPV6_REGEX =

IPv6, best-effort: two or more colon-separated hex groups, with optional `::` compression. Intentionally loose, because IPv6 detection is best-effort per spec; as a result it can also match non-address text such as the clock time `12:34:56`.

/\b(?:[0-9A-Fa-f]{1,4}:){2,7}[0-9A-Fa-f]{0,4}\b|\b::(?:[0-9A-Fa-f]{1,4}:){0,6}[0-9A-Fa-f]{1,4}\b/.freeze
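The looseness is easy to see with a standalone copy of the pattern: a full address is redacted, and so is a clock time, because `12:34:56` is two hex groups followed by a third.

```ruby
# Same pattern as IPV6_REGEX above.
IPV6_REGEX = /\b(?:[0-9A-Fa-f]{1,4}:){2,7}[0-9A-Fa-f]{0,4}\b|\b::(?:[0-9A-Fa-f]{1,4}:){0,6}[0-9A-Fa-f]{1,4}\b/

address = "peer 2001:db8:0:0:0:0:2:1".gsub(IPV6_REGEX, "[REDACTED]")
clock   = "at 12:34:56".gsub(IPV6_REGEX, "[REDACTED]")

puts address  # peer [REDACTED]
puts clock    # at [REDACTED]  (false positive)
```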


Class Method Details

.luhn?(digits) ⇒ Boolean

Luhn (mod-10) checksum over a string of digits. Returns false unless the string consists of 13–19 ASCII digits.

Returns:

  • (Boolean)


# File 'lib/allstak/sanitizer.rb', line 335

def luhn?(digits)
  return false unless digits =~ /\A\d{13,19}\z/

  sum = 0
  double = false
  digits.reverse.each_char do |ch|
    d = ch.to_i
    if double
      d *= 2
      d -= 9 if d > 9
    end
    sum += d
    double = !double
  end
  (sum % 10).zero?
end
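To make the arithmetic visible, the sketch below uses a hypothetical helper, `luhn_sum`, that returns the digit sum instead of a Boolean. It applies the same rule as `.luhn?`: starting from the rightmost digit, every second digit is doubled, and 9 is subtracted from any two-digit product. `4111111111111111` is the widely used Visa test number.

```ruby
# Hypothetical helper: the Luhn digit sum, so the arithmetic can be inspected.
def luhn_sum(digits)
  digits.reverse.each_char.with_index.map do |ch, i|
    d = ch.to_i
    next d if i.even?     # rightmost digit (index 0) and every other one: as-is
    d *= 2                # every second digit from the right is doubled
    d > 9 ? d - 9 : d     # fold two-digit products (e.g. 16 -> 7)
  end.sum
end

# Hypothetical wrapper with the same length guard as .luhn?.
def luhn_valid?(digits)
  digits.match?(/\A\d{13,19}\z/) && (luhn_sum(digits) % 10).zero?
end

puts luhn_sum("4111111111111111")     # 30
puts luhn_valid?("4111111111111111")  # true
puts luhn_sum("4111111111111112")     # 31
puts luhn_valid?("4111111111111112")  # false
```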

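.record_redaction ⇒ Object

Increment the redaction counter under `@redaction_mutex`. Fail-open: any error is swallowed and nil is returned.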



# File 'lib/allstak/sanitizer.rb', line 186

def record_redaction
  @redaction_mutex.synchronize { @redaction_count += 1 }
rescue StandardError
  nil
end

.redaction_count ⇒ Object

Current value of the redaction counter, read under `@redaction_mutex`. Returns 0 on any error.



# File 'lib/allstak/sanitizer.rb', line 173

def redaction_count
  @redaction_mutex.synchronize { @redaction_count }
rescue StandardError
  0
end

.reset_redaction_count! ⇒ Object

Reset the redaction counter to 0 under `@redaction_mutex`. Always returns nil.



# File 'lib/allstak/sanitizer.rb', line 179

def reset_redaction_count!
  @redaction_mutex.synchronize { @redaction_count = 0 }
  nil
rescue StandardError
  nil
end
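The three counter methods follow a standard mutex-guarded pattern. The sketch below reproduces them in a hypothetical `CounterSketch` module and increments the counter from several threads. It assumes the real module initializes `@redaction_mutex` to a `Mutex` and `@redaction_count` to 0; that setup is not shown on this page.

```ruby
# Hypothetical module mirroring the three counter methods above. The
# Mutex/Integer initialization is an assumption, not the SDK's shown code.
module CounterSketch
  @redaction_mutex = Mutex.new
  @redaction_count = 0

  def self.record_redaction
    @redaction_mutex.synchronize { @redaction_count += 1 }
  rescue StandardError
    nil
  end

  def self.redaction_count
    @redaction_mutex.synchronize { @redaction_count }
  rescue StandardError
    0
  end

  def self.reset_redaction_count!
    @redaction_mutex.synchronize { @redaction_count = 0 }
    nil
  rescue StandardError
    nil
  end
end

# 8 threads x 1_000 increments; the mutex makes each read-modify-write atomic.
threads = Array.new(8) { Thread.new { 1_000.times { CounterSketch.record_redaction } } }
threads.each(&:join)
puts CounterSketch.redaction_count  # 8000
CounterSketch.reset_redaction_count!
puts CounterSketch.redaction_count  # 0
```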

.scrub(payload, extra_denylist: nil, send_default_pii: false, values: true) ⇒ Object

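Returns a sanitized deep copy of `payload`: Hashes and Arrays are rebuilt, strings are value-scrubbed, and other scalars are returned as-is. Any Hash or Array object encountered a second time during the walk (a cycle, or the same object shared in two places) is replaced with REDACTED.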

Parameters:

  • extra_denylist (Array<String>, nil) (defaults to: nil)

    additional key terms to redact (converted with to_s and downcased). They extend DEFAULT_DENYLIST and cannot remove any of its terms.

  • send_default_pii (Boolean) (defaults to: false)

    when true, the tier-B value scrubbers (email, IPv4/IPv6) are disabled — the caller has opted into PII. Tier-A (credit card, SSN) is ALWAYS applied. Default false (privacy-safe).

  • values (Boolean) (defaults to: true)

    when false, only key-name redaction runs (no value-pattern scrubbing). Useful for an intermediate pre-scrub (e.g. Sidekiq job args) where the wire-path scrub will value-scrub later with the authoritative config. Default true.



# File 'lib/allstak/sanitizer.rb', line 203

def scrub(payload, extra_denylist: nil, send_default_pii: false, values: true)
  denylist = DEFAULT_DENYLIST.dup
  denylist.concat(extra_denylist.map { |t| t.to_s.downcase }) if extra_denylist
  denylist.uniq!
  return walk_keys_only(payload, denylist, Set.new) unless values
  walk(payload, denylist, Set.new, send_default_pii)
end
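The first three lines of `.scrub` build the effective denylist. The standalone sketch below isolates that step; `effective_denylist` is a hypothetical name, and the three-term `DEFAULT_DENYLIST` is a subset used only for illustration. It shows that extra terms are downcased (Symbols included), duplicates collapse, and nothing is ever removed from the canonical list.

```ruby
DEFAULT_DENYLIST = %w[authorization password token].freeze  # subset, for illustration

# Mirrors the denylist construction in .scrub: copy the frozen canonical
# list, append downcased extra terms, and drop duplicates.
def effective_denylist(extra_denylist)
  denylist = DEFAULT_DENYLIST.dup
  denylist.concat(extra_denylist.map { |t| t.to_s.downcase }) if extra_denylist
  denylist.uniq!   # returns nil when nothing changed, so return the array itself
  denylist
end

p effective_denylist(nil)
p effective_denylist(["Stripe-Signature", :TOKEN])  # :TOKEN collapses into "token"
```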

.scrub_credit_cards(str) ⇒ Object

Replace only those candidate credit-card runs that pass the Luhn checksum. A run that fails Luhn (e.g. an order id or timestamp that happens to be 13–19 digits) is left intact. This reduces over-redaction without eliminating it: roughly 1 in 10 arbitrary digit runs passes the mod-10 check by chance.



# File 'lib/allstak/sanitizer.rb', line 323

def scrub_credit_cards(str)
  str.gsub(CC_CANDIDATE_REGEX) do |match|
    digits = match.gsub(/[ -]/, "")
    if digits.length.between?(13, 19) && luhn?(digits)
      REDACTED
    else
      match
    end
  end
end
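The Luhn gate can be exercised with standalone copies of `REDACTED` and `CC_CANDIDATE_REGEX`, paired with a compact re-implementation of the checksum (written here for brevity; it is not the SDK's own `.luhn?`). The spaced Visa test number is redacted, while a 13-digit order id that fails the checksum survives.

```ruby
REDACTED = "[REDACTED]"
CC_CANDIDATE_REGEX = /(?<![\d-])(?:\d[ -]?){12,18}\d(?![\d-])/

# Compact Luhn check: same rule as .luhn?, different loop style.
def luhn?(digits)
  return false unless digits.match?(/\A\d{13,19}\z/)

  contributions = digits.reverse.chars.each_with_index.map do |ch, i|
    d = ch.to_i
    d *= 2 if i.odd?    # every second digit from the right is doubled
    d > 9 ? d - 9 : d   # fold two-digit products
  end
  (contributions.sum % 10).zero?
end

# Same logic as .scrub_credit_cards: strip separators, redact only Luhn passes.
def scrub_credit_cards(str)
  str.gsub(CC_CANDIDATE_REGEX) do |match|
    digits = match.gsub(/[ -]/, "")
    luhn?(digits) ? REDACTED : match
  end
end

puts scrub_credit_cards("card 4111 1111 1111 1111")  # card [REDACTED]
puts scrub_credit_cards("order 1234567890123")       # order 1234567890123
```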

.scrub_value(str, send_default_pii) ⇒ Object

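Apply value-pattern scrubbing to a single string. Non-strings and empty strings are returned unchanged, and strings longer than MAX_SCAN_LENGTH are passed through. Tier A (SSN, credit card) always runs; tier B (email, IPv4, IPv6) runs only when `send_default_pii` is false. The redaction counter is incremented once per string that changed. Fail-open: any error returns the original string.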



# File 'lib/allstak/sanitizer.rb', line 297

def scrub_value(str, send_default_pii)
  return str unless str.is_a?(String)
  return str if str.empty? || str.length > MAX_SCAN_LENGTH

  out = str

  # Tier A — ALWAYS (regardless of send_default_pii).
  out = out.gsub(SSN_REGEX, REDACTED)
  out = scrub_credit_cards(out)

  # Tier B — only when the caller has NOT opted into PII.
  unless send_default_pii
    out = out.gsub(EMAIL_REGEX, REDACTED)
    out = out.gsub(IPV4_REGEX, REDACTED)
    out = out.gsub(IPV6_REGEX, REDACTED)
  end

  record_redaction if out != str
  out
rescue StandardError
  str
end
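The tiering can be illustrated with a reduced, hypothetical `scrub_value_sketch` that keeps one tier-A pattern (SSN) and one tier-B pattern (email) and omits the credit-card and IP scrubbers and the redaction counter for brevity. It shows that `send_default_pii: true` disables only tier B, and that an oversized string is returned as the very same object.

```ruby
REDACTED = "[REDACTED]"
MAX_SCAN_LENGTH = 16_384
SSN_REGEX = /\b\d{3}-\d{2}-\d{4}\b/
EMAIL_REGEX = /\b[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}\b/

# Reduced sketch of .scrub_value (SSN and email only).
def scrub_value_sketch(str, send_default_pii)
  return str unless str.is_a?(String)
  return str if str.empty? || str.length > MAX_SCAN_LENGTH

  out = str.gsub(SSN_REGEX, REDACTED)                            # tier A: always
  out = out.gsub(EMAIL_REGEX, REDACTED) unless send_default_pii  # tier B
  out
end

s = "ssn 123-45-6789 mail ana@example.com"
puts scrub_value_sketch(s, false)  # ssn [REDACTED] mail [REDACTED]
puts scrub_value_sketch(s, true)   # ssn [REDACTED] mail ana@example.com

big = "ana@example.com " * 2_000   # 32_000 characters, over MAX_SCAN_LENGTH
puts scrub_value_sketch(big, false).equal?(big)  # true: passed through untouched
```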

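.sensitive?(key, denylist) ⇒ Boolean

True when `key` is a String or Symbol, is not an exact (case-sensitive) ALLOWLIST entry, and its downcased form contains any `denylist` term as a substring. Substring matching is deliberately broad: `user_password` and `AccessToken` match, but so can unrelated names (e.g. `classname` contains `ssn`).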

Returns:

  • (Boolean)


# File 'lib/allstak/sanitizer.rb', line 211

def sensitive?(key, denylist)
  return false unless key.is_a?(String) || key.is_a?(Symbol)

  # Exact, case-sensitive allowlist wins: a first-class SDK field (e.g.
  # release-health `sessionId`) is never scrubbed even though its lowercase
  # form contains a denied substring. Checked against the ORIGINAL key so
  # `sessionId` survives while `sessionid`/`session_id`/`session` are scrubbed.
  return false if ALLOWLIST.include?(key.to_s)

  k = key.to_s.downcase
  denylist.any? { |term| k.include?(term) }
end
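The substring and allowlist rules are easiest to judge on concrete keys. The sketch below is a standalone copy of `.sensitive?` run against a four-term subset of `DEFAULT_DENYLIST` (the subset is chosen for illustration only).

```ruby
DENYLIST = %w[password token session ssn].freeze  # subset of DEFAULT_DENYLIST
ALLOWLIST = %w[sessionId].freeze

# Same logic as .sensitive? above.
def sensitive?(key, denylist)
  return false unless key.is_a?(String) || key.is_a?(Symbol)
  return false if ALLOWLIST.include?(key.to_s)  # exact, case-sensitive

  k = key.to_s.downcase
  denylist.any? { |term| k.include?(term) }     # substring match
end

p sensitive?("user_password", DENYLIST)  # true: contains "password"
p sensitive?(:AccessToken, DENYLIST)     # true: "accesstoken" contains "token"
p sensitive?("sessionId", DENYLIST)      # false: exact allowlist hit
p sensitive?("SessionId", DENYLIST)      # true: allowlist is case-sensitive
p sensitive?("classname", DENYLIST)      # true: "classname" contains "ssn"
p sensitive?(123, DENYLIST)              # false: not a String/Symbol
```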

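.skip_subtree?(key) ⇒ Boolean

True when `key` is a String or Symbol whose downcased form names an entry in VALUE_SCRUB_SKIP_SUBTREES.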

Returns:

  • (Boolean)


# File 'lib/allstak/sanitizer.rb', line 285

def skip_subtree?(key)
  return false unless key.is_a?(String) || key.is_a?(Symbol)
  VALUE_SCRUB_SKIP_SUBTREES.key?(key.to_s.downcase)
end

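.skip_value_scrub_key?(key) ⇒ Boolean

True when `key` is a String or Symbol whose downcased form names an entry in VALUE_SCRUB_SKIP_KEYS.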

Returns:

  • (Boolean)


# File 'lib/allstak/sanitizer.rb', line 290

def skip_value_scrub_key?(key)
  return false unless key.is_a?(String) || key.is_a?(Symbol)
  VALUE_SCRUB_SKIP_KEYS.key?(key.to_s.downcase)
end

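.walk(value, denylist, seen, send_default_pii) ⇒ Object

Recursive worker behind .scrub when `values: true`. Each Hash entry is handled by the first rule that applies: a denylisted key has its whole value replaced with REDACTED; a VALUE_SCRUB_SKIP_SUBTREES key is copied via .walk_keys_only; a VALUE_SCRUB_SKIP_KEYS key keeps a scalar value as-is (nested Hashes/Arrays are still walked); anything else is walked recursively. Strings go through .scrub_value. `seen` holds the object_id of every Hash/Array already visited.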



# File 'lib/allstak/sanitizer.rb', line 224

def walk(value, denylist, seen, send_default_pii)
  case value
  when Hash
    return REDACTED if seen.include?(value.object_id)

    seen.add(value.object_id)
    value.each_with_object({}) do |(k, v), out|
      out[k] =
        if sensitive?(k, denylist)
          record_redaction
          REDACTED
        elsif skip_subtree?(k)
          # Explicit user object / stack frames: deep-copy with key-name
          # redaction still applied, but NO value-pattern scrubbing.
          walk_keys_only(v, denylist, seen)
        elsif skip_value_scrub_key?(k)
          # Structured scalar (release, url, span id, …): recurse for nested
          # collections, but do not value-scrub a scalar string here.
          v.is_a?(Hash) || v.is_a?(Array) ? walk(v, denylist, seen, send_default_pii) : v
        else
          walk(v, denylist, seen, send_default_pii)
        end
    end
  when Array
    return REDACTED if seen.include?(value.object_id)

    seen.add(value.object_id)
    value.map { |v| walk(v, denylist, seen, send_default_pii) }
  when String
    scrub_value(value, send_default_pii)
  else
    value
  end
end
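Because `seen` is never pruned, it records every container visited, not just the ancestors of the current node. The hypothetical `walk_sketch` below keeps only that bookkeeping (no key or value scrubbing) to show the consequence: a cycle is cut with `"[REDACTED]"`, and so is a second, non-cyclic reference to the same Hash.

```ruby
require "set"

# Minimal sketch of the `seen` bookkeeping used by .walk and .walk_keys_only;
# key-name and value-pattern scrubbing are omitted.
def walk_sketch(value, seen = Set.new)
  case value
  when Hash
    return "[REDACTED]" if seen.include?(value.object_id)
    seen.add(value.object_id)
    value.each_with_object({}) { |(k, v), out| out[k] = walk_sketch(v, seen) }
  when Array
    return "[REDACTED]" if seen.include?(value.object_id)
    seen.add(value.object_id)
    value.map { |v| walk_sketch(v, seen) }
  else
    value
  end
end

cyclic = { "name" => "a" }
cyclic["self"] = cyclic
cyclic_result = walk_sketch(cyclic)   # "self" becomes "[REDACTED]"

shared = { "id" => 1 }
shared_result = walk_sketch({ "a" => shared, "b" => shared })
# "a" is copied normally; "b" becomes "[REDACTED]" although it is not a cycle.
```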

.walk_keys_only(value, denylist, seen) ⇒ Object

Recurse applying ONLY key-name redaction (no value-pattern scrubbing). Used for exempt subtrees (explicit user object, stack frames) and by .scrub when `values: false`.



# File 'lib/allstak/sanitizer.rb', line 261

def walk_keys_only(value, denylist, seen)
  case value
  when Hash
    return REDACTED if seen.include?(value.object_id)

    seen.add(value.object_id)
    value.each_with_object({}) do |(k, v), out|
      if sensitive?(k, denylist)
        record_redaction
        out[k] = REDACTED
      else
        out[k] = walk_keys_only(v, denylist, seen)
      end
    end
  when Array
    return REDACTED if seen.include?(value.object_id)

    seen.add(value.object_id)
    value.map { |v| walk_keys_only(v, denylist, seen) }
  else
    value
  end
end
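Under `.walk`, a `user` subtree takes this keys-only path, so PII-shaped values inside it ship unchanged while denylisted keys are still redacted. The hypothetical `keys_only` sketch below illustrates this with a two-term denylist and omits cycle tracking for brevity.

```ruby
# Hypothetical reduced keys-only walk: key-name redaction only, no value
# patterns, no `seen` set (cycle handling omitted for brevity).
DENY = %w[password token].freeze

def keys_only(value)
  case value
  when Hash
    value.each_with_object({}) do |(k, v), out|
      hit = DENY.any? { |t| k.to_s.downcase.include?(t) }
      out[k] = hit ? "[REDACTED]" : keys_only(v)
    end
  when Array
    value.map { |v| keys_only(v) }
  else
    value
  end
end

user = { "email" => "ana@example.com", "api_token" => "abc123", "ip" => "10.0.0.1" }
result = keys_only(user)
# "email" and "ip" are kept verbatim (no value-pattern scrubbing);
# "api_token" is replaced because its key contains "token".
```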