Module: AllStak::Sanitizer
- Defined in:
- lib/allstak/sanitizer.rb
Constant Summary collapse
- REDACTED =
"[REDACTED]"- DEFAULT_DENYLIST =
%w[ authorization proxy-authorization cookie set-cookie password passwd pwd api_key apikey x-api-key x-allstak-key x-auth-token x-access-token token bearer jwt session sessionid session_id secret credit_card card_number cvv ssn csrf ].freeze
- ALLOWLIST =
Exact, CASE-SENSITIVE keys that look sensitive by substring but are NOT —they are first-class SDK telemetry fields that must survive scrubbing. The release-health ‘sessionId` (camelCase) carries the SDK’s own per-process session id (a random UUID, not a user/auth session token); the backend error consumer needs it to attribute crashes, so it must never be redacted. Matched exactly and case-sensitively, so genuine cookie/auth keys like ‘session`, `session_id`, or `sessionid` (the lower-case denylist terms) are still scrubbed.
%w[ sessionId ].freeze
- MAX_SCAN_LENGTH =
Longest single string we will scan for value patterns. Larger strings are passed through untouched so a pathological multi-MB blob never stalls the wire path. Key-name redaction still applies to its containing key.
16_384- VALUE_SCRUB_SKIP_KEYS =
Keys whose scalar string value is exempt from value-pattern scrubbing (matched case-sensitively against the original key, then case-insensitively as a fallback). These carry structured identifiers / locations that the patterns would otherwise corrupt: stack-frame fields, release/sdk/build metadata, span & trace ids, URLs/paths (their own URL redactor owns them).
%w[ filename function abspath abs_path lineno colno release version dist platform environment sdkname sdk_name sdkversion sdk_version sdk.name sdk.version commit.sha commit.branch commit_sha url path host hostname route operation op spanid span_id parentspanid parent_span_id traceid trace_id requestid request_id sessionid sessionId timestamp ].each_with_object({}) { |k, h| h[k.downcase] = true }.freeze
- VALUE_SCRUB_SKIP_SUBTREES =
Top-level subtrees that are never value-scrubbed. ‘user` holds data the caller explicitly set via setUser (intentional identification — ships as before). `frames`/`stackTrace` hold structured stack frames whose filenames/functions must not be corrupted.
%w[ user frames stackTrace stacktrace ].each_with_object({}) { |k, h| h[k.downcase] = true }.freeze
- SSN_REGEX =
US SSN — REQUIRE the hyphens so bare 9-digit numbers (order ids, etc.) are not nuked. Compiled once.
/\b\d{3}-\d{2}-\d{4}\b/.freeze
- CC_CANDIDATE_REGEX =
Candidate credit-card runs: 13–19 digits with optional single space/hyphen separators between groups. Luhn-validated before redaction (see #luhn?), so digit runs that fail the checksum (timestamps, order ids) survive. Word-boundary-ish anchors keep us from matching the middle of a longer digit string.
/(?<![\d-])(?:\d[ -]?){12,18}\d(?![\d-])/.freeze
- EMAIL_REGEX =
Standard email address. Compiled once.
/\b[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}\b/.freeze
- IPV4_OCTET =
IPv4 with each octet validated to 0–255. Compiled once.
'(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)'- IPV4_REGEX =
/\b#{IPV4_OCTET}\.#{IPV4_OCTET}\.#{IPV4_OCTET}\.#{IPV4_OCTET}\b/.freeze
- IPV6_REGEX =
- IPv6 best-effort: 2+ groups of hex separated by colons, with optional
-
compression. Intentionally loose — IPv6 detection is best-effort per spec.
/\b(?:[0-9A-Fa-f]{1,4}:){2,7}[0-9A-Fa-f]{0,4}\b|\b::(?:[0-9A-Fa-f]{1,4}:){0,6}[0-9A-Fa-f]{1,4}\b/.freeze
Class Method Summary collapse
-
.luhn?(digits) ⇒ Boolean
Luhn (mod-10) checksum over a string of digits.
-
.scrub(payload, extra_denylist: nil, send_default_pii: false, values: true) ⇒ Object
Returns a sanitized deep copy of ‘payload`.
-
.scrub_credit_cards(str) ⇒ Object
Replace only those candidate credit-card runs that pass the Luhn checksum.
-
.scrub_value(str, send_default_pii) ⇒ Object
Apply value-pattern scrubbing to a single string.
- .sensitive?(key, denylist) ⇒ Boolean
- .skip_subtree?(key) ⇒ Boolean
- .skip_value_scrub_key?(key) ⇒ Boolean
- .walk(value, denylist, seen, send_default_pii) ⇒ Object
-
.walk_keys_only(value, denylist, seen) ⇒ Object
Recurse applying ONLY key-name redaction (no value-pattern scrubbing).
Class Method Details
.luhn?(digits) ⇒ Boolean
Luhn (mod-10) checksum over a string of digits.
305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 |
# File 'lib/allstak/sanitizer.rb', line 305 def luhn?(digits) return false unless digits =~ /\A\d{13,19}\z/ sum = 0 double = false digits.reverse.each_char do |ch| d = ch.to_i if double d *= 2 d -= 9 if d > 9 end sum += d double = !double end (sum % 10).zero? end |
.scrub(payload, extra_denylist: nil, send_default_pii: false, values: true) ⇒ Object
Returns a sanitized deep copy of ‘payload`.
180 181 182 183 184 185 186 |
# File 'lib/allstak/sanitizer.rb', line 180 def scrub(payload, extra_denylist: nil, send_default_pii: false, values: true) denylist = DEFAULT_DENYLIST.dup denylist.concat(extra_denylist.map { |t| t.to_s.downcase }) if extra_denylist denylist.uniq! return walk_keys_only(payload, denylist, Set.new) unless values walk(payload, denylist, Set.new, send_default_pii) end |
.scrub_credit_cards(str) ⇒ Object
Replace only those candidate credit-card runs that pass the Luhn checksum. A run that fails Luhn (e.g. an order id or timestamp that happens to be 13–19 digits) is left intact, minimizing over-redaction.
293 294 295 296 297 298 299 300 301 302 |
# File 'lib/allstak/sanitizer.rb', line 293 def scrub_credit_cards(str) str.gsub(CC_CANDIDATE_REGEX) do |match| digits = match.gsub(/[ -]/, "") if digits.length.between?(13, 19) && luhn?(digits) REDACTED else match end end end |
.scrub_value(str, send_default_pii) ⇒ Object
Apply value-pattern scrubbing to a single string. Fail-open: any error returns the original string. Oversized strings are passed through.
268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 |
# File 'lib/allstak/sanitizer.rb', line 268 def scrub_value(str, send_default_pii) return str unless str.is_a?(String) return str if str.empty? || str.length > MAX_SCAN_LENGTH out = str # Tier A — ALWAYS (regardless of send_default_pii). out = out.gsub(SSN_REGEX, REDACTED) out = scrub_credit_cards(out) # Tier B — only when the caller has NOT opted into PII. unless send_default_pii out = out.gsub(EMAIL_REGEX, REDACTED) out = out.gsub(IPV4_REGEX, REDACTED) out = out.gsub(IPV6_REGEX, REDACTED) end out rescue StandardError str end |
.sensitive?(key, denylist) ⇒ Boolean
188 189 190 191 192 193 194 195 196 197 198 199 |
# File 'lib/allstak/sanitizer.rb', line 188 def sensitive?(key, denylist) return false unless key.is_a?(String) || key.is_a?(Symbol) # Exact, case-sensitive allowlist wins: a first-class SDK field (e.g. # release-health `sessionId`) is never scrubbed even though its lowercase # form contains a denied substring. Checked against the ORIGINAL key so # `sessionId` survives while `sessionid`/`session_id`/`session` are scrubbed. return false if ALLOWLIST.include?(key.to_s) k = key.to_s.downcase denylist.any? { |term| k.include?(term) } end |
.skip_subtree?(key) ⇒ Boolean
256 257 258 259 |
# File 'lib/allstak/sanitizer.rb', line 256 def skip_subtree?(key) return false unless key.is_a?(String) || key.is_a?(Symbol) VALUE_SCRUB_SKIP_SUBTREES.key?(key.to_s.downcase) end |
.skip_value_scrub_key?(key) ⇒ Boolean
261 262 263 264 |
# File 'lib/allstak/sanitizer.rb', line 261 def skip_value_scrub_key?(key) return false unless key.is_a?(String) || key.is_a?(Symbol) VALUE_SCRUB_SKIP_KEYS.key?(key.to_s.downcase) end |
.walk(value, denylist, seen, send_default_pii) ⇒ Object
201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 |
# File 'lib/allstak/sanitizer.rb', line 201 def walk(value, denylist, seen, send_default_pii) case value when Hash return REDACTED if seen.include?(value.object_id) seen.add(value.object_id) value.each_with_object({}) do |(k, v), out| out[k] = if sensitive?(k, denylist) REDACTED elsif skip_subtree?(k) # Explicit user object / stack frames: deep-copy with key-name # redaction still applied, but NO value-pattern scrubbing. walk_keys_only(v, denylist, seen) elsif skip_value_scrub_key?(k) # Structured scalar (release, url, span id, …): recurse for nested # collections, but do not value-scrub a scalar string here. v.is_a?(Hash) || v.is_a?(Array) ? walk(v, denylist, seen, send_default_pii) : v else walk(v, denylist, seen, send_default_pii) end end when Array return REDACTED if seen.include?(value.object_id) seen.add(value.object_id) value.map { |v| walk(v, denylist, seen, send_default_pii) } when String scrub_value(value, send_default_pii) else value end end |
.walk_keys_only(value, denylist, seen) ⇒ Object
Recurse applying ONLY key-name redaction (no value-pattern scrubbing). Used for exempt subtrees (explicit user object, stack frames).
237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 |
# File 'lib/allstak/sanitizer.rb', line 237 def walk_keys_only(value, denylist, seen) case value when Hash return REDACTED if seen.include?(value.object_id) seen.add(value.object_id) value.each_with_object({}) do |(k, v), out| out[k] = sensitive?(k, denylist) ? REDACTED : walk_keys_only(v, denylist, seen) end when Array return REDACTED if seen.include?(value.object_id) seen.add(value.object_id) value.map { |v| walk_keys_only(v, denylist, seen) } else value end end |