Module: AllStak::Sanitizer
- Defined in:
- lib/allstak/sanitizer.rb
Constant Summary collapse
- REDACTED =
"[REDACTED]"- DEFAULT_DENYLIST =
%w[ authorization proxy-authorization cookie set-cookie password passwd pwd api_key apikey x-api-key x-allstak-key x-auth-token x-access-token token bearer jwt session sessionid session_id secret credit_card card_number cvv ssn csrf ].freeze
- ALLOWLIST =
Exact, CASE-SENSITIVE keys that look sensitive by substring but are NOT —they are first-class SDK telemetry fields that must survive scrubbing. The release-health ‘sessionId` (camelCase) carries the SDK’s own per-process session id (a random UUID, not a user/auth session token); the backend error consumer needs it to attribute crashes, so it must never be redacted. Matched exactly and case-sensitively, so genuine cookie/auth keys like ‘session`, `session_id`, or `sessionid` (the lower-case denylist terms) are still scrubbed.
%w[ sessionId ].freeze
- MAX_SCAN_LENGTH =
Longest single string we will scan for value patterns. Larger strings are passed through untouched so a pathological multi-MB blob never stalls the wire path. Key-name redaction still applies to its containing key.
16_384- VALUE_SCRUB_SKIP_KEYS =
Keys whose scalar string value is exempt from value-pattern scrubbing (matched case-sensitively against the original key, then case-insensitively as a fallback). These carry structured identifiers / locations that the patterns would otherwise corrupt: stack-frame fields, release/sdk/build metadata, span & trace ids, URLs/paths (their own URL redactor owns them).
%w[ filename function abspath abs_path lineno colno release version dist platform environment sdkname sdk_name sdkversion sdk_version sdk.name sdk.version commit.sha commit.branch commit_sha url path host hostname route operation op spanid span_id parentspanid parent_span_id traceid trace_id requestid request_id sessionid sessionId timestamp ].each_with_object({}) { |k, h| h[k.downcase] = true }.freeze
- VALUE_SCRUB_SKIP_SUBTREES =
Top-level subtrees that are never value-scrubbed. ‘user` holds data the caller explicitly set via setUser (intentional identification — ships as before). `frames`/`stackTrace` hold structured stack frames whose filenames/functions must not be corrupted.
%w[ user frames stackTrace stacktrace ].each_with_object({}) { |k, h| h[k.downcase] = true }.freeze
- SSN_REGEX =
US SSN — REQUIRE the hyphens so bare 9-digit numbers (order ids, etc.) are not nuked. Compiled once.
/\b\d{3}-\d{2}-\d{4}\b/.freeze
- CC_CANDIDATE_REGEX =
Candidate credit-card runs: 13–19 digits with optional single space/hyphen separators between groups. Luhn-validated before redaction (see #luhn?), so digit runs that fail the checksum (timestamps, order ids) survive. Word-boundary-ish anchors keep us from matching the middle of a longer digit string.
/(?<![\d-])(?:\d[ -]?){12,18}\d(?![\d-])/.freeze
- EMAIL_REGEX =
Standard email address. Compiled once.
/\b[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}\b/.freeze
- IPV4_OCTET =
IPv4 with each octet validated to 0–255. Compiled once.
'(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)'- IPV4_REGEX =
/\b#{IPV4_OCTET}\.#{IPV4_OCTET}\.#{IPV4_OCTET}\.#{IPV4_OCTET}\b/.freeze
- IPV6_REGEX =
- IPv6 best-effort: 2+ groups of hex separated by colons, with optional
-
compression. Intentionally loose — IPv6 detection is best-effort per spec.
/\b(?:[0-9A-Fa-f]{1,4}:){2,7}[0-9A-Fa-f]{0,4}\b|\b::(?:[0-9A-Fa-f]{1,4}:){0,6}[0-9A-Fa-f]{1,4}\b/.freeze
Class Method Summary collapse
-
.luhn?(digits) ⇒ Boolean
Luhn (mod-10) checksum over a string of digits.
- .record_redaction ⇒ Object
- .redaction_count ⇒ Object
- .reset_redaction_count! ⇒ Object
-
.scrub(payload, extra_denylist: nil, send_default_pii: false, values: true) ⇒ Object
Returns a sanitized deep copy of ‘payload`.
-
.scrub_credit_cards(str) ⇒ Object
Replace only those candidate credit-card runs that pass the Luhn checksum.
-
.scrub_value(str, send_default_pii) ⇒ Object
Apply value-pattern scrubbing to a single string.
- .sensitive?(key, denylist) ⇒ Boolean
- .skip_subtree?(key) ⇒ Boolean
- .skip_value_scrub_key?(key) ⇒ Boolean
- .walk(value, denylist, seen, send_default_pii) ⇒ Object
-
.walk_keys_only(value, denylist, seen) ⇒ Object
Recurse applying ONLY key-name redaction (no value-pattern scrubbing).
Class Method Details
.luhn?(digits) ⇒ Boolean
Luhn (mod-10) checksum over a string of digits.
335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 |
# File 'lib/allstak/sanitizer.rb', line 335 def luhn?(digits) return false unless digits =~ /\A\d{13,19}\z/ sum = 0 double = false digits.reverse.each_char do |ch| d = ch.to_i if double d *= 2 d -= 9 if d > 9 end sum += d double = !double end (sum % 10).zero? end |
.record_redaction ⇒ Object
186 187 188 189 190 |
# File 'lib/allstak/sanitizer.rb', line 186 def record_redaction @redaction_mutex.synchronize { @redaction_count += 1 } rescue StandardError nil end |
.redaction_count ⇒ Object
173 174 175 176 177 |
# File 'lib/allstak/sanitizer.rb', line 173 def redaction_count @redaction_mutex.synchronize { @redaction_count } rescue StandardError 0 end |
.reset_redaction_count! ⇒ Object
179 180 181 182 183 184 |
# File 'lib/allstak/sanitizer.rb', line 179 def reset_redaction_count! @redaction_mutex.synchronize { @redaction_count = 0 } nil rescue StandardError nil end |
.scrub(payload, extra_denylist: nil, send_default_pii: false, values: true) ⇒ Object
Returns a sanitized deep copy of ‘payload`.
203 204 205 206 207 208 209 |
# File 'lib/allstak/sanitizer.rb', line 203 def scrub(payload, extra_denylist: nil, send_default_pii: false, values: true) denylist = DEFAULT_DENYLIST.dup denylist.concat(extra_denylist.map { |t| t.to_s.downcase }) if extra_denylist denylist.uniq! return walk_keys_only(payload, denylist, Set.new) unless values walk(payload, denylist, Set.new, send_default_pii) end |
.scrub_credit_cards(str) ⇒ Object
Replace only those candidate credit-card runs that pass the Luhn checksum. A run that fails Luhn (e.g. an order id or timestamp that happens to be 13–19 digits) is left intact, minimizing over-redaction.
323 324 325 326 327 328 329 330 331 332 |
# File 'lib/allstak/sanitizer.rb', line 323 def scrub_credit_cards(str) str.gsub(CC_CANDIDATE_REGEX) do |match| digits = match.gsub(/[ -]/, "") if digits.length.between?(13, 19) && luhn?(digits) REDACTED else match end end end |
.scrub_value(str, send_default_pii) ⇒ Object
Apply value-pattern scrubbing to a single string. Fail-open: any error returns the original string. Oversized strings are passed through.
297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 |
# File 'lib/allstak/sanitizer.rb', line 297 def scrub_value(str, send_default_pii) return str unless str.is_a?(String) return str if str.empty? || str.length > MAX_SCAN_LENGTH out = str # Tier A — ALWAYS (regardless of send_default_pii). out = out.gsub(SSN_REGEX, REDACTED) out = scrub_credit_cards(out) # Tier B — only when the caller has NOT opted into PII. unless send_default_pii out = out.gsub(EMAIL_REGEX, REDACTED) out = out.gsub(IPV4_REGEX, REDACTED) out = out.gsub(IPV6_REGEX, REDACTED) end record_redaction if out != str out rescue StandardError str end |
.sensitive?(key, denylist) ⇒ Boolean
211 212 213 214 215 216 217 218 219 220 221 222 |
# File 'lib/allstak/sanitizer.rb', line 211 def sensitive?(key, denylist) return false unless key.is_a?(String) || key.is_a?(Symbol) # Exact, case-sensitive allowlist wins: a first-class SDK field (e.g. # release-health `sessionId`) is never scrubbed even though its lowercase # form contains a denied substring. Checked against the ORIGINAL key so # `sessionId` survives while `sessionid`/`session_id`/`session` are scrubbed. return false if ALLOWLIST.include?(key.to_s) k = key.to_s.downcase denylist.any? { |term| k.include?(term) } end |
.skip_subtree?(key) ⇒ Boolean
285 286 287 288 |
# File 'lib/allstak/sanitizer.rb', line 285 def skip_subtree?(key) return false unless key.is_a?(String) || key.is_a?(Symbol) VALUE_SCRUB_SKIP_SUBTREES.key?(key.to_s.downcase) end |
.skip_value_scrub_key?(key) ⇒ Boolean
290 291 292 293 |
# File 'lib/allstak/sanitizer.rb', line 290 def skip_value_scrub_key?(key) return false unless key.is_a?(String) || key.is_a?(Symbol) VALUE_SCRUB_SKIP_KEYS.key?(key.to_s.downcase) end |
.walk(value, denylist, seen, send_default_pii) ⇒ Object
224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 |
# File 'lib/allstak/sanitizer.rb', line 224 def walk(value, denylist, seen, send_default_pii) case value when Hash return REDACTED if seen.include?(value.object_id) seen.add(value.object_id) value.each_with_object({}) do |(k, v), out| out[k] = if sensitive?(k, denylist) record_redaction REDACTED elsif skip_subtree?(k) # Explicit user object / stack frames: deep-copy with key-name # redaction still applied, but NO value-pattern scrubbing. walk_keys_only(v, denylist, seen) elsif skip_value_scrub_key?(k) # Structured scalar (release, url, span id, …): recurse for nested # collections, but do not value-scrub a scalar string here. v.is_a?(Hash) || v.is_a?(Array) ? walk(v, denylist, seen, send_default_pii) : v else walk(v, denylist, seen, send_default_pii) end end when Array return REDACTED if seen.include?(value.object_id) seen.add(value.object_id) value.map { |v| walk(v, denylist, seen, send_default_pii) } when String scrub_value(value, send_default_pii) else value end end |
.walk_keys_only(value, denylist, seen) ⇒ Object
Recurse applying ONLY key-name redaction (no value-pattern scrubbing). Used for exempt subtrees (explicit user object, stack frames).
261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 |
# File 'lib/allstak/sanitizer.rb', line 261 def walk_keys_only(value, denylist, seen) case value when Hash return REDACTED if seen.include?(value.object_id) seen.add(value.object_id) value.each_with_object({}) do |(k, v), out| if sensitive?(k, denylist) record_redaction out[k] = REDACTED else out[k] = walk_keys_only(v, denylist, seen) end end when Array return REDACTED if seen.include?(value.object_id) seen.add(value.object_id) value.map { |v| walk_keys_only(v, denylist, seen) } else value end end |