Module: IuguLogger::Pii
Defined in: lib/iugu_logger/pii.rb
Overview
PII detection and redaction module.
3-layer defense:
- Layer 1 (ParamFilter): blocks values of keys whose names match a
sensitive blocklist (password, secret, token, etc.) BEFORE the deep
content scan
- Layer 2 (Scanner): regex-based deep content redaction in all string
fields, with strategy-based replacement (full_redact, last4,
detect_only, preserve)
- Layer 3 (Logger): every emitted log payload carries pii.scanned=true,
  populated from the Scanner's result; this layer is implemented in
  Logger, not in this module
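A minimal sketch of the layer order described above, assuming a flat payload hash. `filter_params`, `scan`, `emit`, `BLOCKLIST` and the nested `'pii'` hash shape are hypothetical stand-ins for illustration, not the IuguLogger::Pii API; only the CPF regex is copied from PATTERNS.

```ruby
# Minimal sketch of the 3-layer order; all names here are hypothetical.
BLOCKLIST = %w[password secret token].freeze  # tiny subset of DEFAULT_PARAM_BLOCKLIST
CPF = /\b\d{3}\.?\d{3}\.?\d{3}-?\d{2}\b/       # copied from PATTERNS[:cpf]

# Layer 1: values of blocklisted keys are replaced before any content scan.
def filter_params(hash)
  hash.to_h do |key, value|
    [key, BLOCKLIST.include?(key.to_s.downcase) ? '[FILTERED]' : value]
  end
end

# Layer 2: remaining string values are scanned; this sketch only records detection.
def scan(hash)
  found = hash.values.grep(String).any? { |v| CPF.match?(v) }
  [hash, found ? ['cpf'] : []]
end

# Layer 3: the emitted payload always carries pii.scanned = true.
# The nested 'pii' hash is an assumed shape, for illustration only.
def emit(payload)
  filtered, detected = scan(filter_params(payload))
  filtered.merge('pii' => { 'scanned' => true, 'detected' => detected })
end

out = emit('Password' => '123.456.789-09', 'note' => 'cpf 111.222.333-44')
# out['Password'] is "[FILTERED]", out['note'] is unchanged,
# and out['pii'] records scanned: true, detected: ["cpf"]
```

Because Layer 1 runs first, the CPF-shaped value under `Password` is replaced before the scanner ever sees it, so only the CPF in `note` is detected.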
PII patterns reuse those validated in production by core/utils/sanitizer.py (iugu-agents).
Decisions applied:
- ILS-002: iugu.account_id 32-hex preserved (SAFE_PATTERNS exclusion)
- ILS-003: email :detect_only by default (deferred — tech debt)
Spec: IUGU_LOGGING_STANDARD.md §5
Constant Summary
- PATTERNS =
  {
    cpf: /\b\d{3}\.?\d{3}\.?\d{3}-?\d{2}\b/,
    cnpj: /\b\d{2}\.\d{3}\.\d{3}\/\d{4}-\d{2}\b/,
    email: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/,
    # Lookarounds (?<![\w-]) and (?![\w-]) require the phone-shaped digit
    # group to be flanked by non-identifier chars. Without them the regex
    # matched the middle of dense identifiers (span_id, trace_id, jids,
    # UUIDs without hyphens) and produced false positives that broke trace
    # correlation in production. SAFE_KEY_PATHS is the primary defense;
    # this is defense-in-depth for arbitrary user content.
    phone: /(?<![\w-])\(?\d{2}\)?\s?9?\d{4}-?\d{4}(?![\w-])/,
    cc: /\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{1,7}\b/,
    aws_key: /\bAKIA[0-9A-Z]{16}\b/,
    bearer: /Bearer\s+[A-Za-z0-9\-._~+\/]+=*\b/i,
    url_with_creds: /https?:\/\/[^\/\s:]+:[^\/\s@]+@\S+/
  }.freeze
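The effect of the phone lookarounds can be checked directly. This sketch copies three PATTERNS entries verbatim; `LOOSE_PHONE` (the same regex with the lookarounds removed) and the `detect` helper are hypothetical, and the span_id value is an invented but valid 16-hex identifier.

```ruby
# Subset of PATTERNS, copied verbatim; detect is a hypothetical helper.
PATTERNS = {
  cpf: /\b\d{3}\.?\d{3}\.?\d{3}-?\d{2}\b/,
  email: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/,
  phone: /(?<![\w-])\(?\d{2}\)?\s?9?\d{4}-?\d{4}(?![\w-])/
}.freeze

# The phone regex without its lookarounds, kept only for comparison.
LOOSE_PHONE = /\(?\d{2}\)?\s?9?\d{4}-?\d{4}/

# Returns the PII types whose pattern matches anywhere in the string.
def detect(text)
  PATTERNS.select { |_type, regex| regex.match?(text) }.keys
end

span_id = 'a1234567890123bc'      # 16 hex chars with a long digit run
LOOSE_PHONE.match?(span_id)       # => true  (false positive inside the id)
PATTERNS[:phone].match?(span_id)  # => false (every digit is preceded by a word char)
detect('contato: ana@example.com, (11) 91234-5678') # => [:email, :phone]
```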
- SAFE_PATTERNS =
Strings matching SAFE_PATTERNS are excluded from redaction even when they incidentally match a PII pattern. Hex identifiers (32-hex trace_id, 16-hex span_id, 32-hex account_id, UUID v4) are structural identifiers that by definition never carry PII. Excluding them at the value level avoids coincidental matches, such as a run of 10 or more consecutive digits being read as a Brazilian phone number.
  {
    iugu_account_id: /\A[A-Fa-f0-9]{32}\z/,  # 32-hex (case-insensitive: legacy uppercase + modern lowercase)
    otel_trace_id: /\A[a-fA-F0-9]{32}\z/,    # OpenTelemetry trace_id (16 bytes hex)
    otel_span_id: /\A[a-fA-F0-9]{16}\z/,     # OpenTelemetry span_id (8 bytes hex)
    uuid: /\A[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}\z/i # UUID v4 / v7
  }.freeze
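A sketch of the value-level exclusion, assuming the scanner checks SAFE_PATTERNS before applying any PII pattern. It uses the card pattern because a UUID whose leading groups happen to be all digits is a convenient coincidental match. `safe_value?` and `cc_candidate?` are hypothetical helpers; the regexes are copied from the constants above.

```ruby
# SAFE_PATTERNS copied from above; CC copied from PATTERNS[:cc].
SAFE_PATTERNS = {
  iugu_account_id: /\A[A-Fa-f0-9]{32}\z/,
  otel_trace_id: /\A[a-fA-F0-9]{32}\z/,
  otel_span_id: /\A[a-fA-F0-9]{16}\z/,
  uuid: /\A[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}\z/i
}.freeze
CC = /\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{1,7}\b/

# Hypothetical: true when the whole value is a structural identifier.
def safe_value?(value)
  SAFE_PATTERNS.values.any? { |regex| regex.match?(value) }
end

# Hypothetical: only values that are not safe are pattern-matched at all.
def cc_candidate?(value)
  !safe_value?(value) && CC.match?(value)
end

request_id = '20240115-0930-4123-8456-0a1b2c3d4e5f' # a valid UUID v4
CC.match?(request_id)                # => true: "20240115-0930-4123" looks like a card
cc_candidate?(request_id)            # => false: the uuid safe pattern wins
cc_candidate?('4111 1111 1111 1111') # => true
```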
- SAFE_KEY_PATHS =
Canonical schema paths whose values are skipped entirely by the scan. These fields hold structural identifiers / controlled metadata defined by IUGU_LOGGING_STANDARD §2 — never user-supplied content. Skipping them prevents false-positive PII detection on hex/UUID-shaped values AND saves CPU on the hot path.
Path = dot-joined hash keys from the user_section root, e.g. the value at `payload['trace']['span_id']` has path `trace.span_id`.
  %w[
    @timestamp
    log.level
    event.kind
    event.action
    service.name
    service.version
    service.environment
    service.instance
    trace.id
    trace.span_id
    trace.parent_id
    request.id
    http.status_code
    http.duration_ms
  ].freeze
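A sketch of how dot-joined paths could be built while walking a payload, with SAFE_KEY_PATHS entries skipped before their values are ever scanned. `each_scannable` is a hypothetical walker; for brevity it ignores arrays and non-string leaves, which the real scanner may handle differently.

```ruby
# Subset of SAFE_KEY_PATHS, copied from the constant above.
SAFE_KEY_PATHS = %w[trace.id trace.span_id trace.parent_id request.id].freeze

# Hypothetical walker: yields (path, value) for each String leaf whose
# dot-joined path is not in SAFE_KEY_PATHS. Skipped paths are never scanned.
# Arrays, numbers and nil are ignored in this sketch.
def each_scannable(node, prefix = nil, &block)
  case node
  when Hash
    node.each do |key, value|
      path = prefix ? "#{prefix}.#{key}" : key.to_s
      next if SAFE_KEY_PATHS.include?(path)

      each_scannable(value, path, &block)
    end
  when String
    yield prefix, node
  end
end

payload = {
  'trace' => { 'id' => '4bf92f3577b34da6a3ce929d0e0e4736', 'span_id' => '00f067aa0ba902b7' },
  'http' => { 'path' => '/v1/invoices' },
  'message' => 'cliente 123.456.789-09'
}

scanned = []
each_scannable(payload) { |path, _value| scanned << path }
scanned # => ["http.path", "message"]
```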
- DEFAULT_STRATEGIES =
Default redaction strategies. Override via Configuration#pii_redaction.
Strategies:
- :full_redact → "[<TYPE>_REDACTED]"
- :last4 → "**** **** **** 1234" (CC only)
- :detect_only → content unchanged, but the type is recorded in `detected`
- :preserve → neither detected nor redacted (escape hatch)
Philosophy (data-completeness-first, since v0.7):
Operational logs in iugu serve ops, support, fraud analysts, compliance, and ML pipelines, not just engineers. Redacting personal data at emission time breaks those downstream consumers: the legacy rails_semantic_logger output they already rely on includes full CPF, CNPJ, phone, address, email, and bank account details. We standardize on that behavior. `:detect_only` means we still RECORD that PII was found (so `pii.detected: [cpf, phone]` is queryable for audit), but we don't remove the values from the log. LGPD compliance is met through access control on the log store and retention policies, not through redaction at the source.
Things that DO stay redacted by default:
- Payment card numbers (`:cc` → `:last4`): a PCI-DSS hard rule
- Credentials (`aws_key`, `bearer`, `url_with_creds`): these are never user data, only leak risk
Override per app: any app needing stricter redaction (e.g. for external log export targets) can set `:full_redact` for the types it needs via `IuguLogger.configure { |c| c.pii_redaction = … }`.
  {
    cpf: :detect_only,            # personal data: detected, not redacted (v0.7+)
    cnpj: :detect_only,           # legal entity: detected, not redacted (v0.7+)
    email: :detect_only,          # personal data: detected, not redacted (was always)
    phone: :detect_only,          # personal data: detected, not redacted (v0.7+)
    cc: :last4,                   # PCI-DSS: last 4 only (KEPT)
    aws_key: :full_redact,        # credential: never log (KEPT)
    bearer: :full_redact,         # credential: never log (KEPT)
    url_with_creds: :full_redact  # credential: never log (KEPT)
  }.freeze
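A sketch of how the four strategies could be applied to a string, using three PATTERNS and DEFAULT_STRATEGIES entries copied from above. `apply_strategy` and `redact` are hypothetical; applying patterns one after another in hash order, and falling back to `:full_redact` for a type with no configured strategy, are assumptions of this sketch. The last lines assume a per-app override is merged over the defaults, which the text above does not state.

```ruby
# Subsets of PATTERNS and DEFAULT_STRATEGIES, copied from the constants above.
PATTERNS = {
  cpf: /\b\d{3}\.?\d{3}\.?\d{3}-?\d{2}\b/,
  cc: /\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{1,7}\b/,
  bearer: /Bearer\s+[A-Za-z0-9\-._~+\/]+=*\b/i
}.freeze
DEFAULT_STRATEGIES = { cpf: :detect_only, cc: :last4, bearer: :full_redact }.freeze

# Hypothetical: replacement text for one match under one strategy.
def apply_strategy(strategy, type, match)
  case strategy
  when :full_redact then "[#{type.to_s.upcase}_REDACTED]"
  when :last4 then "**** **** **** #{match.gsub(/\D/, '')[-4..]}"
  when :detect_only then match # content untouched, but still recorded by redact
  else raise ArgumentError, "unknown strategy: #{strategy.inspect}"
  end
end

# Hypothetical scanner: applies each pattern in turn and records every type
# that matched, except :preserve types (neither detected nor redacted).
def redact(text, strategies = DEFAULT_STRATEGIES)
  detected = []
  result = PATTERNS.reduce(text) do |acc, (type, regex)|
    strategy = strategies.fetch(type, :full_redact) # assumed fallback
    next acc if strategy == :preserve

    acc.gsub(regex) do |match|
      detected << type
      apply_strategy(strategy, type, match)
    end
  end
  [result, detected.uniq]
end

text = 'cpf 123.456.789-09 cartao 4111 1111 1111 1111 Bearer abc123'
redact(text).first
# => "cpf 123.456.789-09 cartao **** **** **** 1111 [BEARER_REDACTED]"

# Stricter per-app override, assuming it is merged over the defaults:
redact(text, DEFAULT_STRATEGIES.merge(cpf: :full_redact)).first
# => "cpf [CPF_REDACTED] cartao **** **** **** 1111 [BEARER_REDACTED]"
```

With the defaults, the CPF stays in the log but `:cpf` still appears in the detected list, which is what makes `pii.detected` queryable for audit.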
- DEFAULT_PARAM_BLOCKLIST =
Layer 1: keys whose values are filtered before any scanning. Case-insensitive.
  %w[
    password password_confirmation passwd secret token
    api_key apikey authorization auth bearer_token
    credit_card cc_number ccnumber cvv cvc
    ssn pin private_key
  ].freeze
- PARAM_FILTER_PLACEHOLDER =
'[FILTERED]'
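A sketch of a Layer 1 filter built from DEFAULT_PARAM_BLOCKLIST and PARAM_FILTER_PLACEHOLDER (both copied from above). `filter_params` is hypothetical, and it assumes an exact key match after downcasing plus recursion into nested hashes and arrays; the real ParamFilter may match keys differently (for example by substring).

```ruby
DEFAULT_PARAM_BLOCKLIST = %w[
  password password_confirmation passwd secret token
  api_key apikey authorization auth bearer_token
  credit_card cc_number ccnumber cvv cvc
  ssn pin private_key
].freeze
PARAM_FILTER_PLACEHOLDER = '[FILTERED]'

# Hypothetical Layer 1 filter: a key is blocked when its downcased name is in
# the blocklist (exact match assumed); other values are filtered recursively.
def filter_params(node)
  case node
  when Hash
    node.each_with_object({}) do |(key, value), out|
      out[key] = if DEFAULT_PARAM_BLOCKLIST.include?(key.to_s.downcase)
                   PARAM_FILTER_PLACEHOLDER
                 else
                   filter_params(value)
                 end
    end
  when Array
    node.map { |item| filter_params(item) }
  else
    node
  end
end

filter_params('user' => { 'Email' => 'a@b.co', 'Password' => 'hunter2' }, 'API_KEY' => 'k')
# 'Password' and 'API_KEY' become "[FILTERED]"; 'Email' is left for Layer 2
```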