Module: Pikuri::Sanitizer

Defined in:
lib/pikuri/sanitizer.rb

Overview

Renders attacker-controlled text safe to display, and reports why it was unsafe.

Every string an LLM composes is untrusted: a bash command, a tool observation echoed back to the user, a description it wrote for a confirmation prompt. A model that is broken — or, far more likely, being driven by a prompt injection — can embed bytes that a terminal acts on rather than prints: a carriage return that overwrites the line the user just read, an ESC that recolors or repositions, a backspace that erases, a bidirectional override that reorders text so it reads differently than it runs, a zero-width character that hides in plain sight, or a Cyrillic а masquerading as a Latin a. The whole point of a confirmation prompt collapses if the bytes the user approves are not the bytes that execute.

Sanitizer.sanitize is the one chrome-independent primitive every renderer (terminal, TUI, web) routes through. It does two things and returns both as a Result:

  1. Neutralize: make the dangerous bytes visible without changing structure. Control bytes become \xNN, bidi/zero-width codepoints become \u{NNNN}, tab becomes \t. Newlines are preserved (multi-line commands are normal). This is *faithful, not beautifying*: it never collapses runs of whitespace or rewrites a tab to a space, because the user must see exactly what they are approving; a Makefile’s leading tab stays visibly a tab. A web chrome composes html_escape(sanitize(s).text); the HTML layer is the caller’s, not ours.

  2. Warn: return a Warning per category detected, each a semantic record (kind + offending tokens + a plain-English explanation). Presentation is the chrome’s: a terminal renders these bold yellow, a web client a banner. The Warning carries no color or markup.
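The neutralize step can be sketched in isolation. This is a minimal standalone approximation, not the library's code: `SUSPECT_SKETCH` copies the character class of the SUSPECT constant documented on this page, `neutralize_sketch` is a hypothetical helper that skips warning collection, and `ERB::Util.html_escape` stands in for whatever HTML escaper a web chrome actually uses.

```ruby
require 'erb'

# Same character class as the SUSPECT constant documented on this page.
SUSPECT_SKETCH = /[\u0000-\u0009\u000b-\u001f\u007f-\u009f\u200b-\u200d\u202a-\u202e\u2066-\u2069\ufeff]/

# Hypothetical helper: the "neutralize" half only, no warnings collected.
def neutralize_sketch(text)
  text.gsub(SUSPECT_SKETCH) do |ch|
    cp = ch.ord
    if cp == 0x09
      '\\t'                     # tab stays visibly a tab
    elsif cp <= 0x9f
      format('\\x%02x', cp)     # C0 controls, DEL, C1 controls
    else
      format('\\u{%04x}', cp)   # zero-width and bidi codepoints
    end
  end
end

# The carriage return can no longer overwrite "echo hi" on screen.
puts neutralize_sketch("echo hi\rrm -rf /")
# prints: echo hi\x0drm -rf /

# A web chrome escapes HTML on top of the neutralized text (assumed escaper).
web = ERB::Util.html_escape(neutralize_sketch("<b>\u202e"))
puts web
# prints: &lt;b&gt;\u{202e}
```

The two layers stay independent: neutralizing makes control and invisible codepoints visible, and the HTML escaper handles markup characters afterwards.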

Scope (deliberately closed)

Detection covers the *invisibility / cursor-control / reordering* attack classes completely, because each is a finite, enumerable set of codepoints: C0 controls, C1 controls (a second ANSI introducer on some emulators), DEL, the bidi overrides, and the zero-width characters. On top of that, Sanitizer.sanitize flags *mixed-script tokens*: a single whitespace-delimited token combining letters from two or more of Latin, Cyrillic, and Greek. This is the signature of a homoglyph spoof and has near-zero false positives on real text (humans do not weld two alphabets inside one word; café is all-Latin, Москва all-Cyrillic, only Pаypal mixes).

Two confusable classes are explicitly *out of scope*, because detecting them needs Unicode confusables tables and produces heavy false positives on legitimate multilingual text:

  • Whole-script homoglyphs — an entirely-Cyrillic string that merely looks Latin (no mixing to detect).

  • Single-symbol confusables — the Greek question mark ; (U+037E) that looks like a semicolon, full-width forms, the division slash.

“Solid” here means complete on the classes above, not exhaustive over all of Unicode.
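The scope boundary can be illustrated with a small standalone sketch. `SCRIPTS_SKETCH` mirrors the CONFUSABLE_SCRIPTS constant documented on this page, and `scripts_in` is a hypothetical helper introduced only for this illustration. Non-ASCII letters are written as \u escapes so the script of each character is unambiguous.

```ruby
# Mirrors CONFUSABLE_SCRIPTS as documented on this page.
SCRIPTS_SKETCH = {
  'Latin'    => /\p{Latin}/,
  'Cyrillic' => /\p{Cyrillic}/,
  'Greek'    => /\p{Greek}/
}.freeze

# Hypothetical helper: names of the confusable scripts present in one word.
def scripts_in(word)
  SCRIPTS_SKETCH.select { |_name, re| word.match?(re) }.keys
end

samples = {
  'cafe with e-acute (all Latin)'        => "caf\u00e9",
  'Moskva (all Cyrillic)'                => "\u041c\u043e\u0441\u043a\u0432\u0430",
  'Paypal with Cyrillic a (mixed)'       => "P\u0430ypal",
  'all-Cyrillic lookalike of "coco"'     => "\u0441\u043e\u0441\u043e"
}

samples.each do |label, word|
  found = scripts_in(word)
  verdict = found.size >= 2 ? 'flagged' : 'not flagged'
  puts "#{label}: #{found.join('+')} -> #{verdict}"
end
```

Only the mixed token crosses the two-script threshold. The all-Cyrillic lookalike passes unflagged, which is the whole-script homoglyph case the section declares out of scope.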

Defined Under Namespace

Classes: Result, Warning

Constant Summary

BIDI_OVERRIDES =

Bidirectional-override codepoints: the explicit embedding and override set (LRE/RLE/PDF/LRO/RLO, U+202A–202E) plus the isolate set (LRI/RLI/FSI/PDI, U+2066–2069). These enable reordering attacks.

[*0x202a..0x202e, *0x2066..0x2069].freeze
ZERO_WIDTH =

Zero-width and invisible codepoints: ZWSP, ZWNJ, ZWJ, and the BOM / zero-width no-break space.

[0x200b, 0x200c, 0x200d, 0xfeff].freeze
SUSPECT =

Codepoints sanitize rewrites: C0 controls including tab (U+0009) but excluding newline (U+000A, which passes through untouched), C1 controls + DEL (U+007F–009F), the zero-width set, and the bidi overrides. Newline is the one control character a faithful render must keep, so the C0 range is split around it.

/[\u0000-\u0009\u000b-\u001f\u007f-\u009f\u200b-\u200d\u202a-\u202e\u2066-\u2069\ufeff]/
CONFUSABLE_SCRIPTS =

The three mutually confusable scripts (Latin, Cyrillic, Greek) whose mixing inside one token signals a homoglyph spoof. Punctuation, digits and spaces are the Common script and match none of these, so they never count toward the “two distinct scripts” threshold.

{ 'Latin' => /\p{Latin}/, 'Cyrillic' => /\p{Cyrillic}/, 'Greek' => /\p{Greek}/ }.freeze
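The relationship between SUSPECT and the two codepoint lists above can be checked with a short standalone sketch. The constant names `BIDI`, `ZW` and `SUSPECT_RE` are local copies made for this illustration, with values taken from this page.

```ruby
# Local copies of the documented constant values.
BIDI       = [*0x202a..0x202e, *0x2066..0x2069].freeze
ZW         = [0x200b, 0x200c, 0x200d, 0xfeff].freeze
SUSPECT_RE = /[\u0000-\u0009\u000b-\u001f\u007f-\u009f\u200b-\u200d\u202a-\u202e\u2066-\u2069\ufeff]/

# Does a single codepoint match the SUSPECT character class?
def suspect?(cp)
  cp.chr(Encoding::UTF_8).match?(SUSPECT_RE)
end

# Every bidi and zero-width codepoint is covered by the regexp.
covered = (BIDI + ZW).all? { |cp| suspect?(cp) }

# The C0 range is split around newline: all 32 controls except U+000A match.
c0_missing = (0x00..0x1f).reject { |cp| suspect?(cp) }

# DEL and the C1 controls (U+007F through U+009F) all match.
del_c1_hits = (0x7f..0x9f).count { |cp| suspect?(cp) }

puts covered             # prints: true
puts c0_missing.inspect  # prints: [10]
puts del_c1_hits         # prints: 33
```

Keeping the lists and the regexp in agreement matters because sanitize uses the regexp to find characters and the lists to decide which category each one belongs to.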

Class Method Summary

Class Method Details

.mixed_script_tokens(text) ⇒ Array<String>

Tokens (whitespace-delimited runs) that combine letters from two or more of CONFUSABLE_SCRIPTS — the homoglyph-spoof signature.

Parameters:

  • text (String)

Returns:

  • (Array<String>)

    distinct offending tokens, first-seen order



# File 'lib/pikuri/sanitizer.rb', line 137

def self.mixed_script_tokens(text)
  text.split(/\s+/).reject(&:empty?).select do |token|
    CONFUSABLE_SCRIPTS.count { |_name, re| token.match?(re) } >= 2
  end.uniq
end
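A short standalone sketch of how the tokenization behaves. `mixed_script_tokens_sketch` and `SCRIPTS` are local copies written for this illustration so the snippet runs without the library. Because tokens are whitespace-delimited, trailing punctuation stays attached to a token, so the same word with and without a comma counts as two distinct tokens.

```ruby
SCRIPTS = { 'Latin' => /\p{Latin}/, 'Cyrillic' => /\p{Cyrillic}/, 'Greek' => /\p{Greek}/ }.freeze

# Local copy of the method body shown above, for a self-contained run.
def mixed_script_tokens_sketch(text)
  text.split(/\s+/).reject(&:empty?).select { |token|
    SCRIPTS.count { |_name, re| token.match?(re) } >= 2
  }.uniq
end

# \u0430 is Cyrillic "a", \u03b1 is Greek alpha.
text = "P\u0430ypal or P\u0430ypal, not p\u03b1y.example; P\u0430ypal 2024"

tokens = mixed_script_tokens_sketch(text)
puts tokens.length  # prints: 3
tokens.each { |t| puts t.dump }
```

The exact repeat of the first token is dropped by uniq, while the comma-suffixed variant survives as its own entry. Digits and punctuation alone never trigger a match, so "2024" is ignored.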

.sanitize(text) ⇒ Result

Neutralize text for literal display and report what was flagged.

Parameters:

  • text (String)

    attacker-controlled text (an LLM-composed command, description, or tool observation), e.g. “echo hi\rrm -rf /”

Returns:

  • (Result)

    the neutralized text plus an Array<Warning> (empty when clean)



# File 'lib/pikuri/sanitizer.rb', line 107

def self.sanitize(text)
  backspace  = false
  control    = []
  bidi       = []
  zero_width = []

  clean = text.gsub(SUSPECT) do |ch|
    cp = ch.ord
    if cp == 0x09
      '\\t'
    elsif cp == 0x08
      backspace = true
      '\\x08'
    elsif BIDI_OVERRIDES.include?(cp)
      format('\\u{%04x}', cp).tap { |t| bidi << t }
    elsif ZERO_WIDTH.include?(cp)
      format('\\u{%04x}', cp).tap { |t| zero_width << t }
    else
      format('\\x%02x', cp).tap { |t| control << t }
    end
  end

  Result.new(text: clean, warnings: warnings_for(backspace, control, bidi, zero_width, mixed_script_tokens(text)))
end
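How a terminal chrome might consume the returned Result can be sketched as follows. Everything here is an assumption made for illustration: this page does not show the Result and Warning class definitions, so `ResultSketch` and `WarningSketch` are hypothetical stand-ins, the field names `kind`, `tokens` and `explanation` are guesses based on the overview's description, and the `:control` kind value and explanation text are invented.

```ruby
# Hypothetical stand-ins for Result and Warning (field names are assumptions).
ResultSketch  = Struct.new(:text, :warnings, keyword_init: true)
WarningSketch = Struct.new(:kind, :tokens, :explanation, keyword_init: true)

BOLD_YELLOW = "\e[1;33m"
RESET       = "\e[0m"

# Hypothetical terminal chrome: warnings in bold yellow, then the neutralized text.
# The escape sequences are emitted by the chrome itself, never taken from the input.
def render_terminal(result)
  banner = result.warnings.map do |w|
    "#{BOLD_YELLOW}#{w.kind}: #{w.tokens.join(' ')} (#{w.explanation})#{RESET}"
  end
  (banner + [result.text]).join("\n")
end

result = ResultSketch.new(
  text: 'echo hi\x0drm -rf /',
  warnings: [
    WarningSketch.new(
      kind: :control,
      tokens: ['\x0d'],
      explanation: 'control characters can move the cursor or overwrite text'
    )
  ]
)

puts render_terminal(result)
```

Because the warning record carries no color or markup of its own, a web chrome could build a banner from the same fields without touching the sanitizer.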