Module: RosettAi::TextSanitizer
Defined in: lib/rosett_ai/text_sanitizer.rb
Overview
Sanitizes text for safe TUI display and normalizes Unicode input.
- strip_ansi: removes ANSI escape sequences and control characters at display boundaries (criterion 10).
- normalize_nfc: applies Unicode NFC normalization at input boundaries (criterion 11).
Constant Summary
- ANSI_PATTERN =
  Matches ANSI CSI sequences (\e[...X), OSC sequences (\e]...\a), and non-printable control characters except tab, newline, and CR.
  /\e\[\d*(?:;\d*)*[A-Za-z]|\e\][^\a]*\a|[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/
- MARKDOWN_META =
  Characters that have special meaning in Markdown syntax.
  /([\\`*_{}\[\]()#+\-.!|])/
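To make the three alternatives of ANSI_PATTERN concrete, here is a minimal sketch that copies the pattern into a stand-in module (AnsiPatternSketch is a hypothetical name used only for illustration, not part of the library) and scans a sample string. It also shows one limit that follows from the pattern as written: a CSI sequence with a private-mode prefix such as `?` is not matched as a whole, so only its leading ESC byte is caught by the control-character alternative.

```ruby
# Hypothetical stand-in module; the pattern is copied from the summary above.
module AnsiPatternSketch
  ANSI_PATTERN = /\e\[\d*(?:;\d*)*[A-Za-z]|\e\][^\a]*\a|[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/
end

csi = "\e[1;31m"       # CSI sequence: bold red
osc = "\e]0;title\a"   # OSC sequence terminated by BEL
sample = "#{csi}red#{osc}\ttab\n\x00\x7F"

# Tab and newline are left alone; NUL and DEL match the control-character class.
matches = sample.scan(AnsiPatternSketch::ANSI_PATTERN)
puts matches == [csi, osc, "\x00", "\x7F"]   # prints: true

# "\e[?25l" (hide cursor) has a "?" after "\e[", so the CSI alternative
# does not apply; only the ESC byte itself is removed.
leftover = "\e[?25lhidden".gsub(AnsiPatternSketch::ANSI_PATTERN, '')
puts leftover                                # prints: [?25lhidden
```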
Class Method Summary
- .escape_markdown(text) ⇒ String
  Escapes Markdown metacharacters so text renders as literal content.
- .normalize_nfc(string) ⇒ String
  Applies Unicode NFC normalization at input boundaries.
- .sanitize_for_display(data) ⇒ String, ...
  Recursively strips ANSI escape sequences from strings in nested structures.
- .strip_ansi(string) ⇒ String
  Removes ANSI escape sequences and non-printable control characters.
Class Method Details
.escape_markdown(text) ⇒ String
Escapes Markdown metacharacters so text renders as literal content.
    # File 'lib/rosett_ai/text_sanitizer.rb', line 41
    def self.escape_markdown(text)
      text.to_s.gsub(MARKDOWN_META, '\\\\\1')
    end
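As a usage sketch, the method body above can be exercised in isolation. MarkdownEscapeSketch is a hypothetical stand-in module that copies the documented constant and method so the example runs without the gem. Note that every listed character is escaped wherever it appears, including `.` and `-` inside ordinary words.

```ruby
# Hypothetical stand-in that mirrors the documented source.
module MarkdownEscapeSketch
  MARKDOWN_META = /([\\`*_{}\[\]()#+\-.!|])/

  def self.escape_markdown(text)
    # In the replacement string, "\\\\" yields one literal backslash
    # and "\1" re-inserts the matched metacharacter.
    text.to_s.gsub(MARKDOWN_META, '\\\\\1')
  end
end

puts MarkdownEscapeSketch.escape_markdown('*bold* [link](url)')
# prints: \*bold\* \[link\]\(url\)

puts MarkdownEscapeSketch.escape_markdown('v1.2-beta')
# prints: v1\.2\-beta

# nil.to_s is "", so nil input yields an empty string.
puts MarkdownEscapeSketch.escape_markdown(nil).empty?
# prints: true
```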
.normalize_nfc(string) ⇒ String
Applies Unicode NFC normalization at input boundaries.
    # File 'lib/rosett_ai/text_sanitizer.rb', line 30
    def self.normalize_nfc(string)
      String(string).encode('UTF-8').unicode_normalize(:nfc)
    end
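A short sketch of what the two steps in this method do, using a hypothetical stand-in module (NfcSketch) that copies the documented body. The first call shows NFC composing a base letter and a combining mark into one code point; the second shows the `encode('UTF-8')` step transcoding a Latin-1 string before normalization.

```ruby
# Hypothetical stand-in that mirrors the documented source.
module NfcSketch
  def self.normalize_nfc(string)
    String(string).encode('UTF-8').unicode_normalize(:nfc)
  end
end

# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points.
decomposed = "e\u0301"
composed = NfcSketch.normalize_nfc(decomposed)
puts composed.codepoints.length   # prints: 1
puts composed == "\u00E9"         # prints: true

# A Latin-1 encoded string is transcoded to UTF-8 first.
latin1 = "caf\xE9".b.force_encoding(Encoding::ISO_8859_1)
puts NfcSketch.normalize_nfc(latin1).encoding   # prints: UTF-8
```

Normalizing at the input boundary means that visually identical strings compare equal later on, whichever form the user's keyboard or clipboard produced.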
.sanitize_for_display(data) ⇒ String, ...
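Recursively applies strip_ansi to every String found in Hash values and Array elements, removing ANSI escape sequences and non-printable control characters. Hash keys and objects of any other type are returned unchanged.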
    # File 'lib/rosett_ai/text_sanitizer.rb', line 49
    def self.sanitize_for_display(data)
      case data
      when String then strip_ansi(data)
      when Hash then data.transform_values { |v| sanitize_for_display(v) }
      when Array then data.map { |v| sanitize_for_display(v) }
      else data
      end
    end
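The recursion is easiest to see on a nested payload. The sketch below copies the documented methods into a hypothetical stand-in module (DisplaySketch) so it runs on its own; the payload itself is invented for illustration. Because the Hash branch uses transform_values, keys are not sanitized, and non-String leaves such as integers and nil pass through untouched.

```ruby
# Hypothetical stand-in that mirrors the documented source.
module DisplaySketch
  ANSI_PATTERN = /\e\[\d*(?:;\d*)*[A-Za-z]|\e\][^\a]*\a|[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/

  def self.strip_ansi(string)
    String(string).gsub(ANSI_PATTERN, '')
  end

  def self.sanitize_for_display(data)
    case data
    when String then strip_ansi(data)
    when Hash then data.transform_values { |v| sanitize_for_display(v) }
    when Array then data.map { |v| sanitize_for_display(v) }
    else data
    end
  end
end

# Illustrative payload: escape codes in a key, in values, and in a nested hash.
payload = {
  "name\e[0m" => "\e[32mok\e[0m",
  tags: ["\e[1mbold\e[0m", 42, nil],
  nested: { message: "line\x07 one" }
}

clean = DisplaySketch.sanitize_for_display(payload)

puts clean["name\e[0m"]         # prints: ok   (the key keeps its escape code)
puts clean[:tags].inspect       # prints: ["bold", 42, nil]
puts clean[:nested][:message]   # prints: line one
```

If keys can also carry untrusted text, they would need a separate pass; the documented method does not do this.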
.strip_ansi(string) ⇒ String
Removes ANSI escape sequences and non-printable control characters.
    # File 'lib/rosett_ai/text_sanitizer.rb', line 22
    def self.strip_ansi(string)
      String(string).gsub(ANSI_PATTERN, '')
    end
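A brief usage sketch, again with a hypothetical stand-in module (StripAnsiSketch) that copies the documented constant and method. It shows the common case of colored log output and the effect of the `String(...)` coercion on non-string input.

```ruby
# Hypothetical stand-in that mirrors the documented source.
module StripAnsiSketch
  ANSI_PATTERN = /\e\[\d*(?:;\d*)*[A-Za-z]|\e\][^\a]*\a|[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/

  def self.strip_ansi(string)
    String(string).gsub(ANSI_PATTERN, '')
  end
end

colored = "\e[1;31mERROR\e[0m: disk full\n"
puts StripAnsiSketch.strip_ansi(colored)
# prints: ERROR: disk full   (the trailing newline is preserved)

# String(nil) is "" and String(404) is "404", so non-string input is
# coerced rather than raising NoMethodError.
puts StripAnsiSketch.strip_ansi(nil).empty?   # prints: true
puts StripAnsiSketch.strip_ansi(404)          # prints: 404
```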