Module: RosettAi::TextSanitizer

Defined in:
lib/rosett_ai/text_sanitizer.rb

Overview

Sanitizes text for safe TUI display and normalizes Unicode input.

strip_ansi: removes ANSI escape sequences and control characters at display boundaries (criterion 10).

normalize_nfc: applies Unicode NFC normalization at input boundaries (criterion 11).

Constant Summary

ANSI_PATTERN =

Matches ANSI CSI sequences (\e[...X), OSC sequences (\e]...\a), and non-printable control characters except tab, newline, and CR.

/\e\[\d*(?:;\d*)*[A-Za-z]|\e\][^\a]*\a|[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/

MARKDOWN_META =

Characters that have special meaning in Markdown syntax.

/([\\`*_{}\[\]()#+\-.!|])/
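
The three alternatives in ANSI_PATTERN can be exercised one at a time. The following is a minimal sketch, not part of the library: it re-declares the pattern locally (copied from the value printed above) so it runs without the gem, and the variable names are illustrative only. The last case shows an input the CSI alternative does not cover as written: a private-mode sequence containing `?`, where only the ESC byte is removed by the control-character alternative.

```ruby
# Local copy of the pattern shown above, so the example is standalone.
ANSI_PATTERN = /\e\[\d*(?:;\d*)*[A-Za-z]|\e\][^\a]*\a|[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/

# CSI: colour (SGR) sequences are removed whole.
csi = "\e[1;31mERROR\e[0m".gsub(ANSI_PATTERN, '')

# OSC: a sequence terminated by BEL (\a) is removed whole.
osc = "\e]0;window title\adone".gsub(ANSI_PATTERN, '')

# Control characters: tab, newline, and CR survive; NUL and DEL do not.
ctl = "a\tb\nc\rd\x00e\x7F".gsub(ANSI_PATTERN, '')

# Private-mode CSI ("?" is not a digit): only the ESC byte is stripped.
priv = "\e[?25l".gsub(ANSI_PATTERN, '')

p csi   # => "ERROR"
p osc   # => "done"
p ctl   # => "a\tb\nc\rde"
p priv  # => "[?25l"
```

If leftover fragments such as `[?25l` matter for a given display surface, that would need handling outside this pattern; whether the library does so elsewhere is not stated on this page.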

Class Method Summary

Class Method Details

.escape_markdown(text) ⇒ String

Escapes Markdown metacharacters so text renders as literal content.

Parameters:

  • text (String, nil)

    the input text

Returns:

  • (String)

    the escaped text safe for Markdown embedding



# File 'lib/rosett_ai/text_sanitizer.rb', line 41

def self.escape_markdown(text)
  text.to_s.gsub(MARKDOWN_META, '\\\\\1')
end
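
As a usage illustration, the sketch below copies MARKDOWN_META and the method body from this page into a stand-in module (`MarkdownEscapeSketch` is a hypothetical name, used so the example runs without the gem). It shows that every listed metacharacter gets a leading backslash, including periods and hyphens inside ordinary words, and that `nil` becomes an empty string through `to_s`.

```ruby
# Hypothetical stand-in that mirrors the definitions shown on this page.
module MarkdownEscapeSketch
  MARKDOWN_META = /([\\`*_{}\[\]()#+\-.!|])/

  def self.escape_markdown(text)
    # '\\\\\1' is the replacement "\\" (one literal backslash) plus "\1" (the match).
    text.to_s.gsub(MARKDOWN_META, '\\\\\1')
  end
end

emphasis = MarkdownEscapeSketch.escape_markdown('*bold* [link](url)')
version  = MarkdownEscapeSketch.escape_markdown('rosett-ai v1.2')
empty    = MarkdownEscapeSketch.escape_markdown(nil)

puts emphasis  # prints: \*bold\* \[link\]\(url\)
puts version   # prints: rosett\-ai v1\.2
p empty        # => ""
```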

.normalize_nfc(string) ⇒ String

Applies Unicode NFC normalization at input boundaries.

Parameters:

  • string (String)

    the input string to normalize

Returns:

  • (String)

    the NFC-normalized UTF-8 string



# File 'lib/rosett_ai/text_sanitizer.rb', line 30

def self.normalize_nfc(string)
  String(string).encode('UTF-8').unicode_normalize(:nfc)
end
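
To illustrate why normalizing at the input boundary helps, the sketch below compares a decomposed and a precomposed spelling of the same character. `NfcSketch` is a hypothetical stand-in that repeats the method body shown above so the example runs without the gem.

```ruby
# Hypothetical stand-in that mirrors the method shown on this page.
module NfcSketch
  def self.normalize_nfc(string)
    String(string).encode('UTF-8').unicode_normalize(:nfc)
  end
end

decomposed  = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT
precomposed = "\u00E9"    # single code point LATIN SMALL LETTER E WITH ACUTE

# The two spellings render the same but compare unequal until normalized.
equal_before = decomposed == precomposed
normalized   = NfcSketch.normalize_nfc(decomposed)
equal_after  = normalized == precomposed

p decomposed.length   # => 2
p normalized.length   # => 1
p equal_before        # => false
p equal_after         # => true
```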

.sanitize_for_display(data) ⇒ String, ...

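Recursively strips ANSI escape sequences and control characters from String values in nested Hashes and Arrays. Hash keys and objects of any other type are returned unchanged.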

Parameters:

  • data (String, Hash, Array, Object)

    the data to sanitize

Returns:

  • (String, Hash, Array, Object)

    the sanitized data in its original structure



# File 'lib/rosett_ai/text_sanitizer.rb', line 49

def self.sanitize_for_display(data)
  case data
  when String
    strip_ansi(data)
  when Hash
    data.transform_values { |v| sanitize_for_display(v) }
  when Array
    data.map { |v| sanitize_for_display(v) }
  else
    data
  end
end
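
The recursion above can be traced on a small nested payload. This is a sketch under stated assumptions: `DisplaySketch` is a hypothetical stand-in that copies the pattern and both method bodies from this page, and the payload is invented for illustration. It shows that string values are cleaned at every depth, that non-string leaves pass through, that hash keys are left as they are (the method uses `transform_values`), and that the input is not mutated.

```ruby
# Hypothetical stand-in that mirrors the definitions shown on this page.
module DisplaySketch
  ANSI_PATTERN = /\e\[\d*(?:;\d*)*[A-Za-z]|\e\][^\a]*\a|[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/

  def self.strip_ansi(string)
    String(string).gsub(ANSI_PATTERN, '')
  end

  def self.sanitize_for_display(data)
    case data
    when String then strip_ansi(data)
    when Hash   then data.transform_values { |v| sanitize_for_display(v) }
    when Array  then data.map { |v| sanitize_for_display(v) }
    else data
    end
  end
end

# Invented payload: escape sequences in a key, in values, and in nested data.
payload = {
  "name\e[0m" => "\e[32mok\e[0m",
  tags: ["\e[1mbold\e[0m", 42, nil],
  nested: { msg: "line\x07one" }
}

clean = DisplaySketch.sanitize_for_display(payload)

p clean[:tags]               # => ["bold", 42, nil]
p clean[:nested][:msg]       # => "lineone"
p clean["name\e[0m"]         # => "ok"  (the key itself is not sanitized)
p payload[:tags].first       # original still holds the escape sequences
```

Callers that render hash keys may want to sanitize them separately, since only values are transformed here.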

.strip_ansi(string) ⇒ String

Removes ANSI escape sequences and non-printable control characters.

Parameters:

  • string (String)

    the input string to sanitize

Returns:

  • (String)

    the string with ANSI sequences and non-printable control characters removed



# File 'lib/rosett_ai/text_sanitizer.rb', line 22

def self.strip_ansi(string)
  String(string).gsub(ANSI_PATTERN, '')
end
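
Because the method wraps its argument in `String(...)`, non-String inputs are coerced rather than rejected. The sketch below demonstrates that behaviour; `StripSketch` is a hypothetical stand-in that repeats the pattern and method body from this page so the example runs without the gem.

```ruby
# Hypothetical stand-in that mirrors the definitions shown on this page.
module StripSketch
  ANSI_PATTERN = /\e\[\d*(?:;\d*)*[A-Za-z]|\e\][^\a]*\a|[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/

  def self.strip_ansi(string)
    String(string).gsub(ANSI_PATTERN, '')
  end
end

coloured = StripSketch.strip_ansi("\e[31mred\e[0m text")
from_nil = StripSketch.strip_ansi(nil)   # String(nil) is ""
from_int = StripSketch.strip_ansi(42)    # String(42) is "42"

p coloured  # => "red text"
p from_nil  # => ""
p from_int  # => "42"
```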