Class: Clacky::Tools::EncodingSafeBuffer

Inherits:
Object
Defined in:
lib/clacky/tools/shell.rb

Overview

A StringIO wrapper that scrubs invalid or undefined byte sequences on every write, so the buffer always holds valid UTF-8. Shell commands (via popen3) can emit bytes in any encoding (GBK, Latin-1, binary, …). By sanitizing at the earliest possible point, we guarantee that every downstream operation (regex matching, line splitting, JSON serialization) never sees an invalid byte sequence.
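To illustrate why scrubbing at the boundary matters, here is a minimal sketch. The helper `scrub_to_utf8` is a hypothetical stand-in for `Clacky::Utils::Encoding.to_utf8`, whose real implementation is not shown on this page; it is assumed here to replace invalid sequences with `?` while leaving valid multibyte characters alone.

```ruby
# Hypothetical stand-in for Clacky::Utils::Encoding.to_utf8 (assumption:
# the real helper may use a different replacement or detection strategy).
def scrub_to_utf8(data)
  data.dup.force_encoding(Encoding::UTF_8).scrub("?")
end

# A Latin-1 "é" byte (0xE9) followed by the valid UTF-8 bytes for "中".
raw = "caf\xE9 \xE4\xB8\xAD".b

# Merely tagging the raw bytes as UTF-8 leaves the string invalid, and
# matching a regex against it raises ArgumentError.
tagged = raw.dup.force_encoding(Encoding::UTF_8)
regex_failed =
  begin
    tagged =~ /caf/
    false
  rescue ArgumentError
    true
  end

# After scrubbing, the same operation is safe and the CJK character survives.
clean = scrub_to_utf8(raw)
match_index = clean =~ /caf/
```

Only the stray `0xE9` byte is replaced in this sketch; the three bytes that form "中" are already valid UTF-8 and pass through untouched.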

Instance Method Summary

#initialize ⇒ EncodingSafeBuffer constructor
#string ⇒ Object
#write(data) ⇒ Object

Constructor Details

#initialize ⇒ EncodingSafeBuffer

Returns a new instance of EncodingSafeBuffer.



# File 'lib/clacky/tools/shell.rb', line 15

def initialize
  # Use ASCII-8BIT backing store to accept raw bytes from popen3 without
  # encoding conflicts.  Scrubbing happens on write; the string method
  # re-labels the result as UTF-8 on the way out so callers (JSON.generate,
  # regex, etc.) always see a properly-tagged UTF-8 string.
  @io = StringIO.new("".b)
end

Instance Method Details

#string ⇒ Object



# File 'lib/clacky/tools/shell.rb', line 33

def string
  # Re-label the accumulated bytes as UTF-8.  By this point every byte
  # has already been scrubbed by to_utf8 on write, so force_encoding is
  # safe and avoids an unnecessary copy.
  @io.string.force_encoding("UTF-8")
end
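The re-labeling step relies on `String#force_encoding` changing only the encoding tag, not the bytes. A small sketch of that behavior, using an illustrative string that is not part of the original source:

```ruby
# Valid UTF-8 bytes for "café", but tagged as ASCII-8BIT (binary).
bytes = "caf\xC3\xA9".b
bytes_before = bytes.bytes
encoding_before = bytes.encoding

# force_encoding mutates the receiver in place and returns the same object,
# so no copy of the underlying bytes is made.
relabeled = bytes.force_encoding("UTF-8")
same_object = relabeled.equal?(bytes)
bytes_unchanged = relabeled.bytes == bytes_before
```

Because no transcoding or validation happens here, the result is only well formed if the bytes were already valid UTF-8, which is what scrubbing on write is meant to ensure.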

#write(data) ⇒ Object



# File 'lib/clacky/tools/shell.rb', line 23

def write(data)
  return unless data && !data.empty?

  # Shell output arrives as binary (ASCII-8BIT) bytes.  Use the shared
  # helper which scrubs only genuinely invalid sequences, preserving
  # multibyte characters (e.g. CJK).  The result is written as raw bytes
  # into the ASCII-8BIT buffer.
  @io.write(Clacky::Utils::Encoding.to_utf8(data).b)
end
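Putting the three methods together, the following self-contained sketch mirrors the class above. `SketchSafeBuffer` and `SketchEncoding.to_utf8` are hypothetical names used only for this illustration; in particular, the scrubbing helper is an assumed approximation of `Clacky::Utils::Encoding.to_utf8`, not its actual implementation.

```ruby
require "stringio"

# Assumed approximation of the shared helper: tag as UTF-8 and replace
# invalid sequences with "?".
module SketchEncoding
  def self.to_utf8(data)
    data.dup.force_encoding(Encoding::UTF_8).scrub("?")
  end
end

# Mirrors the structure of EncodingSafeBuffer shown above.
class SketchSafeBuffer
  def initialize
    @io = StringIO.new("".b) # binary backing store
  end

  def write(data)
    return unless data && !data.empty?

    @io.write(SketchEncoding.to_utf8(data).b)
  end

  def string
    @io.string.force_encoding("UTF-8")
  end
end

buffer = SketchSafeBuffer.new
buffer.write("line one\n".b)                   # plain ASCII
buffer.write("\xE4\xB8\xAD\xE6\x96\x87\n".b)   # valid UTF-8 bytes for "中文"
buffer.write("bad:\xFF\n".b)                   # contains an invalid byte
buffer.write(nil)                              # ignored
buffer.write("")                               # ignored

output = buffer.string
lines = output.lines
```

In this sketch, `output` is tagged UTF-8 and valid, so `lines`, regex matching, and similar operations can run on it without encoding checks at each call site.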