Class: Clacky::Tools::EncodingSafeBuffer
- Inherits: Object
- Defined in: lib/clacky/tools/shell.rb
Overview
A StringIO wrapper that scrubs invalid or undefined bytes on every write, so the buffer always holds valid UTF-8. Shell commands (via popen3) can emit bytes in any encoding (GBK, Latin-1, binary, …). Sanitizing at the earliest possible point guarantees that downstream operations (regex matching, line splitting, JSON serialization) never see invalid byte sequences.
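To make the problem concrete, here is a minimal sketch of what an invalid byte looks like once shell output is treated as UTF-8, and what scrubbing does to it. It uses only core String methods; the "?" replacement character is an assumption made for illustration, not necessarily what the real helper substitutes.

```ruby
# Bytes as they might arrive from a pipe: 0xFF can never appear in valid UTF-8.
raw = "abc\xFFdef".b

# Labeling the bytes as UTF-8 does not make them valid.
labeled = raw.dup.force_encoding(Encoding::UTF_8)
puts labeled.valid_encoding?   # false

# Scrubbing replaces the invalid byte; "?" is an illustrative choice.
clean = labeled.scrub("?")
puts clean                     # abc?def
puts clean.valid_encoding?     # true
```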
Instance Method Summary
- #initialize ⇒ EncodingSafeBuffer (constructor): A new instance of EncodingSafeBuffer.
- #string ⇒ Object
- #write(data) ⇒ Object
Constructor Details
#initialize ⇒ EncodingSafeBuffer
Returns a new instance of EncodingSafeBuffer.
# File 'lib/clacky/tools/shell.rb', line 15
def initialize
  # Use ASCII-8BIT backing store to accept raw bytes from popen3 without
  # encoding conflicts. Scrubbing happens on write; the string method
  # re-labels the result as UTF-8 on the way out so callers (JSON.generate,
  # regex, etc.) always see a properly-tagged UTF-8 string.
  @io = StringIO.new("".b)
end
Instance Method Details
#string ⇒ Object
# File 'lib/clacky/tools/shell.rb', line 33
def string
  # Re-label the accumulated bytes as UTF-8. By this point every byte
  # has already been scrubbed by to_utf8 on write, so force_encoding is
  # safe and avoids an unnecessary copy.
  @io.string.force_encoding("UTF-8")
end
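The comment in #string relies on force_encoding changing only the encoding label, not the bytes. The following sketch, which uses only core String methods and an illustrative input, shows that the call returns the same object and that the byte content is untouched.

```ruby
# Five raw bytes: "caf" plus the two-byte UTF-8 encoding of "é".
bytes = "caf\xC3\xA9".b
puts bytes.encoding            # ASCII-8BIT
puts bytes.length              # 5 (counted as bytes)

# force_encoding re-labels in place and returns the receiver itself.
relabeled = bytes.force_encoding(Encoding::UTF_8)
puts relabeled.equal?(bytes)   # true (same object, no copy)
puts relabeled.length          # 4 (now counted as characters)
puts relabeled.bytesize        # 5 (bytes are unchanged)
puts relabeled.valid_encoding? # true
```

Because the receiver is mutated, the relabeling is only safe when every byte in the buffer is already valid UTF-8, which is what the scrub-on-write design is meant to ensure.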
#write(data) ⇒ Object
# File 'lib/clacky/tools/shell.rb', line 23
def write(data)
  return unless data && !data.empty?
  # Shell output arrives as binary (ASCII-8BIT) bytes. Use the shared
  # helper which scrubs only genuinely invalid sequences, preserving
  # multibyte characters (e.g. CJK). The result is written as raw bytes
  # into the ASCII-8BIT buffer.
  @io.write(Clacky::Utils::Encoding.to_utf8(data).b)
end
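The three methods can be exercised end to end with a self-contained sketch. The implementation of Clacky::Utils::Encoding.to_utf8 is not shown on this page, so the code below substitutes a hypothetical stand-in, SketchEncoding.to_utf8, that assumes the input is mostly UTF-8 and scrubs whatever is not; the real helper may behave differently (for example, it may detect other encodings). SketchBuffer is likewise a hypothetical name that mirrors the listings above.

```ruby
require "stringio"

# Hypothetical stand-in for Clacky::Utils::Encoding (an assumption, not the real helper).
module SketchEncoding
  def self.to_utf8(data)
    str = data.dup.force_encoding(Encoding::UTF_8)
    # Valid input passes through unchanged; invalid bytes become "?".
    str.valid_encoding? ? str : str.scrub("?")
  end
end

# Hypothetical mirror of EncodingSafeBuffer, following the listings above.
class SketchBuffer
  def initialize
    @io = StringIO.new("".b)   # binary backing store
  end

  def write(data)
    return unless data && !data.empty?
    @io.write(SketchEncoding.to_utf8(data).b)
  end

  def string
    @io.string.force_encoding("UTF-8")
  end
end

buf = SketchBuffer.new
buf.write("caf\xC3\xA9 ".b)     # valid multibyte character is preserved
buf.write("bad\xFFbyte\n".b)    # invalid byte is scrubbed
buf.write(nil)                  # ignored by the guard clause
buf.write("")                   # ignored by the guard clause

result = buf.string
puts result                     # café bad?byte
puts result.valid_encoding?     # true
```

Scrubbing per write keeps each chunk valid on its own. Whether a multibyte character split across two writes is handled is not stated on this page, and this sketch makes no attempt to cover that case.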