Module: Philiprehberger::EncodingKit::Converter
- Defined in: lib/philiprehberger/encoding_kit/converter.rb
Overview
Encoding conversion with fallback handling
Class Method Summary
- .convert(string, from:, to:, fallback: :replace, replace: '?') ⇒ String
  Convert a string from one encoding to another.
- .normalize(string) ⇒ String
  Force a string to valid UTF-8 by replacing invalid and undefined bytes.
- .to_utf8(string, from: nil) ⇒ String
  Convert a string to UTF-8, optionally auto-detecting the source encoding.
Class Method Details
.convert(string, from:, to:, fallback: :replace, replace: '?') ⇒ String
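Convert a string from one encoding to another. With fallback: :replace (the default), invalid or undefined characters are replaced with the replace string; with any other fallback value, a failed conversion raises Error.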
    # File 'lib/philiprehberger/encoding_kit/converter.rb', line 17

    def convert(string, from:, to:, fallback: :replace, replace: '?')
      source = Encoding.find(from.to_s)
      target = Encoding.find(to.to_s)
      str = string.dup.force_encoding(source)

      if fallback == :replace
        str.encode(target, invalid: :replace, undef: :replace, replace: replace)
      else
        str.encode(target)
      end
    rescue Encoding::InvalidByteSequenceError, Encoding::UndefinedConversionError => e
      raise Error, "Encoding conversion failed: #{e.message}"
    end
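The two fallback branches of .convert can be tried without installing the gem. The following is a minimal sketch that copies the logic of the listing into a stand-alone method. The names convert_sketch and SketchConversionError are hypothetical stand-ins introduced here (the latter replaces the gem's Error class), and :raise is just an arbitrary value other than :replace.

```ruby
# Hypothetical stand-in for the gem's Error class, defined here so the
# sketch runs with core Ruby only.
class SketchConversionError < StandardError; end

# Hypothetical copy of the .convert logic from the listing above.
def convert_sketch(string, from:, to:, fallback: :replace, replace: '?')
  source = Encoding.find(from.to_s)
  target = Encoding.find(to.to_s)
  # Reinterpret the bytes as the declared source encoding (no transcoding yet).
  str = string.dup.force_encoding(source)

  if fallback == :replace
    # Lenient branch: unconvertible characters become the replacement string.
    str.encode(target, invalid: :replace, undef: :replace, replace: replace)
  else
    # Strict branch: String#encode raises on unconvertible characters.
    str.encode(target)
  end
rescue Encoding::InvalidByteSequenceError, Encoding::UndefinedConversionError => e
  raise SketchConversionError, "Encoding conversion failed: #{e.message}"
end

latin1 = "caf\xE9".b # the bytes of "café" in ISO-8859-1

as_utf8  = convert_sketch(latin1, from: 'ISO-8859-1', to: 'UTF-8')
as_ascii = convert_sketch(latin1, from: 'ISO-8859-1', to: 'US-ASCII')
custom   = convert_sketch(latin1, from: 'ISO-8859-1', to: 'US-ASCII', replace: '_')

# Any fallback other than :replace takes the strict branch.
strict_failed =
  begin
    convert_sketch(latin1, from: 'ISO-8859-1', to: 'US-ASCII', fallback: :raise)
    false
  rescue SketchConversionError
    true
  end
```

Note that Encoding.find raises ArgumentError for an unknown encoding name, and the rescue clause in the listing does not cover that case, so a misspelled from: or to: surfaces as ArgumentError rather than Error.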
.normalize(string) ⇒ String
Force a string to valid UTF-8 by replacing invalid and undefined bytes.
    # File 'lib/philiprehberger/encoding_kit/converter.rb', line 48

    def normalize(string)
      str = string.dup
      str.force_encoding(Encoding::UTF_8) if [Encoding::BINARY, Encoding::ASCII_8BIT].include?(str.encoding)
      return str if str.encoding == Encoding::UTF_8 && str.valid_encoding?

      str.encode(Encoding::UTF_8, str.encoding, invalid: :replace, undef: :replace, replace: "\uFFFD")
    end
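The three paths through .normalize (already valid UTF-8, binary-tagged bytes, and a non-UTF-8 encoding) are easier to see with concrete inputs. This is a minimal sketch, assuming a recent Ruby (2.1 or later) and using a hypothetical stand-alone copy named normalize_sketch rather than the gem itself. Encoding::BINARY and Encoding::ASCII_8BIT name the same encoding, so the sketch checks only one of them.

```ruby
# Hypothetical copy of the .normalize logic from the listing above.
def normalize_sketch(string)
  str = string.dup
  # Binary-tagged strings are reinterpreted as UTF-8 before validation.
  str.force_encoding(Encoding::UTF_8) if str.encoding == Encoding::BINARY
  # Fast path: nothing to repair.
  return str if str.encoding == Encoding::UTF_8 && str.valid_encoding?

  # Slow path: transcode (or scrub, when already tagged UTF-8) and
  # substitute U+FFFD for anything that cannot be represented.
  str.encode(Encoding::UTF_8, str.encoding, invalid: :replace, undef: :replace, replace: "\uFFFD")
end

valid  = normalize_sketch("h\u00E9llo")        # already valid UTF-8
binary = normalize_sketch("h\xC3\xA9llo".b)    # valid UTF-8 bytes tagged as binary
broken = normalize_sketch("abc\xFFdef".b)      # 0xFF is never valid in UTF-8
latin1 = normalize_sketch("caf\xE9".b.force_encoding(Encoding::ISO_8859_1))
```

In this sketch the first two calls return text equal to "héllo", the third returns a valid UTF-8 string with U+FFFD in place of the stray byte, and the fourth is transcoded from ISO-8859-1 to "café".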
.to_utf8(string, from: nil) ⇒ String
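Convert a string to UTF-8. When from is nil, the source encoding is auto-detected with Detector; otherwise from names the source encoding. Invalid and undefined bytes are replaced with U+FFFD.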
    # File 'lib/philiprehberger/encoding_kit/converter.rb', line 37

    def to_utf8(string, from: nil)
      detected = from ? Encoding.find(from.to_s) : Detector.call(string)
      source = detected.is_a?(DetectionResult) ? detected.encoding : detected
      str = string.dup.force_encoding(source)
      str.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "\uFFFD")
    end
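The listing above shows .to_utf8 taking the source encoding either from the from: argument or from a detector, and unwrapping a DetectionResult if the detector returns one. The sketch below illustrates that control flow with core Ruby only. Everything prefixed with Sketch or SKETCH is a hypothetical stand-in: the gem's real Detector and DetectionResult are not shown in this section, and the toy detector here (valid UTF-8, else assume ISO-8859-1) is an assumption made purely for illustration.

```ruby
# Hypothetical stand-in for the gem's DetectionResult.
SketchDetectionResult = Struct.new(:encoding)

# Hypothetical toy detector, NOT the gem's Detector: treat the bytes as
# UTF-8 if they validate, otherwise assume ISO-8859-1.
SKETCH_DETECTOR = lambda do |string|
  utf8_ok = string.dup.force_encoding(Encoding::UTF_8).valid_encoding?
  SketchDetectionResult.new(utf8_ok ? Encoding::UTF_8 : Encoding::ISO_8859_1)
end

# Hypothetical copy of the .to_utf8 logic with the detector injected.
def to_utf8_sketch(string, from: nil, detector: SKETCH_DETECTOR)
  # An explicit from: wins; otherwise ask the detector.
  detected = from ? Encoding.find(from.to_s) : detector.call(string)
  # The detector may return a result object or a bare Encoding.
  source = detected.is_a?(SketchDetectionResult) ? detected.encoding : detected
  str = string.dup.force_encoding(source)
  str.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "\uFFFD")
end

explicit = to_utf8_sketch("caf\xE9".b, from: 'ISO-8859-1') # caller names the encoding
guessed  = to_utf8_sketch("caf\xE9".b)                     # detector falls back to ISO-8859-1
already  = to_utf8_sketch("caf\xC3\xA9".b)                 # detector recognizes UTF-8
```

All three calls yield "café" tagged as UTF-8 in this sketch. Passing from: skips detection entirely, which is the safer choice whenever the source encoding is known.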