Module: Philiprehberger::EncodingKit::Converter

Defined in:
lib/philiprehberger/encoding_kit/converter.rb

Overview

Encoding conversion with fallback handling

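Class Method Summary

  • .convert(string, from:, to:, fallback: :replace, replace: '?') ⇒ String

  • .normalize(string) ⇒ String

  • .to_utf8(string, from: nil) ⇒ String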

Class Method Details

.convert(string, from:, to:, fallback: :replace, replace: '?') ⇒ String

Convert a string from one encoding to another.

Parameters:

  • string (String)

    the input string

  • from (String, Encoding)

    source encoding

  • to (String, Encoding)

    target encoding

  • fallback (Symbol) (defaults to: :replace)

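    fallback strategy: :replace substitutes the replace string for invalid or undefined characters; :raise (or any value other than :replace) raises Error instead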

  • replace (String) (defaults to: '?')

    replacement string for invalid bytes and undefined characters; used only when fallback is :replace

Returns:

  • (String)

    the converted string

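Raises:

  • (Error)

    if a byte sequence is invalid or a character is undefined in the target encoding and fallback is not :replace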

# File 'lib/philiprehberger/encoding_kit/converter.rb', line 17

def convert(string, from:, to:, fallback: :replace, replace: '?')
  source = Encoding.find(from.to_s)
  target = Encoding.find(to.to_s)

  str = string.dup.force_encoding(source)

  if fallback == :replace
    str.encode(target, invalid: :replace, undef: :replace, replace: replace)
  else
    str.encode(target)
  end
rescue Encoding::InvalidByteSequenceError, Encoding::UndefinedConversionError => e
  raise Error, "Encoding conversion failed: #{e.message}"
end
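Both fallback branches come down to core String#encode options. The sketch below uses only core Ruby (no gem loaded), with illustrative sample strings, to show what each branch does with a character the target encoding cannot represent and with an invalid source byte.

```ruby
# Core String#encode calls behind the two fallback branches of Converter.convert.
utf8 = "caf\u00E9" # "café" as UTF-8

# fallback: :replace -- a character the target cannot represent becomes `replace`.
replaced = utf8.encode(Encoding::US_ASCII, invalid: :replace, undef: :replace, replace: '?')
puts replaced # => caf?

# Any other fallback -- the core error propagates from String#encode.
error = begin
  utf8.encode(Encoding::US_ASCII)
  nil
rescue Encoding::UndefinedConversionError => e
  e
end
puts error.class # => Encoding::UndefinedConversionError

# Invalid source bytes are the other case: 0xFF never appears in well-formed UTF-8.
bad = "ab\xFF"
scrubbed = bad.encode(Encoding::UTF_16LE, invalid: :replace, replace: '?')
puts scrubbed.encode(Encoding::UTF_8) # => ab?
```

In the non-:replace branch, convert rescues these core exceptions and re-raises them as Error, appending the original message.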

.normalize(string) ⇒ String

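Force a string to valid UTF-8. Binary (ASCII-8BIT) input is reinterpreted as UTF-8, input in other encodings is transcoded, and invalid or undefined bytes are replaced. A string that is already valid UTF-8 is returned as an unchanged copy.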

Parameters:

  • string (String)

    the input string

Returns:

  • (String)

    valid UTF-8 string, with U+FFFD in place of invalid or undefined bytes

# File 'lib/philiprehberger/encoding_kit/converter.rb', line 48

def normalize(string)
  str = string.dup
  str.force_encoding(Encoding::UTF_8) if [Encoding::BINARY, Encoding::ASCII_8BIT].include?(str.encoding)

  return str if str.encoding == Encoding::UTF_8 && str.valid_encoding?

  str.encode(Encoding::UTF_8, str.encoding, invalid: :replace, undef: :replace, replace: "\uFFFD")
end

.to_utf8(string, from: nil) ⇒ String

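Convert a string to UTF-8, optionally auto-detecting the source encoding. The bytes are reinterpreted in the source encoding (from, or the Detector result when from is nil) and then transcoded, with invalid or undefined bytes replaced by U+FFFD.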

Parameters:

  • string (String)

    the input string

  • from (String, Encoding, nil) (defaults to: nil)

    source encoding (auto-detect if nil)

Returns:

  • (String)

    UTF-8 encoded string

# File 'lib/philiprehberger/encoding_kit/converter.rb', line 37

def to_utf8(string, from: nil)
  detected = from ? Encoding.find(from.to_s) : Detector.call(string)
  source = detected.is_a?(DetectionResult) ? detected.encoding : detected
  str = string.dup.force_encoding(source)
  str.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "\uFFFD")
end
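When from is given, to_utf8 reduces to a reinterpret-then-transcode step. When from is omitted, the gem's Detector picks the source encoding, and a DetectionResult is unwrapped via #encoding. The sketch below imitates that flow with a deliberately naive, hypothetical detector (naive_detect: valid UTF-8 wins, otherwise ISO-8859-1). It does not reproduce the gem's detection logic.

```ruby
# Hypothetical stand-in for the gem's Detector: NOT its real heuristics.
def naive_detect(string)
  utf8 = string.dup.force_encoding(Encoding::UTF_8)
  utf8.valid_encoding? ? Encoding::UTF_8 : Encoding::ISO_8859_1
end

# Core-Ruby sketch of the to_utf8 flow: pick a source, reinterpret, transcode.
def to_utf8_sketch(string, from: nil)
  source = from ? Encoding.find(from.to_s) : naive_detect(string)
  str = string.dup.force_encoding(source)
  str.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "\uFFFD")
end

latin1_bytes = "caf\xE9".b # "café" as ISO-8859-1 bytes
utf8_bytes = "caf\u00E9".b # "café" as UTF-8 bytes

puts to_utf8_sketch(latin1_bytes, from: 'ISO-8859-1') # => café
puts to_utf8_sketch(latin1_bytes)                     # => café (detected as ISO-8859-1)
puts to_utf8_sketch(utf8_bytes)                       # => café (detected as UTF-8)
```

Passing from: skips detection entirely. That is the safer choice whenever the source encoding is known, because any byte-level heuristic can guess wrong on short or ambiguous input.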