Class: Lutaml::Xml::EncodingNormalizer
- Inherits: Object
  - Object
  - Lutaml::Xml::EncodingNormalizer
- Defined in: lib/lutaml/xml/encoding_normalizer.rb
Overview
EncodingNormalizer ensures all XML text content is normalized to UTF-8 internally, regardless of source encoding or adapter used.
This provides:
- Consistent developer experience across adapters
- UTF-8 as internal encoding (Ruby’s default)
- Ability to output in any encoding on serialization
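The "UTF-8 inside, any encoding on output" idea can be illustrated with plain Ruby string methods. This is a minimal sketch, not Lutaml code: the `EncodingRoundTrip` module and its two methods are hypothetical names invented for the example, and only `String#force_encoding` and `String#encode` from core Ruby are used.

```ruby
# Hypothetical helper module for illustration only; not part of Lutaml.
module EncodingRoundTrip
  # Interpret the raw bytes as `source` and transcode them to UTF-8.
  def self.to_internal(bytes, source)
    bytes.dup.force_encoding(source).encode(Encoding::UTF_8)
  end

  # Transcode internal UTF-8 text to the encoding requested for output.
  def self.to_output(text, target)
    text.encode(target)
  end
end

raw = "caf\xE9".b # "café" as four ISO-8859-1 bytes
internal = EncodingRoundTrip.to_internal(raw, Encoding::ISO_8859_1)
output = EncodingRoundTrip.to_output(internal, Encoding::ISO_8859_1)

puts internal.encoding   # UTF-8
puts internal.bytesize   # 5 (é takes two bytes in UTF-8)
puts output.bytes.inspect # [99, 97, 102, 233]
```

Keeping one internal encoding means application code never has to ask which adapter or source file a string came from; the target encoding only matters at the serialization boundary.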
Class Method Summary
- .normalize_to_utf8(content, source_encoding: nil) ⇒ String
  Normalize text content to UTF-8 for internal consistency.
Class Method Details
.normalize_to_utf8(content, source_encoding: nil) ⇒ String
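Normalize text content to UTF-8 for internal consistency. Nil, empty, and already-valid UTF-8 content is returned unchanged. Anything else is transcoded from the resolved source encoding, with invalid or undefined bytes replaced by "?".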
# File 'lib/lutaml/xml/encoding_normalizer.rb', line 24

def self.normalize_to_utf8(content, source_encoding: nil)
  return content if content.nil? || content.empty?

  # Return content if already valid UTF-8
  if content.encoding == Encoding::UTF_8 && content.valid_encoding?
    return content
  end

  # Determine source encoding
  encoding = resolve_encoding(content, source_encoding)

  # Convert to UTF-8
  content.encode(Encoding::UTF_8, encoding,
                 invalid: :replace,
                 undef: :replace,
                 replace: "?")
rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
  # Fallback: force UTF-8 encoding and scrub invalid bytes
  content.force_encoding(Encoding::UTF_8).scrub("?")
end
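To try the branches of `normalize_to_utf8` without loading the library, its control flow can be reproduced in a standalone sketch. Two things here are assumptions rather than Lutaml behavior: the module name `EncodingSketch` is made up, and `resolve_encoding` is not shown in the source, so the version below is a guess that uses the explicit `source_encoding` when one is given and otherwise falls back to the string's own encoding tag. The sketch also duplicates the string in the fallback branch so the caller's object is not mutated, which differs from the listing.

```ruby
# Hypothetical stand-in that mirrors the control flow of the method above.
module EncodingSketch
  def self.normalize_to_utf8(content, source_encoding: nil)
    return content if content.nil? || content.empty?
    return content if content.encoding == Encoding::UTF_8 && content.valid_encoding?

    encoding = resolve_encoding(content, source_encoding)
    content.encode(Encoding::UTF_8, encoding,
                   invalid: :replace, undef: :replace, replace: "?")
  rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
    # Deviation from the listing: dup first to avoid mutating the argument.
    content.dup.force_encoding(Encoding::UTF_8).scrub("?")
  end

  # ASSUMPTION: the real resolve_encoding is not shown in the source.
  # This guess prefers the explicit hint, else trusts the string's tag.
  def self.resolve_encoding(content, source_encoding)
    return content.encoding if source_encoding.nil?
    return source_encoding if source_encoding.is_a?(Encoding)

    Encoding.find(source_encoding.to_s)
  end
end

latin1 = "caf\xE9".b # ISO-8859-1 bytes carrying a binary tag

# With the hint, 0xE9 is read as "é" and transcoded.
puts EncodingSketch.normalize_to_utf8(latin1, source_encoding: "ISO-8859-1") # café

# Without the hint, the byte has no UTF-8 mapping and is replaced.
puts EncodingSketch.normalize_to_utf8(latin1) # caf?
```

Under these assumptions, passing `source_encoding` is what separates a correct conversion from a lossy one when the input arrives as untagged bytes.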