Module: MultiXML Private

Extended by:
Helpers, Options, ParseSupport, ParserResolution
Defined in:
lib/multi_xml.rb,
lib/multi_xml/errors.rb,
lib/multi_xml/parser.rb,
lib/multi_xml/helpers.rb,
lib/multi_xml/options.rb,
lib/multi_xml/version.rb,
lib/multi_xml/constants.rb,
lib/multi_xml/file_like.rb,
lib/multi_xml/deprecated.rb,
lib/multi_xml/parsers/ox.rb,
lib/multi_xml/concurrency.rb,
lib/multi_xml/parsers/oga.rb,
lib/multi_xml/parse_support.rb,
lib/multi_xml/parsers/rexml.rb,
lib/multi_xml/parsers/libxml.rb,
lib/multi_xml/parsers/nokogiri.rb,
lib/multi_xml/parser_resolution.rb,
lib/multi_xml/parsers/dom_parser.rb,
lib/multi_xml/parsers/libxml_sax.rb,
lib/multi_xml/parsers/sax_handler.rb,
lib/multi_xml/parsers/nokogiri_sax.rb,
lib/multi_xml/options_normalization.rb

Overview

This module is part of a private API. Avoid using it if possible, as it may be removed or changed in the future.

Deprecated public API kept around for one major release

Each method here emits a one-time deprecation warning on first call and delegates to its current-API counterpart. The whole file is loaded by MultiXML so the deprecation surface stays out of the main module definition.

Defined Under Namespace

Modules: Concurrency, FileLike, Helpers, Options, OptionsNormalization, ParseSupport, Parser, ParserResolution, Parsers

Classes: DisallowedTypeError, NoParserError, ParseError, ParserLoadError

Constant Summary collapse

VERSION =

The current version of MultiXML

Returns:

  • (Gem::Version)

    the gem version

Gem::Version.create("0.9.0")
TEXT_CONTENT_KEY =

Hash key for storing text content within element hashes

Examples:

Accessing text content

result = MultiXML.parse('<name>John</name>')
result["name"] #=> "John" (simplified, but internally uses __content__)

Returns:

  • (String)

    the key "__content__" used for text content

"__content__".freeze
RUBY_TYPE_TO_XML =

Maps Ruby class names to XML type attribute values

Examples:

Check XML type for a Ruby class

RUBY_TYPE_TO_XML["Integer"] #=> "integer"

Returns:

  • (Hash{String => String})

    mapping of Ruby class names to XML types

{
  "Symbol" => "symbol",
  "Integer" => "integer",
  "BigDecimal" => "decimal",
  "Float" => "float",
  "TrueClass" => "boolean",
  "FalseClass" => "boolean",
  "Date" => "date",
  "DateTime" => "datetime",
  "Time" => "datetime",
  "Array" => "array",
  "Hash" => "hash"
}.freeze
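
A lookup against this table might look like the following sketch. The xml_type_for helper is hypothetical and not part of MultiXML; the table is copied from the constant above so the snippet runs on its own.

```ruby
# Copy of the mapping above so the example is self-contained.
RUBY_TYPE_TO_XML = {
  "Symbol" => "symbol",
  "Integer" => "integer",
  "BigDecimal" => "decimal",
  "Float" => "float",
  "TrueClass" => "boolean",
  "FalseClass" => "boolean",
  "Date" => "date",
  "DateTime" => "datetime",
  "Time" => "datetime",
  "Array" => "array",
  "Hash" => "hash"
}.freeze

# Hypothetical helper: find the XML type name for a Ruby value by its class name.
def xml_type_for(value)
  RUBY_TYPE_TO_XML[value.class.name]
end

xml_type_for(42)     # "integer"
xml_type_for(true)   # "boolean"
xml_type_for("text") # nil, because String has no entry in the table
```

The keys are class names rather than classes, so the lookup goes through value.class.name.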
DISALLOWED_TYPES =

XML type attributes disallowed by default for security

These types are blocked to prevent code execution vulnerabilities.

Examples:

Check default disallowed types

DISALLOWED_TYPES #=> ["symbol", "yaml"]

Returns:

  • (Array<String>)

    list of disallowed type names

%w[symbol yaml].freeze
FALSE_BOOLEAN_VALUES =

Values that represent false in XML boolean attributes

Examples:

Check false values

FALSE_BOOLEAN_VALUES.include?("0") #=> true

Returns:

  • (Set<String>)

    values considered false

Set.new(%w[0 false]).freeze
NAMESPACE_MODES =

Supported values for the :namespaces parse option

Examples:

Parse with namespace preservation

MultiXML.parse(xml, namespaces: :preserve)

Returns:

  • (Array<Symbol>)

    the valid namespace handling modes

%i[strip preserve].freeze
DEFAULT_OPTIONS =

Default parsing options

Examples:

View defaults

DEFAULT_OPTIONS[:symbolize_names] #=> false

Returns:

  • (Hash)

    default options for parse method

{
  typecast_xml_value: true,
  disallowed_types: DISALLOWED_TYPES,
  symbolize_names: false,
  namespaces: :strip
}.freeze
PARSER_PREFERENCE =

Parser libraries in preference order (fastest first)

TruffleRuby's JIT favors pure-Ruby parsers and penalizes FFI-bound ones, so on that engine rexml moves to second place, directly after ox, and nokogiri falls to last. Because ox is filtered out of auto-detection by ParserResolution#skip_on_platform?, rexml is effectively the first candidate there.

Examples:

View parser order

PARSER_PREFERENCE.first #=> ["ox", :ox]

Returns:

  • (Array<Array>)

    pairs of [require_path, parser_symbol]

if RUBY_ENGINE == "truffleruby"
  [
    ["ox", :ox],
    ["rexml/document", :rexml],
    ["libxml-ruby", :libxml],
    ["oga", :oga],
    ["nokogiri", :nokogiri]
  ].freeze
else
  [
    ["ox", :ox],
    ["libxml-ruby", :libxml],
    ["nokogiri", :nokogiri],
    ["oga", :oga],
    ["rexml/document", :rexml]
  ].freeze
end
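
One way a list of [require_path, parser_symbol] pairs can drive auto-detection is sketched below. This is an assumption about the general technique, not MultiXML's actual detection code: first_loadable and CANDIDATES are hypothetical names, and "json" stands in for a real parser library so the snippet runs without any XML gems installed.

```ruby
# Hypothetical candidate list in the same [require_path, symbol] shape.
CANDIDATES = [
  ["no_such_parser_library", :missing], # not installed, so require raises LoadError
  ["json", :json]                       # ships with Ruby, so require succeeds
].freeze

# Return the symbol of the first candidate whose library loads, or nil if none do.
def first_loadable(candidates)
  candidates.each do |require_path, symbol|
    begin
      require require_path
      return symbol
    rescue LoadError
      next # library not available; try the next candidate
    end
  end
  nil
end

first_loadable(CANDIDATES) # :json
```

Ordering the list fastest-first means the first library that loads is also the preferred one.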
PARSE_DATETIME =

This constant is part of a private API. Avoid using it if possible, as it may be removed or changed in the future.

Parses datetime strings, trying Time first then DateTime

Returns:

  • (Proc)

    lambda that parses datetime strings

lambda do |string|
  Time.parse(string).utc
rescue ArgumentError
  begin
    DateTime.parse(string).to_time.utc
  rescue ArgumentError, NoMethodError
    MultiXML.send(:parse_iso_week_datetime, string)
  end
end
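
The fallback chain can be tried in isolation with the simplified sketch below. PARSE_DATETIME_SKETCH is a hypothetical name, and the final step returns nil where the real lambda hands off to the private parse_iso_week_datetime helper, whose behavior is not shown on this page.

```ruby
require "time" # provides Time.parse
require "date" # provides DateTime.parse

# Simplified copy of the chain: Time first, then DateTime, then give up.
PARSE_DATETIME_SKETCH = lambda do |string|
  Time.parse(string).utc
rescue ArgumentError
  begin
    DateTime.parse(string).to_time.utc
  rescue ArgumentError, NoMethodError
    nil # the real lambda tries ISO week parsing here instead
  end
end

parsed = PARSE_DATETIME_SKETCH.call("2024-01-15T10:30:00+02:00")
parsed.utc? # true
parsed.hour # 8, because 10:30 at +02:00 is 08:30 UTC

PARSE_DATETIME_SKETCH.call("foo") # nil in this sketch
```

Every successful branch ends in .utc, so callers always receive a UTC Time regardless of which parser accepted the string.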
FILE_CONVERTER =

This constant is part of a private API. Avoid using it if possible, as it may be removed or changed in the future.

Creates a file-like StringIO from base64-encoded content

Returns:

  • (Proc)

    lambda that creates file objects

lambda do |content, entity|
  StringIO.new(content.unpack1("m")).tap do |io|
    io.extend(FileLike)
    file_io = io # : FileIO
    file_io.original_filename = entity["name"]
    file_io.content_type = entity["content_type"]
  end
end
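
The converter's effect can be illustrated with a stand-in module. FileLikeSketch below is hypothetical: it only assumes that FileLike provides original_filename and content_type accessors, which is what the lambda above assigns.

```ruby
require "stringio"

# Hypothetical stand-in for MultiXML::FileLike with the two accessors used above.
module FileLikeSketch
  attr_accessor :original_filename, :content_type
end

# Same shape as the converter above, using the stand-in module.
FILE_CONVERTER_SKETCH = lambda do |content, entity|
  StringIO.new(content.unpack1("m")).tap do |io| # "m" decodes base64
    io.extend(FileLikeSketch)
    io.original_filename = entity["name"]
    io.content_type = entity["content_type"]
  end
end

file = FILE_CONVERTER_SKETCH.call(
  "aGVsbG8=",
  {"name" => "hello.txt", "content_type" => "text/plain"}
)
file.read              # "hello"
file.original_filename # "hello.txt"
file.content_type      # "text/plain"
```

Extending the single StringIO instance keeps the extra accessors off every other StringIO in the process.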
TYPE_CONVERTERS =

Type converters for XML type attributes

Maps type attribute values to lambdas that convert string content. Converters with arity 2 receive the content and the full entity hash.

Examples:

Using a converter

TYPE_CONVERTERS["integer"].call("42") #=> 42

Returns:

  • (Hash{String => Proc})

    mapping of type names to converter procs

{
  "symbol" => ->(s) { s.to_sym },
  "string" => :to_s.to_proc,
  "integer" => :to_i.to_proc,
  "float" => :to_f.to_proc,
  "double" => :to_f.to_proc,
  "decimal" => ->(s) { BigDecimal(s) },
  "boolean" => ->(s) { !FALSE_BOOLEAN_VALUES.include?(s.strip) },
  "date" => Date.method(:parse),
  "datetime" => PARSE_DATETIME,
  "dateTime" => PARSE_DATETIME,
  "base64Binary" => ->(s) { s.unpack1("m") },
  "binary" => ->(s, entity) { (entity["encoding"] == "base64") ? s.unpack1("m") : s },
  "file" => FILE_CONVERTER,
  "yaml" => lambda do |string|
    YAML.safe_load(string, permitted_classes: [Symbol, Date, Time])
  rescue ArgumentError, Psych::SyntaxError
    string
  end
}.freeze
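
The arity rule described above can be sketched as a small dispatcher. The convert method and the CONVERTERS subset are hypothetical and only illustrate the idea; the library's own dispatch lives in the Helpers module and may differ.

```ruby
require "set"

FALSE_VALUES = Set.new(%w[0 false]).freeze

# A small subset of converters in the same shape as TYPE_CONVERTERS.
CONVERTERS = {
  "integer" => :to_i.to_proc,
  "boolean" => ->(s) { !FALSE_VALUES.include?(s.strip) },
  "binary" => ->(s, entity) { (entity["encoding"] == "base64") ? s.unpack1("m") : s }
}.freeze

# Hypothetical dispatcher: converters with arity 2 also receive the entity hash.
def convert(type, content, entity = {})
  converter = CONVERTERS.fetch(type)
  if converter.arity == 2
    converter.call(content, entity)
  else
    converter.call(content)
  end
end

convert("integer", "42")                                # 42
convert("boolean", " false ")                           # false
convert("binary", "aGVsbG8=", {"encoding" => "base64"}) # "hello"
convert("binary", "raw")                                # "raw"
```

Checking for exactly 2 matters because Symbol#to_proc procs report a negative arity, so they fall through to the single-argument call.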

Constants included from Options

Options::EMPTY_OPTIONS

Class Method Summary collapse

Methods included from Helpers

apply_converter, convert_hash, convert_text_content, disallowed_type?, empty_value?, extract_array_entries, find_array_entries, symbolize_keys, transform_keys, typecast_array, typecast_children, typecast_hash, typecast_xml_value, undasherize_keys, unwrap_file_if_present, unwrap_if_simple, wrap_and_typecast

Methods included from Options

parse_options, parse_options=

Class Method Details

.parse(xml, options = {}) ⇒ Hash

Parse XML into a Ruby Hash

Examples:

Parse simple XML

MultiXML.parse('<root><name>John</name></root>')
#=> {"root"=>{"name"=>"John"}}

Parse with symbolized names

MultiXML.parse('<root><name>John</name></root>', symbolize_names: true)
#=> {root: {name: "John"}}

Parameters:

  • xml (String, IO)

    XML content as a string or IO-like object

  • options (Hash) (defaults to: {})

    Parsing options

Options Hash (options):

  • :parser (Symbol, String, Module)

    Parser to use for this call

  • :symbolize_names (Boolean)

    Convert keys to symbols (default: false)

  • :disallowed_types (Array<String>)

    Types to reject (default: ['symbol', 'yaml'])

  • :typecast_xml_value (Boolean)

    Apply type conversions (default: true)

  • :namespaces (Symbol)

    Namespace handling mode (:strip or :preserve)

Returns:

  • (Hash)

    Parsed XML as nested hash

Raises:



# File 'lib/multi_xml.rb', line 119

def parse(xml, options = {})
  call_site = OptionsNormalization.normalize_symbolize_option(options)
  global = OptionsNormalization.normalize_symbolize_option(parse_options(call_site))
  options = DEFAULT_OPTIONS.merge(global, call_site)
  namespaces = validate_namespaces_mode(options.fetch(:namespaces))
  io = normalize_input(xml)
  return {} if io.eof?

  result = parse_with_error_handling(io, xml, resolve_parse_parser(options), namespaces)
  apply_postprocessing(result, options)
end
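
The option layering in the method above relies on Hash#merge accepting several hashes, with later arguments winning. A minimal sketch of that precedence, using a hypothetical effective_options helper and a trimmed copy of the defaults:

```ruby
# Trimmed copy of the defaults so the example is self-contained.
DEFAULTS = {
  typecast_xml_value: true,
  symbolize_names: false,
  namespaces: :strip
}.freeze

# Hypothetical helper: defaults, then global options, then call-site options.
def effective_options(global, call_site)
  DEFAULTS.merge(global, call_site)
end

effective_options({symbolize_names: true}, {namespaces: :preserve})
# {typecast_xml_value: true, symbolize_names: true, namespaces: :preserve}

effective_options({namespaces: :preserve}, {namespaces: :strip})[:namespaces]
# :strip, because the call-site value overrides the global one
```

Because merge returns a new hash, the frozen defaults are never mutated.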

.parser ⇒ Module

Get the current XML parser module

Returns the currently configured parser, auto-detecting one if not set. Auto-detection follows PARSER_PREFERENCE: Ox, LibXML, Nokogiri, Oga, REXML on most Ruby engines, with a different order on TruffleRuby.

Honors a fiber-local override set by with_parser so concurrent blocks observe their own parser without clobbering the process-wide default. Falls back to the process default when no override is set.

Examples:

Get current parser

MultiXML.parser #=> MultiXML::Parsers::Ox

Returns:

  • (Module)

    the current parser module



# File 'lib/multi_xml.rb', line 77

def parser
  override = Fiber[:multi_xml_parser]
  return override if override

  @parser ||= resolve_parser(detect_parser)
end

.parser=(new_parser) ⇒ Module

Set the XML parser to use

Examples:

Set parser by symbol

MultiXML.parser = :nokogiri

Set parser by module

MultiXML.parser = MyCustomParser

Parameters:

  • new_parser (Symbol, String, Module)

    Parser specification

    • Symbol/String: :libxml, :nokogiri, :ox, :rexml, :oga
    • Module: Custom parser implementing parse(io) or parse(io, namespaces: ...) and parse_error

Returns:

  • (Module)

    the newly configured parser module



# File 'lib/multi_xml.rb', line 96

def parser=(new_parser)
  @parser = resolve_parser(new_parser)
end

.warn_deprecation_once(key, message) ⇒ void

This method is part of a private API. Avoid using it if possible, as it may be removed or changed in the future.

This method returns an undefined value.

Emit a deprecation warning at most once per process for the given key

The warning is tagged with the :deprecated category, so callers can silence the whole set with Warning[:deprecated] = false or surface it via ruby -W:deprecated. This has been the standard Ruby idiom for library deprecations since 2.7.

Parameters:

  • key (Symbol)

    identifier for the deprecation (typically the method name)

  • message (String)

    warning message to emit on first call



# File 'lib/multi_xml.rb', line 50

def self.warn_deprecation_once(key, message)
  Concurrency.synchronize(:deprecation_warnings) do
    return if DEPRECATION_WARNINGS_SHOWN.include?(key)

    Kernel.warn(message, category: :deprecated)
    DEPRECATION_WARNINGS_SHOWN.add(key)
  end
end
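
The warn-once pattern can be reproduced outside the library with a Mutex and a Set. In this sketch, LOCK stands in for Concurrency.synchronize, SHOWN for DEPRECATION_WARNINGS_SHOWN, and warn_once is a hypothetical name; unlike the method above, it returns true or false so the behavior is easy to observe. It assumes Ruby 3.0 or later for the category: keyword of Kernel.warn.

```ruby
require "set"

SHOWN = Set.new  # keys that have already produced a warning
LOCK = Mutex.new # stand-in for Concurrency.synchronize

# Emit the warning only the first time a key is seen.
# Returns true if a warning was issued, false if the key was already recorded.
def warn_once(key, message)
  LOCK.synchronize do
    return false if SHOWN.include?(key)

    # Printed only when Warning[:deprecated] is enabled.
    Kernel.warn(message, category: :deprecated)
    SHOWN.add(key)
    true
  end
end

warn_once(:old_method, "old_method is deprecated") # true
warn_once(:old_method, "old_method is deprecated") # false
```

Holding the lock across both the membership check and the add prevents two threads from each deciding they are first.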

.with_parser(new_parser) { ... } ⇒ Object

Execute a block with a temporarily-swapped parser

The override is stored in fiber-local storage so concurrent fibers and threads each see their own parser without racing on a shared module variable; nested calls save and restore the previous fiber-local value. Matches MultiJSON.with_adapter.

Examples:

MultiXML.with_parser(:rexml) { MultiXML.parse("<a>1</a>") }

Parameters:

  • new_parser (Symbol, String, Module)

    parser to use

Yields:

  • block to execute with the temporary parser

Returns:

  • (Object)

    result of the block



# File 'lib/multi_xml.rb', line 145

def self.with_parser(new_parser)
  previous_override = Fiber[:multi_xml_parser]
  Fiber[:multi_xml_parser] = resolve_parser(new_parser)
  yield
ensure
  Fiber[:multi_xml_parser] = previous_override
end
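
The save-and-restore behavior for nested calls can be sketched without the library. with_setting and current_setting are hypothetical names, and Thread.current[] (fiber-local storage available on all Ruby versions) stands in for Fiber[], which requires Ruby 3.2 or later. The two differ in inheritance, since Fiber[] storage is inherited by child fibers, but the nesting logic is the same.

```ruby
# Set a fiber-local value for the duration of the block, then restore the old one.
def with_setting(value)
  previous = Thread.current[:sketch_setting]
  Thread.current[:sketch_setting] = value
  yield
ensure
  # Runs even if the block raises, so the previous value always comes back.
  Thread.current[:sketch_setting] = previous
end

# Read the override, falling back to a process-wide default.
def current_setting
  Thread.current[:sketch_setting] || :default
end

seen = []
with_setting(:outer) do
  seen << current_setting                         # :outer
  with_setting(:inner) { seen << current_setting } # :inner
  seen << current_setting                         # :outer again
end
seen << current_setting                           # :default
```

Restoring the saved value rather than clearing the key is what makes nested calls unwind correctly.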