Module: Canon::DiffFormatter::DiffDetailFormatterHelpers::NodeUtils

Defined in: lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb

Overview

Node utility methods

Provides helper methods for extracting information from nodes.

Constant Summary

ASCII_WHITESPACE_BYTES =

Byte values of the ASCII whitespace characters (space, tab, CR, LF) that strip_ascii_whitespace removes. Only these are stripped; Unicode whitespace such as the non-breaking space (U+00A0) is preserved because it can be meaningful content. Ruby’s String#strip is broader on the ASCII side: it also removes null, vertical tab, and form feed. strip_ascii_whitespace returns a String with leading and trailing ASCII whitespace removed.

[32, 9, 13, 10].freeze

Class Method Summary

.find_all_differing_attributes(node1, node2) ⇒ Array<String>

Find all differing attributes between two nodes.
.format_node_brief(node) ⇒ String

Format node briefly for display.
.get_attribute_names(node) ⇒ Array<String>

Get attribute names from a node.
.get_attribute_names_in_order(node) ⇒ Array<String>

Get attribute names in order from a node.
.get_attribute_value(node, attr_name) ⇒ String?

Get attribute value from a node.
.get_attributes_hash(node) ⇒ Hash

Get attributes as a hash.
.get_element_name_for_display(node) ⇒ String

Get element name for display.
.get_namespace_uri_for_display(node) ⇒ String

Get namespace URI for display.
.get_node_text(node) ⇒ String

Get text content from a node.
.inside_preserve_element?(node) ⇒ Boolean

Check if node is inside a preserve-whitespace element.
.node_to_display(node, compact: false) ⇒ String

Return the best display string for a node.
.parent_of(node) ⇒ Object?

Return the parent of a node, or nil, regardless of the node API.
.raw_text_value(node) ⇒ String

Return the raw text content of a text node without stripping whitespace.
.serialize_node_compact(node) ⇒ String

Serialize a node tree as compact XML for display.
.serialize_open_tag(node) ⇒ String

Serialize a node’s open tag only — name + attributes, no children, no closing tag.
.strip_ascii_whitespace(str) ⇒ Object

Strip leading and trailing ASCII whitespace from a string.

Class Method Details

.find_all_differing_attributes(node1, node2) ⇒ `Array<String>`

Find all differing attributes between two nodes

Parameters:

node1 (Object) —

First node
node2 (Object) —

Second node

Returns:

(Array<String>) —

Array of attribute names with different values

# File 'lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb', line 48

def self.find_all_differing_attributes(node1, node2)
  return [] unless node1 && node2

  attrs1 = get_attributes_hash(node1)
  attrs2 = get_attributes_hash(node2)

  all_keys = (attrs1.keys | attrs2.keys)

  all_keys.reject do |key|
    attrs1[key.to_s] == attrs2[key.to_s]
  end
end
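
The union-then-reject step above can be sketched in isolation. This is a minimal sketch, not Canon's code: `AttrHolder` and `DifferSketch` are hypothetical names, and `attributes_hash` is a simplified stand-in for get_attributes_hash that assumes the node exposes a Hash.

```ruby
# Hypothetical stand-in for a node that exposes #attributes as a Hash.
AttrHolder = Struct.new(:attributes)

module DifferSketch
  # Simplified stand-in for get_attributes_hash: stringify keys and values.
  def self.attributes_hash(node)
    return {} unless node

    node.attributes.to_h { |key, value| [key.to_s, value.to_s] }
  end

  # Same logic as find_all_differing_attributes: take the union of both
  # key sets and drop every key whose values are equal on both sides.
  def self.differing_attributes(node1, node2)
    return [] unless node1 && node2

    attrs1 = attributes_hash(node1)
    attrs2 = attributes_hash(node2)
    (attrs1.keys | attrs2.keys).reject { |key| attrs1[key] == attrs2[key] }
  end
end

old_node = AttrHolder.new({ "id" => "a", "class" => "x", "lang" => "en" })
new_node = AttrHolder.new({ "id" => "a", "class" => "y", "title" => "T" })

# "class" changed, "lang" was removed, "title" was added; "id" is equal.
p DifferSketch.differing_attributes(old_node, new_node)
```

An attribute present on only one side compares against nil on the other, so it is reported as differing.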

.format_node_brief(node) ⇒ `String`

Format node briefly for display

Parameters:

node (Object) —

Node to format

Returns:

(String) —

Brief node description

# File 'lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb', line 251

def self.format_node_brief(node)
  return "" unless node

  name = get_element_name_for_display(node)
  text = get_node_text(node)

  if text && !text.empty?
    "#{name}(\"#{text}\")"
  else
    name
  end
end

.get_attribute_names(node) ⇒ `Array<String>`

Get attribute names from a node

Parameters:

node (Object) —

Node to extract attributes from

Returns:

(Array<String>) —

Array of attribute names

# File 'lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb', line 17

def self.get_attribute_names(node)
  return [] unless node

  attrs = if node.respond_to?(:attribute_nodes)
            node.attribute_nodes
          elsif node.respond_to?(:attributes)
            node.attributes
          elsif node.respond_to?(:[]) && node.respond_to?(:each)
            # Hash-like node
            node.keys
          else
            []
          end

  return [] unless attrs

  # Handle different attribute formats
  if attrs.is_a?(Array)
    attrs.map { |attr| attr.respond_to?(:name) ? attr.name : attr.to_s }
  elsif attrs.respond_to?(:keys)
    attrs.keys.map(&:to_s)
  else
    []
  end
end

.get_attribute_names_in_order(node) ⇒ `Array<String>`

Get attribute names in order from a node

Parameters:

node (Object) —

Node to extract from

Returns:

(Array<String>) —

Ordered array of attribute names

# File 'lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb', line 65

def self.get_attribute_names_in_order(node)
  return [] unless node

  attrs = if node.respond_to?(:attribute_nodes)
            node.attribute_nodes
          elsif node.respond_to?(:attributes)
            node.attributes
          else
            []
          end

  return [] unless attrs

  if attrs.is_a?(Array)
    attrs.map { |attr| attr.respond_to?(:name) ? attr.name : attr.to_s }
  else
    attrs.keys.map(&:to_s)
  end
end

.get_attribute_value(node, attr_name) ⇒ `String`?

Get attribute value from a node

Parameters:

node (Object) —

Node to extract from
attr_name (String) —

Attribute name

Returns:

(String, nil) —

Attribute value or nil

# File 'lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb', line 131

def self.get_attribute_value(node, attr_name)
  return nil unless node && attr_name

  if node.respond_to?(:[])
    value = node[attr_name]
    if value.respond_to?(:value)
      value.value
    elsif value.respond_to?(:content)
      value.content
    elsif value.respond_to?(:to_s)
      value.to_s
    else
      value
    end
  elsif node.respond_to?(:get_attribute)
    attr = node.get_attribute(attr_name)
    attr.respond_to?(:value) ? attr.value : attr
  elsif node.respond_to?(:attribute_nodes)
    attribute_node = node.attribute_nodes.find do |attr|
      attr.name == attr_name.to_s
    end
    attribute_node&.value
  end
end
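
The branch order above matters for missing attributes. The sketch below copies the first two branches into a hypothetical module (`AttributeValueSketch`) and exercises them with stand-in nodes: a plain Hash for the `[]` branch and a hypothetical `GetAttributeOnlyNode` class for the `get_attribute` branch. In this copy, a missing attribute comes back as an empty string from the `[]` branch, because nil responds to to_s, and as nil from the `get_attribute` branch.

```ruby
# Copy of the first two branches of get_attribute_value, in a hypothetical
# module so the sketch runs without Canon.
module AttributeValueSketch
  def self.get_attribute_value(node, attr_name)
    return nil unless node && attr_name

    if node.respond_to?(:[])
      value = node[attr_name]
      if value.respond_to?(:value)
        value.value
      elsif value.respond_to?(:content)
        value.content
      elsif value.respond_to?(:to_s)
        value.to_s # nil also lands here and becomes ""
      else
        value
      end
    elsif node.respond_to?(:get_attribute)
      attr = node.get_attribute(attr_name)
      attr.respond_to?(:value) ? attr.value : attr
    end
  end
end

# Hypothetical node that offers only #get_attribute (no #[]).
class GetAttributeOnlyNode
  def initialize(attrs)
    @attrs = attrs
  end

  def get_attribute(name)
    @attrs[name]
  end
end

hash_node = { "id" => "intro" }
other_node = GetAttributeOnlyNode.new({ "id" => "intro" })

p AttributeValueSketch.get_attribute_value(hash_node, "id")
p AttributeValueSketch.get_attribute_value(hash_node, "missing")
p AttributeValueSketch.get_attribute_value(other_node, "missing")
```

Callers that need to tell "absent" from "empty" may want to check for the attribute name first, for example with get_attribute_names.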

.get_attributes_hash(node) ⇒ `Hash`

Get attributes as a hash

Parameters:

node (Object) —

Node to extract from

Returns:

(Hash) —

Attributes hash

# File 'lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb', line 89

def self.get_attributes_hash(node)
  return {} unless node

  attrs = if node.respond_to?(:attribute_nodes)
            node.attribute_nodes
          elsif node.respond_to?(:attributes)
            node.attributes
          else
            {}
          end

  return {} unless attrs

  result = {}
  if attrs.is_a?(Array)
    attrs.each do |attr|
      name = attr.respond_to?(:name) ? attr.name : attr.to_s
      value = attr.respond_to?(:value) ? attr.value : attr.to_s
      result[name] = value
    end
  elsif attrs.respond_to?(:each)
    attrs.each do |key, val|
      name = key.to_s
      value = if val.respond_to?(:value)
                val.value
              elsif val.respond_to?(:content)
                val.content
              else
                val.to_s
              end
      result[name] = value
    end
  end

  result
end
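
As an illustration, the sketch below runs a copy of the method against two stand-in node shapes: one exposing an Array of attribute objects and one exposing a Hash. `NamedAttr`, `NodeWithAttributeNodes`, `NodeWithAttributes`, and `AttributesHashSketch` are hypothetical names introduced only for this example.

```ruby
# Hypothetical stand-ins: one node shape exposes an Array of attribute
# objects, the other a Hash of name => value.
NamedAttr = Struct.new(:name, :value)
NodeWithAttributeNodes = Struct.new(:attribute_nodes)
NodeWithAttributes = Struct.new(:attributes)

# Copy of get_attributes_hash, placed in a hypothetical module so the
# sketch runs without Canon.
module AttributesHashSketch
  def self.get_attributes_hash(node)
    return {} unless node

    attrs = if node.respond_to?(:attribute_nodes)
              node.attribute_nodes
            elsif node.respond_to?(:attributes)
              node.attributes
            else
              {}
            end
    return {} unless attrs

    result = {}
    if attrs.is_a?(Array)
      # Array of attribute objects: read name and value from each one.
      attrs.each do |attr|
        name = attr.respond_to?(:name) ? attr.name : attr.to_s
        value = attr.respond_to?(:value) ? attr.value : attr.to_s
        result[name] = value
      end
    elsif attrs.respond_to?(:each)
      # Hash-like attributes: stringify keys and unwrap values.
      attrs.each do |key, val|
        value = if val.respond_to?(:value)
                  val.value
                elsif val.respond_to?(:content)
                  val.content
                else
                  val.to_s
                end
        result[key.to_s] = value
      end
    end
    result
  end
end

array_style = NodeWithAttributeNodes.new(
  [NamedAttr.new("id", "a"), NamedAttr.new("class", "x")]
)
hash_style = NodeWithAttributes.new({ id: "a", class: "x" })

# Both shapes normalize to the same String-keyed Hash.
p AttributesHashSketch.get_attributes_hash(array_style)
p AttributesHashSketch.get_attributes_hash(hash_style)
```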

.get_element_name_for_display(node) ⇒ `String`

Get element name for display

Parameters:

node (Object) —

Node to get name from

Returns:

(String) —

Element name

# File 'lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb', line 210

def self.get_element_name_for_display(node)
  return "" unless node

  # Handle TextNode specially since it doesn't respond to :name
  if node.is_a?(Canon::Xml::Nodes::TextNode)
    return "text"
  end

  # Handle CommentNode specially since it doesn't respond to :name
  if node.is_a?(Canon::Xml::Nodes::CommentNode)
    return "comment"
  end

  if node.respond_to?(:name)
    node.name.to_s
  else
    node.class.name
  end
end

.get_namespace_uri_for_display(node) ⇒ `String`

Get namespace URI for display

Parameters:

node (Object) —

Node to get namespace from

Returns:

(String) —

Namespace URI

# File 'lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb', line 234

def self.get_namespace_uri_for_display(node)
  return "" unless node

  if node.respond_to?(:namespace_uri)
    node.namespace_uri.to_s
  elsif node.respond_to?(:namespace)
    ns = node.namespace
    ns.respond_to?(:href) ? ns.href.to_s : ""
  else
    ""
  end
end
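
A short sketch of the two node shapes this method accepts, assuming stand-in structs rather than real Canon or Nokogiri nodes. `NsSketch`, `UriStyleNode`, `NamespaceStyleNode`, and `NamespaceSketch` are hypothetical names.

```ruby
# Hypothetical stand-ins: one node exposes #namespace_uri directly, the
# other exposes a #namespace object that carries #href.
NsSketch = Struct.new(:href)
UriStyleNode = Struct.new(:namespace_uri)
NamespaceStyleNode = Struct.new(:namespace)

# Copy of get_namespace_uri_for_display in a hypothetical module.
module NamespaceSketch
  def self.get_namespace_uri_for_display(node)
    return "" unless node

    if node.respond_to?(:namespace_uri)
      node.namespace_uri.to_s
    elsif node.respond_to?(:namespace)
      ns = node.namespace
      ns.respond_to?(:href) ? ns.href.to_s : ""
    else
      ""
    end
  end
end

xhtml = UriStyleNode.new("http://www.w3.org/1999/xhtml")
svg = NamespaceStyleNode.new(NsSketch.new("http://www.w3.org/2000/svg"))
bare = NamespaceStyleNode.new(nil)

puts NamespaceSketch.get_namespace_uri_for_display(xhtml)
puts NamespaceSketch.get_namespace_uri_for_display(svg)
# A node without a namespace yields an empty string, never nil.
p NamespaceSketch.get_namespace_uri_for_display(bare)
```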

.get_node_text(node) ⇒ `String`

Get text content from a node

Parameters:

node (Object) —

Node to extract from

Returns:

(String) —

Text content

# File 'lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb', line 160

def self.get_node_text(node)
  return "" unless node

  text = if node.respond_to?(:text)
           node.text
         elsif node.respond_to?(:content)
           node.content
         elsif node.respond_to?(:inner_text)
           node.inner_text
         elsif node.respond_to?(:value)
           node.value
         elsif node.respond_to?(:node_info)
           node.node_info
         elsif node.respond_to?(:to_s)
           node.to_s
         else
           ""
         end

  strip_ascii_whitespace(text.to_s)
end

.inside_preserve_element?(node) ⇒ `Boolean`

Check if node is inside a preserve-whitespace element

Parameters:

node (Object) —

Node to check

Returns:

(Boolean) —

true if inside preserve element

# File 'lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb', line 413

def self.inside_preserve_element?(node)
  return false unless node

  preserve_elements = %w[pre code textarea script style]

  # Check the node itself
  if node.respond_to?(:name) && preserve_elements.include?(node.name.to_s.downcase)
    return true
  end

  # Check ancestors
  current = node
  while current
    if current.respond_to?(:parent)
      current = current.parent
    elsif current.respond_to?(:parent_node)
      current = current.parent_node
    else
      break
    end

    next unless current

    if current.respond_to?(:name) && preserve_elements.include?(current.name.to_s.downcase)
      return true
    end
  end

  false
end
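
The ancestor walk can be sketched in a condensed form. This is an illustration under simplifying assumptions, not the method itself: every stand-in node (`PreserveNode`, a hypothetical struct) has both a name and a `parent` link, so the respond_to? checks and the `parent_node` fallback are omitted.

```ruby
# Hypothetical stand-in for a node with a name and a parent link.
PreserveNode = Struct.new(:name, :parent)

module PreserveSketch
  PRESERVE_ELEMENTS = %w[pre code textarea script style].freeze

  # Condensed version of the walk: test the node itself, then each
  # ancestor in turn, comparing names case-insensitively.
  def self.inside_preserve_element?(node)
    current = node
    while current
      return true if PRESERVE_ELEMENTS.include?(current.name.to_s.downcase)

      current = current.parent
    end
    false
  end
end

pre = PreserveNode.new("PRE", nil)
span = PreserveNode.new("span", pre)
text = PreserveNode.new("#text", span)
loose = PreserveNode.new("p", PreserveNode.new("div", nil))

puts PreserveSketch.inside_preserve_element?(text)  # ancestor is <pre>
puts PreserveSketch.inside_preserve_element?(loose) # no preserve ancestor
```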

.node_to_display(node, compact: false) ⇒ `String`

Return the best display string for a node.

When compact: true and the node is a Canon ElementNode, returns a compact XML serialization (e.g. <strong>Annex</strong>) instead of the node_info description string that get_node_text would produce. In all other cases, delegates to get_node_text.

Parameters:

node (Object) —

Node to display
compact (Boolean) (defaults to: false) —

Whether to use compact XML for element nodes

Returns:

(String) —

Display string

# File 'lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb', line 384

def self.node_to_display(node, compact: false)
  if compact && node.is_a?(Canon::Xml::Nodes::ElementNode)
    serialize_node_compact(node)
  else
    get_node_text(node)
  end
end

.parent_of(node) ⇒ `Object`?

Return the parent of a node, or nil, regardless of the node API.

Canon::Xml nodes expose parent; some Nokogiri-shaped nodes expose parent_node. This helper abstracts over both.

Parameters:

node (Object) —

Node to query

Returns:

(Object, nil) —

Parent node or nil

# File 'lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb', line 399

def self.parent_of(node)
  return nil unless node

  if node.respond_to?(:parent)
    node.parent
  elsif node.respond_to?(:parent_node)
    node.parent_node
  end
end
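
A minimal sketch of the abstraction, using two hypothetical stand-in structs: `ParentStyleNode` exposes `parent` and `ParentNodeStyleNode` exposes `parent_node`. The method body is copied into a hypothetical `ParentSketch` module so it runs without Canon.

```ruby
# Hypothetical stand-ins for the two node APIs described above.
ParentStyleNode = Struct.new(:label, :parent)
ParentNodeStyleNode = Struct.new(:label, :parent_node)

# Copy of parent_of in a hypothetical module.
module ParentSketch
  def self.parent_of(node)
    return nil unless node

    if node.respond_to?(:parent)
      node.parent
    elsif node.respond_to?(:parent_node)
      node.parent_node
    end
  end
end

root = ParentStyleNode.new("root", nil)
child_a = ParentStyleNode.new("a", root)
child_b = ParentNodeStyleNode.new("b", root)

# Either API shape resolves to the same parent object.
puts ParentSketch.parent_of(child_a).label
puts ParentSketch.parent_of(child_b).label
p ParentSketch.parent_of(root)
```

Objects that expose neither method fall through both branches, so the method returns nil for them as well.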

.raw_text_value(node) ⇒ `String`

Return the raw text content of a text node without stripping whitespace. get_node_text strips ASCII whitespace, which destroys whitespace-only payloads that callers (e.g. one-sided text-content diff rendering) need to display verbatim.

Parameters:

node (Object) —

Text node

Returns:

(String) —

Raw text content, or "" if not a text-bearing node

# File 'lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb', line 361

def self.raw_text_value(node)
  return "" unless node

  case node
  when Canon::Xml::Node
    node.value.to_s
  when Nokogiri::XML::Node
    node.content.to_s
  else
    ""
  end
end

.serialize_node_compact(node) ⇒ `String`

Serialize a node tree as compact XML for display.

Produces a human-readable inline XML string without namespace declarations and without indentation, suitable for use in Semantic Diff Report entries. Handles both Canon::Xml::Nodes types and Nokogiri XML/HTML nodes. The HTML DOM comparison path uses Nokogiri nodes, so element-structure diffs originating there must be rendered structurally too; see issue #120. For any other node type, falls back to get_node_text.

Parameters:

node (Object) —

Node to serialize

Returns:

(String) —

Compact XML string

# File 'lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb', line 276

def self.serialize_node_compact(node)
  require "cgi"
  return "" unless node

  case node
  when Canon::Xml::Nodes::TextNode
    CGI.escapeHTML(node.value.to_s)
  when Canon::Xml::Nodes::ElementNode
    tag = node.name.to_s
    attrs = node.attribute_nodes.map do |attr|
      attr_name  = attr.respond_to?(:name)  ? attr.name.to_s  : attr.to_s
      attr_value = attr.respond_to?(:value) ? attr.value.to_s : ""
      " #{attr_name}=\"#{CGI.escapeHTML(attr_value)}\""
    end.join
    children_xml = node.children.map do |c|
      serialize_node_compact(c)
    end.join
    if children_xml.empty?
      "<#{tag}#{attrs}/>"
    else
      "<#{tag}#{attrs}>#{children_xml}</#{tag}>"
    end
  when Canon::Xml::Nodes::CommentNode
    text = node.respond_to?(:value) ? node.value.to_s : ""
    "<!--#{CGI.escapeHTML(text)}-->"
  when Nokogiri::XML::Text, Nokogiri::XML::CDATA
    CGI.escapeHTML(node.content.to_s)
  when Nokogiri::XML::Comment
    "<!--#{CGI.escapeHTML(node.content.to_s)}-->"
  when Nokogiri::XML::Element
    tag = node.name.to_s
    attrs = node.attribute_nodes.map do |a|
      " #{a.name}=\"#{CGI.escapeHTML(a.value.to_s)}\""
    end.join
    children_xml = node.children.map do |c|
      serialize_node_compact(c)
    end.join
    if children_xml.empty?
      "<#{tag}#{attrs}/>"
    else
      "<#{tag}#{attrs}>#{children_xml}</#{tag}>"
    end
  else
    # Unknown node types — fall back to text extraction
    get_node_text(node)
  end
end
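
The recursion can be illustrated without Canon or Nokogiri. The sketch below is an assumption-laden reduction: `SketchText`, `SketchAttr`, and `SketchElement` are hypothetical stand-ins for text, attribute, and element nodes, and `CompactSketch.serialize` covers only those two node kinds.

```ruby
require "cgi"

# Hypothetical stand-ins for text, attribute and element nodes.
SketchText = Struct.new(:value)
SketchAttr = Struct.new(:name, :value)
SketchElement = Struct.new(:name, :attribute_nodes, :children)

module CompactSketch
  # Reduced version of serialize_node_compact: escape text, emit
  # attributes inline, recurse into children, self-close empty elements.
  def self.serialize(node)
    case node
    when SketchText
      CGI.escapeHTML(node.value.to_s)
    when SketchElement
      attrs = node.attribute_nodes.map do |attr|
        " #{attr.name}=\"#{CGI.escapeHTML(attr.value.to_s)}\""
      end.join
      inner = node.children.map { |child| serialize(child) }.join
      if inner.empty?
        "<#{node.name}#{attrs}/>"
      else
        "<#{node.name}#{attrs}>#{inner}</#{node.name}>"
      end
    else
      ""
    end
  end
end

tree = SketchElement.new(
  "p",
  [SketchAttr.new("class", "a&b")],
  [
    SketchText.new("See "),
    SketchElement.new("strong", [], [SketchText.new("Annex <1>")]),
    SketchElement.new("br", [], [])
  ]
)

puts CompactSketch.serialize(tree)
# prints: <p class="a&amp;b">See <strong>Annex &lt;1&gt;</strong><br/></p>
```

Text and attribute values are escaped, while element and attribute names are emitted as given.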

.serialize_open_tag(node) ⇒ `String`

Serialize a node’s open tag only: name and attributes, no children, no closing tag. Used by format_text_content_one_sided to render a brief parent-element context hint (e.g. <div id="A">) for a one-sided text diff, instead of the full ancestor subtree that serialize_node_compact would produce. See lutaml/canon#125.

Parameters:

node (Object) —

Element node to serialize

Returns:

(String) —

Open-tag string, or "" for non-elements / nil

# File 'lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb', line 332

def self.serialize_open_tag(node)
  require "cgi"
  return "" unless node

  case node
  when Canon::Xml::Nodes::ElementNode
    tag = node.name.to_s
    attrs = node.attribute_nodes.map do |attr|
      " #{attr.name}=\"#{CGI.escapeHTML(attr.value.to_s)}\""
    end.join
    "<#{tag}#{attrs}>"
  when Nokogiri::XML::Element
    tag = node.name.to_s
    attrs = node.attribute_nodes.map do |a|
      " #{a.name}=\"#{CGI.escapeHTML(a.value.to_s)}\""
    end.join
    "<#{tag}#{attrs}>"
  else
    ""
  end
end
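
To show how the open-tag form differs from a full serialization, here is a minimal sketch with hypothetical stand-ins (`OpenTagAttr`, `OpenTagElement`, `OpenTagSketch`). It assumes a single element type instead of the Canon and Nokogiri classes handled above.

```ruby
require "cgi"

# Hypothetical stand-ins for an attribute and an element node.
OpenTagAttr = Struct.new(:name, :value)
OpenTagElement = Struct.new(:name, :attribute_nodes, :children)

module OpenTagSketch
  # Emit only "<name attr=\"value\">"; children are never visited.
  def self.serialize_open_tag(node)
    return "" unless node.is_a?(OpenTagElement)

    attrs = node.attribute_nodes.map do |attr|
      " #{attr.name}=\"#{CGI.escapeHTML(attr.value.to_s)}\""
    end.join
    "<#{node.name}#{attrs}>"
  end
end

div = OpenTagElement.new("div", [OpenTagAttr.new("id", "A")], ["ignored child"])

puts OpenTagSketch.serialize_open_tag(div)
# prints: <div id="A">
p OpenTagSketch.serialize_open_tag("plain text")
```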

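.strip_ascii_whitespace(str) ⇒ `Object`

Strip leading and trailing ASCII whitespace (space, tab, CR, LF) from a string, preserving Unicode whitespace such as the non-breaking space (U+00A0). Returns "" for nil or whitespace-only input.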

# File 'lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb', line 190

def self.strip_ascii_whitespace(str)
  return "" if str.nil?
  return str if str.empty?

  # Find first non-ASCII-whitespace character position
  first_pos = str.index(/[^ \t\r\n]/)
  return "" unless first_pos

  # Find last non-ASCII-whitespace character position (from end)
  # Use reverse and index, then convert back to forward position
  reversed_pos = str.reverse.index(/[^ \t\r\n]/)
  last_pos = str.length - 1 - reversed_pos

  str[first_pos..last_pos]
end
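
A short sketch of the method's behavior on input that mixes ASCII whitespace with non-breaking spaces. The method body is copied into a hypothetical `StripSketch` module so the example runs without Canon.

```ruby
# Copy of strip_ascii_whitespace in a hypothetical module.
module StripSketch
  def self.strip_ascii_whitespace(str)
    return "" if str.nil?
    return str if str.empty?

    # First character that is not space, tab, CR or LF.
    first_pos = str.index(/[^ \t\r\n]/)
    return "" unless first_pos

    # Last such character, found by searching the reversed string.
    reversed_pos = str.reverse.index(/[^ \t\r\n]/)
    last_pos = str.length - 1 - reversed_pos

    str[first_pos..last_pos]
  end
end

padded = " \t\u00A0value\u00A0\r\n"

# Space, tab, CR and LF are trimmed; the non-breaking spaces survive.
p StripSketch.strip_ascii_whitespace(padded)
# Input made only of ASCII whitespace collapses to an empty string.
p StripSketch.strip_ascii_whitespace(" \n\t ")
```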
