Class: Canon::Diff::PathBuilder

Inherits:
Object
Defined in:
lib/canon/diff/path_builder.rb

Overview

Builds canonical XPath-like paths from TreeNodes or raw nodes. Paths carry ordinal indices so that a node is uniquely identified regardless of the parsing library used (Nokogiri, Moxml, Canon, etc.).

The builder is library-agnostic because it duck-types over several node types:

  • TreeNodes (from semantic diff adapters) - uses the `label` attribute

  • Canon::Xml::Node (from DOM diff) - uses the `name` attribute

  • Nokogiri nodes (from HTML DOM diff) - uses the `name` method
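
The label lookup described in the list above can be sketched roughly as follows. This is an illustration only: `SketchLabelNode`, `SketchNameNode`, and `sketch_label` are hypothetical names that are not part of Canon; they stand in for the real node classes so the snippet runs without any gems.

```ruby
# Hypothetical stand-ins for the node types listed above (not Canon classes).
SketchLabelNode = Struct.new(:label) # mimics a TreeNode
SketchNameNode  = Struct.new(:name)  # mimics a Canon::Xml::Node or Nokogiri node

# Resolve a node's label by duck typing: prefer #label, fall back to
# #name, and use "unknown" when the object offers neither.
def sketch_label(node)
  if node.respond_to?(:label)
    node.label
  elsif node.respond_to?(:name)
    node.name
  else
    "unknown"
  end
end

sketch_label(SketchLabelNode.new("div")) # => "div"
sketch_label(SketchNameNode.new("p"))    # => "p"
sketch_label(Object.new)                 # => "unknown"
```

Checking `respond_to?` rather than the class keeps the lookup independent of which parsing library produced the node.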

Examples:

Build path for a TreeNode

path = PathBuilder.build(tree_node)
# => "/#document-fragment/div[0]/p[1]/span[2]"

Build path for a Canon::Xml::Node

path = PathBuilder.build(canon_node)
# => "/#document/root[0]/body[0]/p[1]"

Build path for a Nokogiri node

path = PathBuilder.build(nokogiri_node)
# => "/#document/div[0]/p[1]/span[2]"

Class Method Details

.build(node, format: :fragment) ⇒ String

Builds the canonical path for a node (TreeNode, Canon::Xml::Node, or Nokogiri node). Returns an empty string when the node is nil.

Parameters:

  • node (Object)

    Node to build path for

  • format (Symbol) (defaults to: :fragment)

    Format (:document or :fragment). The method body shown below accepts this keyword but does not reference it.

Returns:

  • (String)

    Canonical path with ordinal indices



# File 'lib/canon/diff/path_builder.rb', line 31

def self.build(node, format: :fragment)
  return "" if node.nil?

  # Build path segments from root to node
  segments = build_segments(node)

  # Join segments with /
  "/#{segments.join('/')}"
end

.build_segments(tree_node) ⇒ Array<String>

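Builds path segments (node names with ordinal indices). Traverses from the node up to the root, prepending each segment so that the result is ordered root-first. The traversal stops after 1000 levels to guard against cyclic parent links. Handles both TreeNodes and raw nodes (Canon::Xml::Node, Nokogiri).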

Parameters:

  • tree_node (Object)

    Node to build segments for

Returns:

  • (Array<String>)

    Path segments from root to node



# File 'lib/canon/diff/path_builder.rb', line 47

def self.build_segments(tree_node)
  segments = []
  current = tree_node
  max_depth = 1000 # Prevent infinite loops
  depth = 0

  # Traverse up to root
  while current && depth < max_depth
    segments.unshift(segment_for_node(current))

    # Move to parent if available
    break unless current.respond_to?(:parent)

    current = current.parent
    depth += 1
  end

  segments
end
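
The upward traversal can be tried in isolation with a minimal sketch like the one below. `WalkDemoNode` and `ancestor_names` are hypothetical names introduced for illustration; the sketch collects bare names and omits the ordinal indices that `segment_for_node` adds.

```ruby
# Hypothetical node type for illustration; real callers pass TreeNodes,
# Canon::Xml::Node instances, or Nokogiri nodes.
WalkDemoNode = Struct.new(:name, :parent)

# Walk from the node up to the root, prepending each name so the result
# reads root-first. The depth cap stops the loop if parents form a cycle.
def ancestor_names(node, max_depth: 1000)
  names = []
  current = node
  depth = 0
  while current && depth < max_depth
    names.unshift(current.name)
    break unless current.respond_to?(:parent)

    current = current.parent
    depth += 1
  end
  names
end

root = WalkDemoNode.new("#document", nil)
body = WalkDemoNode.new("body", root)
para = WalkDemoNode.new("p", body)

ancestor_names(para)                 # => ["#document", "body", "p"]
"/#{ancestor_names(para).join('/')}" # => "/#document/body/p"
```

Because each name is prepended with `unshift`, no separate reversal step is needed once the loop finishes.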

.human_path(tree_node) ⇒ String

Builds a human-readable path description. This alternative format may be more useful for error messages. Handles both TreeNodes and raw nodes.

Parameters:

  • tree_node (Object)

    Node (TreeNode, Canon::Xml::Node, or Nokogiri)

Returns:

  • (String)

    Human-readable path



# File 'lib/canon/diff/path_builder.rb', line 151

def self.human_path(tree_node)
  segments = build_segments(tree_node)
  segments.join("")
end

.ordinal_index(tree_node) ⇒ Integer

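Gets the ordinal index of a node among its siblings with the same label. Handles both TreeNodes (with Array children) and raw nodes (with NodeSet children). Returns 0 when the node has no parent, the parent exposes no children, or the node has neither a label nor a name.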

Parameters:

  • tree_node (Object)

    Node (TreeNode, Canon::Xml::Node, or Nokogiri)

Returns:

  • (Integer)

    Zero-based ordinal index



# File 'lib/canon/diff/path_builder.rb', line 108

def self.ordinal_index(tree_node)
  # Defensive: return 0 if no parent or doesn't respond to parent
  return 0 unless tree_node.respond_to?(:parent)
  return 0 unless tree_node.parent

  # Check if parent has children
  return 0 unless tree_node.parent.respond_to?(:children)

  siblings = tree_node.parent.children
  return 0 unless siblings

  # Convert to array if it's a NodeSet (Nokogiri) or similar
  siblings = siblings.to_a unless siblings.is_a?(Array)

  # Get the label/name for comparison
  my_label = if tree_node.respond_to?(:label)
               tree_node.label
             elsif tree_node.respond_to?(:name)
               tree_node.name
             end

  return 0 unless my_label

  # Collect all siblings that share this node's label
  same_label_siblings = siblings.select do |s|
    sibling_label = if s.respond_to?(:label)
                      s.label
                    elsif s.respond_to?(:name)
                      s.name
                    end
    sibling_label == my_label
  end

  # Find position in same-label siblings
  same_label_siblings.index(tree_node) || 0
end
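
The same-label counting can be illustrated with a small sketch. `SiblingDemoNode` and `same_name_index` are hypothetical names that are not part of Canon, and the sketch skips the defensive `respond_to?` checks of the method above.

```ruby
# Hypothetical tree node for illustration only.
class SiblingDemoNode
  attr_reader :name, :parent, :children

  def initialize(name, parent = nil)
    @name = name
    @parent = parent
    @children = []
    parent.children << self if parent
  end
end

# Zero-based position of the node among siblings that share its name.
def same_name_index(node)
  return 0 unless node.parent

  node.parent.children.select { |s| s.name == node.name }.index(node) || 0
end

div  = SiblingDemoNode.new("div")
p1   = SiblingDemoNode.new("p", div)
span = SiblingDemoNode.new("span", div)
p2   = SiblingDemoNode.new("p", div)

same_name_index(p1)   # => 0
same_name_index(span) # => 0
same_name_index(p2)   # => 1
same_name_index(div)  # => 0 (no parent)
```

`Array#index` compares elements with `==`, so for node classes that define value-based equality the lookup may resolve to the first equal sibling rather than to the identical object.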

.segment_for_node(tree_node) ⇒ String

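Builds the path segment for a single node. Returns the label with its ordinal index, e.g. "div[0]", "span[1]". A text node whose parent is not a document or document-fragment root is rendered as the parent name followed by "/text()" and the index. Handles both TreeNodes (with label) and raw nodes (with name).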

Parameters:

  • tree_node (Object)

    Node (TreeNode, Canon::Xml::Node, or Nokogiri)

Returns:

  • (String)

    Path segment with ordinal index



# File 'lib/canon/diff/path_builder.rb', line 73

def self.segment_for_node(tree_node)
  # Handle both TreeNodes (with label) and raw nodes (with name)
  label = if tree_node.respond_to?(:label)
            tree_node.label
          elsif tree_node.respond_to?(:name)
            tree_node.name
          else
            "unknown"
          end

  # Get ordinal index (position among siblings with same label)
  index = ordinal_index(tree_node)

  # For text nodes, use parent element name for clarity
  # e.g., instead of "/p/#text[0]" use "/p/text()[0]"
  if ["text",
      "#text"].include?(label) && tree_node.respond_to?(:parent) && tree_node.parent
    parent_name = if tree_node.parent.respond_to?(:label)
                    tree_node.parent.label
                  elsif tree_node.parent.respond_to?(:name)
                    tree_node.parent.name
                  end
    if parent_name && parent_name != "#document" && parent_name != "#document-fragment"
      return "#{parent_name}/text()[#{index}]"
    end
  end

  "#{label}[#{index}]"
end
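
The branching between ordinary nodes and text nodes can be sketched as a pure function. `sketch_segment` is a hypothetical helper, not part of Canon; it takes an already resolved label, index, and parent name instead of a node object.

```ruby
# Hypothetical helper mirroring the segment formatting rules above.
# label:       the node's label or name
# index:       its ordinal index among same-label siblings
# parent_name: the parent's label or name, or nil when there is no parent
def sketch_segment(label, index, parent_name)
  roots = ["#document", "#document-fragment"]
  text_node = ["text", "#text"].include?(label)

  if text_node && parent_name && !roots.include?(parent_name)
    "#{parent_name}/text()[#{index}]"
  else
    "#{label}[#{index}]"
  end
end

sketch_segment("span", 2, "p")          # => "span[2]"
sketch_segment("#text", 0, "p")         # => "p/text()[0]"
sketch_segment("#text", 0, "#document") # => "#text[0]"
sketch_segment("text", 1, nil)          # => "text[1]"
```

In the text-node case the segment itself carries the parent name, so when it is joined with the parent's own segment the parent name appears twice in the resulting path.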