Class: Canon::TreeDiff::Core::TreeNode

Inherits:
Object
  • Object
show all
Defined in:
lib/canon/tree_diff/core/tree_node.rb

Overview

TreeNode represents a node in a semantic tree structure

This is the fundamental data structure for tree-based diffing, supporting both XML and JSON trees in a format-agnostic way.

Key features:

  • Label: Node name/key (e.g., element name, object key)

  • Value: Leaf node content (text, number, boolean, etc.)

  • Children: Ordered list of child nodes

  • Parent: Reference to parent node (nil for root)

  • Attributes: Key-value metadata (e.g., XML attributes)

  • Signature: Computed path-based identifier (XDiff-style)

  • Weight: Subtree size metric (XyDiff-style)

  • XID: External identifier for matching (e.g., XML id attribute)

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(label:, value: nil, children: [], parent: nil, attributes: {}, xid: nil, source_node: nil) ⇒ TreeNode

Initialize a new TreeNode

Parameters:

  • label (String)

    Node name/key

  • value (String, Numeric, Boolean, nil) (defaults to: nil)

    Leaf value

  • children (Array<TreeNode>) (defaults to: [])

    Child nodes

  • parent (TreeNode, nil) (defaults to: nil)

    Parent node

  • attributes (Hash) (defaults to: {})

    Node attributes

  • xid (String, nil) (defaults to: nil)

    External identifier

  • source_node (Object, nil) (defaults to: nil)

    Original source node (e.g., Nokogiri node)



34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
# File 'lib/canon/tree_diff/core/tree_node.rb', line 34

def initialize(label:, value: nil, children: [], parent: nil,
               attributes: {}, xid: nil, source_node: nil)
  @label = label
  @value = value
  @children = children
  @parent = parent
  @attributes = attributes
  @xid = xid
  @source_node = source_node
  @metadata = {}

  # Set this node as parent for all children
  @children.each { |child| child.parent = self }

  # Computed lazily
  @signature = nil
  @weight = nil
end

Instance Attribute Details

#attributesObject

Returns the value of attribute attributes.



21
22
23
# File 'lib/canon/tree_diff/core/tree_node.rb', line 21

def attributes
  @attributes
end

#childrenObject

Returns the value of attribute children.



21
22
23
# File 'lib/canon/tree_diff/core/tree_node.rb', line 21

def children
  @children
end

#labelObject

Returns the value of attribute label.



21
22
23
# File 'lib/canon/tree_diff/core/tree_node.rb', line 21

def label
  @label
end

#metadataObject (readonly)

Returns the value of attribute metadata.



23
24
25
# File 'lib/canon/tree_diff/core/tree_node.rb', line 23

def 
  @metadata
end

#parentObject

Returns the value of attribute parent.



21
22
23
# File 'lib/canon/tree_diff/core/tree_node.rb', line 21

def parent
  @parent
end

#signatureObject

Returns the value of attribute signature.



21
22
23
# File 'lib/canon/tree_diff/core/tree_node.rb', line 21

def signature
  @signature
end

#source_nodeObject

Returns the value of attribute source_node.



21
22
23
# File 'lib/canon/tree_diff/core/tree_node.rb', line 21

def source_node
  @source_node
end

#valueObject

Returns the value of attribute value.



21
22
23
# File 'lib/canon/tree_diff/core/tree_node.rb', line 21

def value
  @value
end

#weightObject

Returns the value of attribute weight.



21
22
23
# File 'lib/canon/tree_diff/core/tree_node.rb', line 21

def weight
  @weight
end

#xidObject

Returns the value of attribute xid.



21
22
23
# File 'lib/canon/tree_diff/core/tree_node.rb', line 21

def xid
  @xid
end

Instance Method Details

#add_child(child, position: nil) ⇒ TreeNode

Add a child node

Parameters:

  • child (TreeNode)

    Child to add

  • position (Integer, nil) (defaults to: nil)

    Optional position to insert at

Returns:



178
179
180
181
182
183
184
185
186
187
188
189
190
191
# File 'lib/canon/tree_diff/core/tree_node.rb', line 178

def add_child(child, position: nil)
  child.parent = self

  if position
    children.insert(position, child)
  else
    children << child
  end

  # Invalidate cached computations
  invalidate_cache

  child
end

#ancestorsArray<TreeNode>

Get all ancestor nodes from parent to root

Returns:



86
87
88
89
90
91
92
93
94
# File 'lib/canon/tree_diff/core/tree_node.rb', line 86

def ancestors
  result = []
  node = parent
  while node
    result << node
    node = node.parent
  end
  result
end

#attribute_difference(other) ⇒ Float

Calculate attribute difference with another node

Parameters:

  • other (TreeNode)

    Node to compare with

Returns:

  • (Float)

    Difference score 0.0 to 1.0



330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
# File 'lib/canon/tree_diff/core/tree_node.rb', line 330

def attribute_difference(other)
  keys1 = Set.new(attributes.keys)
  keys2 = Set.new(other.attributes.keys)

  all_keys = keys1 | keys2
  return 0.0 if all_keys.empty?

  diff_count = 0

  all_keys.each do |key|
    val1 = attributes[key]
    val2 = other.attributes[key]

    diff_count += 1 if val1 != val2
  end

  diff_count.to_f / all_keys.size
end

#construct_pathString

Construct path from tree structure

Returns:

  • (String)

    Path expression



365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
# File 'lib/canon/tree_diff/core/tree_node.rb', line 365

def construct_path
  segments = []
  node = self

  while node
    if node.parent
      # Get position among siblings with same label
      siblings = node.parent.children.select do |c|
        c.label == node.label
      end
      position = siblings.index(node) + 1 # 1-based indexing for XPath

      # Always include index for clarity and precision
      segments.unshift("#{node.label}[#{position}]")
    else
      segments.unshift(node.label)
    end

    node = node.parent
  end

  "/#{segments.join('/')}"
end

#content_setSet<String>

Get content as a set for similarity calculation

Returns:

  • (Set<String>)


301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
# File 'lib/canon/tree_diff/core/tree_node.rb', line 301

def content_set
  result = Set.new

  # Add label
  result << "label:#{label}" if label

  # Add value
  result << "value:#{value}" if value

  # Add attributes (key only, not values)
  # This ensures nodes differing only in attribute VALUES still get matched
  # and are then reported as attribute_updates rather than structural differences
  # NOTE: The value differences are detected separately in detect_changes
  attributes.each_key do |key|
    result << "attr:#{key}"
  end

  # Add child labels
  children.each do |child|
    result << "child:#{child.label}"
  end

  result
end

#deep_cloneTreeNode

Deep clone this node and its subtree

Returns:



392
393
394
395
396
397
398
399
400
401
402
403
404
# File 'lib/canon/tree_diff/core/tree_node.rb', line 392

def deep_clone
  cloned_children = children.map(&:deep_clone)

  TreeNode.new(
    label: label,
    value: value,
    children: cloned_children,
    parent: nil,
    attributes: attributes.dup,
    xid: xid,
    source_node: source_node, # Preserve source node reference
  )
end

#depthInteger

Get depth of this node (distance from root)

Returns:

  • (Integer)


153
154
155
# File 'lib/canon/tree_diff/core/tree_node.rb', line 153

def depth
  ancestors.size
end

#descendantsArray<TreeNode>

Get all descendant nodes (depth-first)

Returns:



99
100
101
102
103
104
105
106
# File 'lib/canon/tree_diff/core/tree_node.rb', line 99

def descendants
  result = []
  children.each do |child|
    result << child
    result.concat(child.descendants)
  end
  result
end

#element?Boolean

Check if this is an element node (has children or attributes)

Returns:

  • (Boolean)


70
71
72
# File 'lib/canon/tree_diff/core/tree_node.rb', line 70

def element?
  !leaf? || !attributes.empty?
end

#heightInteger

Get height of this node (max distance to any leaf)

Returns:

  • (Integer)


160
161
162
163
164
# File 'lib/canon/tree_diff/core/tree_node.rb', line 160

def height
  return 0 if leaf?

  1 + children.map(&:height).max
end

#inspectString Also known as: to_s

String representation for debugging

Returns:

  • (String)


427
428
429
430
431
432
433
434
435
436
# File 'lib/canon/tree_diff/core/tree_node.rb', line 427

def inspect
  attrs = []
  attrs << "label=#{label.inspect}"
  attrs << "value=#{value.inspect}" if value
  attrs << "xid=#{xid.inspect}" if xid
  attrs << "children=#{children.size}" unless children.empty?
  attrs << "attributes=#{attributes.size}" unless attributes.empty?

  "#<TreeNode #{attrs.join(' ')}>"
end

#leaf?Boolean

Check if this is a leaf node (no children)

Returns:

  • (Boolean)


56
57
58
# File 'lib/canon/tree_diff/core/tree_node.rb', line 56

def leaf?
  children.empty?
end

#left_siblingsArray<TreeNode>

Get left siblings (siblings before this node)

Returns:



120
121
122
123
124
125
126
127
# File 'lib/canon/tree_diff/core/tree_node.rb', line 120

def left_siblings
  return [] unless parent

  index = parent.children.index(self)
  return [] unless index

  parent.children[0...index]
end

#matches?(other) ⇒ Boolean

Check if two nodes match exactly

Exact match requires:

  • Same label

  • Same value (for text nodes)

  • Same attributes (key-value pairs)

  • Same number of children with same labels

Parameters:

  • other (TreeNode)

    Node to compare with

Returns:

  • (Boolean)


236
237
238
239
240
241
242
243
244
245
# File 'lib/canon/tree_diff/core/tree_node.rb', line 236

def matches?(other)
  return false unless other.is_a?(TreeNode)
  return false unless label == other.label
  return false unless value == other.value
  return false unless attributes == other.attributes
  return false unless children.size == other.children.size

  # Check children have same labels
  children.map(&:label) == other.children.map(&:label)
end

#positionInteger?

Get the position of this node among its siblings

Returns:

  • (Integer, nil)

    0-based index, or nil if no parent



144
145
146
147
148
# File 'lib/canon/tree_diff/core/tree_node.rb', line 144

def position
  return nil unless parent

  parent.children.index(self)
end

#remove_child(child) ⇒ TreeNode?

Remove a child node

Parameters:

Returns:

  • (TreeNode, nil)

    The removed child, or nil if not found



197
198
199
200
201
202
203
204
205
# File 'lib/canon/tree_diff/core/tree_node.rb', line 197

def remove_child(child)
  removed = children.delete(child)
  removed&.parent = nil

  # Invalidate cached computations
  invalidate_cache if removed

  removed
end

#replace_child(old_child, new_child) ⇒ TreeNode?

Replace a child node with another

Parameters:

Returns:

  • (TreeNode, nil)

    The replaced child, or nil if not found



212
213
214
215
216
217
218
219
220
221
222
223
224
# File 'lib/canon/tree_diff/core/tree_node.rb', line 212

def replace_child(old_child, new_child)
  index = children.index(old_child)
  return nil unless index

  old_child.parent = nil
  new_child.parent = self
  children[index] = new_child

  # Invalidate cached computations
  invalidate_cache

  old_child
end

#right_siblingsArray<TreeNode>

Get right siblings (siblings after this node)

Returns:



132
133
134
135
136
137
138
139
# File 'lib/canon/tree_diff/core/tree_node.rb', line 132

def right_siblings
  return [] unless parent

  index = parent.children.index(self)
  return [] unless index

  parent.children[(index + 1)..]
end

#rootTreeNode

Get the root node of this tree

Returns:



77
78
79
80
81
# File 'lib/canon/tree_diff/core/tree_node.rb', line 77

def root
  node = self
  node = node.parent while node.parent
  node
end

#semantic_distance_to(other) ⇒ Float

Calculate semantic distance to another node

Semantic distance considers:

  • Depth difference (structural distance)

  • Content similarity (inverse)

  • Attribute differences

Parameters:

  • other (TreeNode)

    Node to compare with

Returns:

  • (Float)

    Distance metric (0 = identical)



282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
# File 'lib/canon/tree_diff/core/tree_node.rb', line 282

def semantic_distance_to(other)
  return Float::INFINITY unless other.is_a?(TreeNode)

  # Component 1: Depth difference (structural)
  depth_diff = (depth - other.depth).abs.to_f

  # Component 2: Content dissimilarity
  content_diff = 1.0 - similarity_to(other)

  # Component 3: Attribute differences
  attr_diff = attribute_difference(other)

  # Weighted combination
  (depth_diff * 0.3) + (content_diff * 0.5) + (attr_diff * 0.2)
end

#siblingsArray<TreeNode>

Get sibling nodes (nodes with same parent)

Returns:



111
112
113
114
115
# File 'lib/canon/tree_diff/core/tree_node.rb', line 111

def siblings
  return [] unless parent

  parent.children.reject { |child| child == self }
end

#similarity_to(other) ⇒ Float

Calculate similarity score with another node

Uses Jaccard index on combined content:

  • Label

  • Value

  • Attribute keys and values

  • Child labels

Parameters:

  • other (TreeNode)

    Node to compare with

Returns:

  • (Float)

    Similarity score 0.0 to 1.0



257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
# File 'lib/canon/tree_diff/core/tree_node.rb', line 257

def similarity_to(other)
  return 0.0 unless other.is_a?(TreeNode)

  # Extract comparable elements
  set1 = content_set
  set2 = other.content_set

  # Jaccard index: |intersection| / |union|
  return 0.0 if set1.empty? && set2.empty?

  intersection = (set1 & set2).size.to_f
  union = (set1 | set2).size.to_f

  intersection / union
end

#sizeInteger

Get the size of subtree rooted at this node

Returns:

  • (Integer)


169
170
171
# File 'lib/canon/tree_diff/core/tree_node.rb', line 169

def size
  1 + children.sum(&:size)
end

#text?Boolean

Check if this is a text node (leaf with value)

Returns:

  • (Boolean)


63
64
65
# File 'lib/canon/tree_diff/core/tree_node.rb', line 63

def text?
  leaf? && !value.nil?
end

#to_hHash

Convert to hash representation

Returns:

  • (Hash)


409
410
411
412
413
414
415
416
417
418
419
420
421
422
# File 'lib/canon/tree_diff/core/tree_node.rb', line 409

def to_h
  result = {
    label: label,
    value: value,
    attributes: attributes,
    xid: xid,
    children: children.map(&:to_h),
  }

  result[:signature] = signature if signature
  result[:weight] = weight if weight

  result
end

#xpathString

Get XPath for this node

Returns:

  • (String)

    XPath expression



352
353
354
355
356
357
358
359
360
# File 'lib/canon/tree_diff/core/tree_node.rb', line 352

def xpath
  # If we have a source node that supports xpath, use it
  if @source_node.respond_to?(:path)
    return @source_node.path
  end

  # Otherwise construct path from tree structure
  construct_path
end