Class: Canon::TreeDiff::Core::TreeNode
- Inherits:
-
Object
- Object
- Canon::TreeDiff::Core::TreeNode
- Defined in:
- lib/canon/tree_diff/core/tree_node.rb
Overview
TreeNode represents a node in a semantic tree structure
This is the fundamental data structure for tree-based diffing, supporting both XML and JSON trees in a format-agnostic way.
Key features:
-
Label: Node name/key (e.g., element name, object key)
-
Value: Leaf node content (text, number, boolean, etc.)
-
Children: Ordered list of child nodes
-
Parent: Reference to parent node (nil for root)
-
Attributes: Key-value metadata (e.g., XML attributes)
-
Signature: Computed path-based identifier (XDiff-style)
-
Weight: Subtree size metric (XyDiff-style)
-
XID: External identifier for matching (e.g., XML id attribute)
Instance Attribute Summary collapse
-
#attributes ⇒ Object
Returns the value of attribute attributes.
-
#children ⇒ Object
Returns the value of attribute children.
-
#label ⇒ Object
Returns the value of attribute label.
-
#metadata ⇒ Object
readonly
Returns the value of attribute metadata.
-
#parent ⇒ Object
Returns the value of attribute parent.
-
#signature ⇒ Object
Returns the value of attribute signature.
-
#source_node ⇒ Object
Returns the value of attribute source_node.
-
#value ⇒ Object
Returns the value of attribute value.
-
#weight ⇒ Object
Returns the value of attribute weight.
-
#xid ⇒ Object
Returns the value of attribute xid.
Instance Method Summary collapse
-
#add_child(child, position: nil) ⇒ TreeNode
Add a child node.
-
#ancestors ⇒ Array<TreeNode>
Get all ancestor nodes from parent to root.
-
#attribute_difference(other) ⇒ Float
Calculate attribute difference with another node.
-
#construct_path ⇒ String
Construct path from tree structure.
-
#content_set ⇒ Set<String>
Get content as a set for similarity calculation.
-
#deep_clone ⇒ TreeNode
Deep clone this node and its subtree.
-
#depth ⇒ Integer
Get depth of this node (distance from root).
-
#descendants ⇒ Array<TreeNode>
Get all descendant nodes (depth-first).
-
#element? ⇒ Boolean
Check if this is an element node (has children or attributes).
-
#height ⇒ Integer
Get height of this node (max distance to any leaf).
-
#initialize(label:, value: nil, children: [], parent: nil, attributes: {}, xid: nil, source_node: nil) ⇒ TreeNode
constructor
Initialize a new TreeNode.
-
#inspect ⇒ String
(also: #to_s)
String representation for debugging.
-
#leaf? ⇒ Boolean
Check if this is a leaf node (no children).
-
#left_siblings ⇒ Array<TreeNode>
Get left siblings (siblings before this node).
-
#matches?(other) ⇒ Boolean
Check if two nodes match exactly.
-
#position ⇒ Integer?
Get the position of this node among its siblings.
-
#remove_child(child) ⇒ TreeNode?
Remove a child node.
-
#replace_child(old_child, new_child) ⇒ TreeNode?
Replace a child node with another.
-
#right_siblings ⇒ Array<TreeNode>
Get right siblings (siblings after this node).
-
#root ⇒ TreeNode
Get the root node of this tree.
-
#semantic_distance_to(other) ⇒ Float
Calculate semantic distance to another node.
-
#siblings ⇒ Array<TreeNode>
Get sibling nodes (nodes with same parent).
-
#similarity_to(other) ⇒ Float
Calculate similarity score with another node.
-
#size ⇒ Integer
Get the size of subtree rooted at this node.
-
#text? ⇒ Boolean
Check if this is a text node (leaf with value).
-
#to_h ⇒ Hash
Convert to hash representation.
-
#xpath ⇒ String
Get XPath for this node.
Constructor Details
#initialize(label:, value: nil, children: [], parent: nil, attributes: {}, xid: nil, source_node: nil) ⇒ TreeNode
Initialize a new TreeNode
34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 34 def initialize(label:, value: nil, children: [], parent: nil, attributes: {}, xid: nil, source_node: nil) @label = label @value = value @children = children @parent = parent @attributes = attributes @xid = xid @source_node = source_node @metadata = {} # Set this node as parent for all children @children.each { |child| child.parent = self } # Computed lazily @signature = nil @weight = nil end |
Instance Attribute Details
#attributes ⇒ Object
Returns the value of attribute attributes.
21 22 23 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 21 def attributes @attributes end |
#children ⇒ Object
Returns the value of attribute children.
21 22 23 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 21 def children @children end |
#label ⇒ Object
Returns the value of attribute label.
21 22 23 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 21 def label @label end |
#metadata ⇒ Object (readonly)
Returns the value of attribute metadata.
23 24 25 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 23 def @metadata end |
#parent ⇒ Object
Returns the value of attribute parent.
21 22 23 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 21 def parent @parent end |
#signature ⇒ Object
Returns the value of attribute signature.
21 22 23 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 21 def signature @signature end |
#source_node ⇒ Object
Returns the value of attribute source_node.
21 22 23 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 21 def source_node @source_node end |
#value ⇒ Object
Returns the value of attribute value.
21 22 23 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 21 def value @value end |
#weight ⇒ Object
Returns the value of attribute weight.
21 22 23 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 21 def weight @weight end |
#xid ⇒ Object
Returns the value of attribute xid.
21 22 23 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 21 def xid @xid end |
Instance Method Details
#add_child(child, position: nil) ⇒ TreeNode
Add a child node
178 179 180 181 182 183 184 185 186 187 188 189 190 191 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 178 def add_child(child, position: nil) child.parent = self if position children.insert(position, child) else children << child end # Invalidate cached computations invalidate_cache child end |
#ancestors ⇒ Array<TreeNode>
Get all ancestor nodes from parent to root
86 87 88 89 90 91 92 93 94 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 86 def ancestors result = [] node = parent while node result << node node = node.parent end result end |
#attribute_difference(other) ⇒ Float
Calculate attribute difference with another node
330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 330 def attribute_difference(other) keys1 = Set.new(attributes.keys) keys2 = Set.new(other.attributes.keys) all_keys = keys1 | keys2 return 0.0 if all_keys.empty? diff_count = 0 all_keys.each do |key| val1 = attributes[key] val2 = other.attributes[key] diff_count += 1 if val1 != val2 end diff_count.to_f / all_keys.size end |
#construct_path ⇒ String
Construct path from tree structure
365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 365 def construct_path segments = [] node = self while node if node.parent # Get position among siblings with same label siblings = node.parent.children.select do |c| c.label == node.label end position = siblings.index(node) + 1 # 1-based indexing for XPath # Always include index for clarity and precision segments.unshift("#{node.label}[#{position}]") else segments.unshift(node.label) end node = node.parent end "/#{segments.join('/')}" end |
#content_set ⇒ Set<String>
Get content as a set for similarity calculation
301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 301 def content_set result = Set.new # Add label result << "label:#{label}" if label # Add value result << "value:#{value}" if value # Add attributes (key only, not values) # This ensures nodes differing only in attribute VALUES still get matched # and are then reported as attribute_updates rather than structural differences # NOTE: The value differences are detected separately in detect_changes attributes.each_key do |key| result << "attr:#{key}" end # Add child labels children.each do |child| result << "child:#{child.label}" end result end |
#deep_clone ⇒ TreeNode
Deep clone this node and its subtree
392 393 394 395 396 397 398 399 400 401 402 403 404 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 392 def deep_clone cloned_children = children.map(&:deep_clone) TreeNode.new( label: label, value: value, children: cloned_children, parent: nil, attributes: attributes.dup, xid: xid, source_node: source_node, # Preserve source node reference ) end |
#depth ⇒ Integer
Get depth of this node (distance from root)
153 154 155 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 153 def depth ancestors.size end |
#descendants ⇒ Array<TreeNode>
Get all descendant nodes (depth-first)
99 100 101 102 103 104 105 106 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 99 def descendants result = [] children.each do |child| result << child result.concat(child.descendants) end result end |
#element? ⇒ Boolean
Check if this is an element node (has children or attributes)
70 71 72 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 70 def element? !leaf? || !attributes.empty? end |
#height ⇒ Integer
Get height of this node (max distance to any leaf)
160 161 162 163 164 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 160 def height return 0 if leaf? 1 + children.map(&:height).max end |
#inspect ⇒ String Also known as: to_s
String representation for debugging
427 428 429 430 431 432 433 434 435 436 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 427 def inspect attrs = [] attrs << "label=#{label.inspect}" attrs << "value=#{value.inspect}" if value attrs << "xid=#{xid.inspect}" if xid attrs << "children=#{children.size}" unless children.empty? attrs << "attributes=#{attributes.size}" unless attributes.empty? "#<TreeNode #{attrs.join(' ')}>" end |
#leaf? ⇒ Boolean
Check if this is a leaf node (no children)
56 57 58 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 56 def leaf? children.empty? end |
#left_siblings ⇒ Array<TreeNode>
Get left siblings (siblings before this node)
120 121 122 123 124 125 126 127 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 120 def left_siblings return [] unless parent index = parent.children.index(self) return [] unless index parent.children[0...index] end |
#matches?(other) ⇒ Boolean
Check if two nodes match exactly
Exact match requires:
-
Same label
-
Same value (for text nodes)
-
Same attributes (key-value pairs)
-
Same number of children with same labels
236 237 238 239 240 241 242 243 244 245 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 236 def matches?(other) return false unless other.is_a?(TreeNode) return false unless label == other.label return false unless value == other.value return false unless attributes == other.attributes return false unless children.size == other.children.size # Check children have same labels children.map(&:label) == other.children.map(&:label) end |
#position ⇒ Integer?
Get the position of this node among its siblings
144 145 146 147 148 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 144 def position return nil unless parent parent.children.index(self) end |
#remove_child(child) ⇒ TreeNode?
Remove a child node
197 198 199 200 201 202 203 204 205 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 197 def remove_child(child) removed = children.delete(child) removed&.parent = nil # Invalidate cached computations invalidate_cache if removed removed end |
#replace_child(old_child, new_child) ⇒ TreeNode?
Replace a child node with another
212 213 214 215 216 217 218 219 220 221 222 223 224 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 212 def replace_child(old_child, new_child) index = children.index(old_child) return nil unless index old_child.parent = nil new_child.parent = self children[index] = new_child # Invalidate cached computations invalidate_cache old_child end |
#right_siblings ⇒ Array<TreeNode>
Get right siblings (siblings after this node)
132 133 134 135 136 137 138 139 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 132 def right_siblings return [] unless parent index = parent.children.index(self) return [] unless index parent.children[(index + 1)..] end |
#root ⇒ TreeNode
Get the root node of this tree
77 78 79 80 81 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 77 def root node = self node = node.parent while node.parent node end |
#semantic_distance_to(other) ⇒ Float
Calculate semantic distance to another node
Semantic distance considers:
-
Depth difference (structural distance)
-
Content similarity (inverse)
-
Attribute differences
282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 282 def semantic_distance_to(other) return Float::INFINITY unless other.is_a?(TreeNode) # Component 1: Depth difference (structural) depth_diff = (depth - other.depth).abs.to_f # Component 2: Content dissimilarity content_diff = 1.0 - similarity_to(other) # Component 3: Attribute differences attr_diff = attribute_difference(other) # Weighted combination (depth_diff * 0.3) + (content_diff * 0.5) + (attr_diff * 0.2) end |
#siblings ⇒ Array<TreeNode>
Get sibling nodes (nodes with same parent)
111 112 113 114 115 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 111 def siblings return [] unless parent parent.children.reject { |child| child == self } end |
#similarity_to(other) ⇒ Float
Calculate similarity score with another node
Uses Jaccard index on combined content:
-
Label
-
Value
-
Attribute keys and values
-
Child labels
257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 257 def similarity_to(other) return 0.0 unless other.is_a?(TreeNode) # Extract comparable elements set1 = content_set set2 = other.content_set # Jaccard index: |intersection| / |union| return 0.0 if set1.empty? && set2.empty? intersection = (set1 & set2).size.to_f union = (set1 | set2).size.to_f intersection / union end |
#size ⇒ Integer
Get the size of subtree rooted at this node
169 170 171 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 169 def size 1 + children.sum(&:size) end |
#text? ⇒ Boolean
Check if this is a text node (leaf with value)
63 64 65 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 63 def text? leaf? && !value.nil? end |
#to_h ⇒ Hash
Convert to hash representation
409 410 411 412 413 414 415 416 417 418 419 420 421 422 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 409 def to_h result = { label: label, value: value, attributes: attributes, xid: xid, children: children.map(&:to_h), } result[:signature] = signature if signature result[:weight] = weight if weight result end |
#xpath ⇒ String
Get XPath for this node
352 353 354 355 356 357 358 359 360 |
# File 'lib/canon/tree_diff/core/tree_node.rb', line 352 def xpath # If we have a source node that supports xpath, use it if @source_node.respond_to?(:path) return @source_node.path end # Otherwise construct path from tree structure construct_path end |