Class: Canon::TreeDiff::Adapters::HTMLAdapter

Inherits:
Object
Defined in:
lib/canon/tree_diff/adapters/html_adapter.rb

Overview

HTMLAdapter converts Nokogiri HTML documents to TreeNode structures and back, enabling semantic tree diffing on HTML documents.

This adapter:

  • Converts a Nokogiri::HTML::Document (or an element, a document fragment, or a Canon::Xml::Node) into a TreeNode tree

  • Preserves element names, text content, and attributes

  • Handles HTML-specific elements (script, style, etc.)

  • Maintains document structure for round-trip conversion

Examples:

Convert HTML to TreeNode

require "nokogiri"

# Assumes the canon gem is already loaded.
html = Nokogiri::HTML("<html><body><p>text</p></body></html>")
adapter = Canon::TreeDiff::Adapters::HTMLAdapter.new
tree = adapter.to_tree(html)

Constructor Details

#initialize(match_options: {}) ⇒ HTMLAdapter

Initialize adapter with match options

Parameters:

  • match_options (Hash) (defaults to: {})

    Match options for text/attribute normalization



# File 'lib/canon/tree_diff/adapters/html_adapter.rb', line 28

def initialize(match_options: {})
  @match_options = match_options
end

Instance Attribute Details

#match_options ⇒ Object (readonly)

Returns the value of attribute match_options.



# File 'lib/canon/tree_diff/adapters/html_adapter.rb', line 23

def match_options
  @match_options
end

Instance Method Details

#from_tree(tree_node, doc = nil) ⇒ Nokogiri::HTML::Document, Nokogiri::XML::Element

Convert TreeNode back to Nokogiri HTML

Parameters:

  • tree_node (Core::TreeNode)

    Root tree node

  • doc (Nokogiri::HTML::Document) (defaults to: nil)

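    Optional document in which to build the nodes. When it is omitted, or has no root yet, the built element becomes the root and the document is returned; otherwise only the built element is returned.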

Returns:

  • (Nokogiri::HTML::Document, Nokogiri::XML::Element)


# File 'lib/canon/tree_diff/adapters/html_adapter.rb', line 70

def from_tree(tree_node, doc = nil)
  doc ||= Nokogiri::HTML::Document.new

  element = build_element(tree_node, doc)

  if doc.root.nil?
    doc.root = element
    doc
  else
    element
  end
end
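
The return type of from_tree depends on whether the target document already has a root. The branching can be sketched in isolation as follows. This is only an illustration: StubDocument and attach are hypothetical names invented here (they are not part of Canon or Nokogiri), and plain symbols stand in for the element that build_element would produce.

```ruby
# Hypothetical stand-in for a document that may or may not have a root.
StubDocument = Struct.new(:root)

# Mirrors the branching in from_tree: when the document is empty, attach the
# element as its root and return the document; otherwise return the element.
def attach(element, doc = nil)
  doc ||= StubDocument.new
  if doc.root.nil?
    doc.root = element
    doc
  else
    element
  end
end

# No document supplied: a new one is created and returned with :html as root.
empty_result = attach(:html)

# Document already has a root: only the new element comes back.
existing_doc = StubDocument.new(:existing_root)
nested_result = attach(:p, existing_doc)

puts empty_result.class
puts nested_result.inspect
```

Callers that pass their own document should therefore be prepared to receive either a document or a bare element, and the existing root is left untouched in the second case.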

#to_tree(node) ⇒ Core::TreeNode

Convert Nokogiri HTML document/element or Canon::Xml::Node to TreeNode

Parameters:

  • node (Nokogiri::HTML::Document, Nokogiri::XML::Element, Nokogiri::HTML::DocumentFragment, Canon::Xml::Node)

    HTML node

Returns:

  • (Core::TreeNode)

    The converted tree, or nil when a document has no root element

# File 'lib/canon/tree_diff/adapters/html_adapter.rb', line 36

def to_tree(node)
  # Handle Canon::Xml::Node types first (same as XML adapter)
  case node
  when Canon::Xml::Nodes::RootNode
    return to_tree_from_canon_root(node)
  when Canon::Xml::Nodes::ElementNode
    return to_tree_from_canon_element(node)
  when Canon::Xml::Nodes::TextNode
    return to_tree_from_canon_text(node)
  when Canon::Xml::Nodes::CommentNode
    return to_tree_from_canon_comment(node)
  end

  # Fallback to Nokogiri (legacy support)
  case node
  when Nokogiri::HTML::Document, Nokogiri::HTML4::Document, Nokogiri::HTML5::Document
    # Start from html element or root element
    root = node.at_css("html") || node.root
    root ? to_tree(root) : nil
  when Nokogiri::HTML4::DocumentFragment, Nokogiri::HTML5::DocumentFragment, Nokogiri::XML::DocumentFragment
    # For DocumentFragment, create a wrapper root node and add all fragment children
    convert_fragment(node)
  when Nokogiri::XML::Element
    convert_element(node)
  else
    raise ArgumentError, "Unsupported node type: #{node.class}"
  end
end
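
The method above dispatches on the class of its argument and raises ArgumentError for anything it does not recognize. A minimal, self-contained sketch of that pattern is shown below. The StubRoot, StubElement, and StubText classes and the describe method are hypothetical stand-ins chosen for illustration; they are not Canon or Nokogiri types, and the real adapter returns TreeNode objects rather than strings.

```ruby
# Hypothetical stand-in node classes (not Canon or Nokogiri types).
StubRoot    = Struct.new(:children)
StubElement = Struct.new(:name)
StubText    = Struct.new(:value)

# Dispatch on the node's class with case/when, as to_tree does, and reject
# any unsupported type with an ArgumentError naming the offending class.
def describe(node)
  case node
  when StubRoot
    "root with #{node.children.size} children"
  when StubElement
    "element <#{node.name}>"
  when StubText
    "text #{node.value.inspect}"
  else
    raise ArgumentError, "Unsupported node type: #{node.class}"
  end
end

element_label = describe(StubElement.new("p"))
root_label    = describe(StubRoot.new([StubText.new("hi")]))
puts element_label
puts root_label

# An unsupported argument is rejected instead of being silently ignored.
begin
  describe(42)
rescue ArgumentError => e
  puts e.message
end
```

Raising on unknown input, rather than returning nil, keeps a mistyped argument from being confused with the legitimate nil that to_tree returns for a document without a root.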