Module: Scrapetor::Fingerprint
- Defined in: lib/scrapetor/fingerprint.rb
Overview
Structural fingerprint of a DOM subtree. Phase 1: tag-bigram rolling hash over the top `depth` levels. Phase 2+: tag bigrams + attribute-presence hash + child-shape hash.
Constant Summary
- MASK = 0xFFFFFFFFFFFFFFFF
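Ruby Integers grow without overflowing, so the rolling hash has to be truncated explicitly. The following is a minimal sketch of what masking with this constant does; the `step` lambda is a hypothetical helper written for illustration, using the multiplier from the `.structural` listing.

```ruby
MASK = 0xFFFFFFFFFFFFFFFF # 2**64 - 1, same value as the constant above

# Hypothetical helper: one rolling-hash step (multiply, add, keep the low 64 bits).
step = ->(h, x) { (h * 1_315_423_911 + x) & MASK }

unmasked = MASK * 1_315_423_911 + 1 # wider than 64 bits, Ruby does not wrap on its own
masked   = step.call(MASK, 1)

puts unmasked > MASK          # true
puts masked.between?(0, MASK) # true
puts (-1 & MASK) == MASK      # true: negative inputs also land in 0..MASK
```

The last line matters because `String#hash` can return a negative Integer; the mask still yields a non-negative 64-bit value.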
Class Method Summary
- .structural(node, depth: 4) ⇒ Object
- .walk(nlx, depth, &block) ⇒ Object
Class Method Details
.structural(node, depth: 4) ⇒ Object
    # File 'lib/scrapetor/fingerprint.rb', line 10
    def self.structural(node, depth: 4)
      # Unwrap objects that expose their underlying node via #backing_node.
      backing = node.respond_to?(:backing_node) ? node.backing_node : node
      h = 0
      # Fold each visited tag name into a rolling hash, truncated to 64 bits.
      walk(backing, depth) do |tag|
        h = (h * 1_315_423_911 + tag.hash) & MASK
      end
      h
    end
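To see which inputs share a fingerprint, here is a minimal sketch that copies the two method bodies from this page into a stand-alone module. `FingerprintSketch`, `ShapeNode`, `Wrapped`, `elem`, and `txt` are hypothetical names introduced for illustration and are not part of Scrapetor; real callers would presumably pass parsed DOM nodes instead.

```ruby
# Hypothetical stand-in for a DOM node (not part of Scrapetor).
ShapeNode = Struct.new(:name, :children, :is_element) do
  def element?
    is_element
  end
end

# Hypothetical wrapper that exposes #backing_node.
Wrapped = Struct.new(:backing_node)

module FingerprintSketch
  MASK = 0xFFFFFFFFFFFFFFFF

  # Method bodies copied from the listings on this page.
  def self.structural(node, depth: 4)
    backing = node.respond_to?(:backing_node) ? node.backing_node : node
    h = 0
    walk(backing, depth) do |tag|
      h = (h * 1_315_423_911 + tag.hash) & MASK
    end
    h
  end

  def self.walk(nlx, depth, &block)
    return if depth <= 0
    return unless nlx.respond_to?(:children)
    nlx.children.each do |c|
      next unless c.respond_to?(:element?) && c.element?
      block.call(c.name)
      walk(c, depth - 1, &block)
    end
  end
end

elem = ->(name, *kids) { ShapeNode.new(name, kids, true) }
txt  = ->(content) { ShapeNode.new(content, [], false) }

card_a = elem.call("div", elem.call("h2", txt.call("Alpha")), elem.call("p", txt.call("one")))
card_b = elem.call("section", elem.call("h2", txt.call("Beta")), elem.call("p"))
empty  = elem.call("div")

fp_a = FingerprintSketch.structural(card_a)
fp_b = FingerprintSketch.structural(card_b)

puts fp_a == fp_b                                              # true: same descendant tags (h2, p)
puts FingerprintSketch.structural(Wrapped.new(card_a)) == fp_a # true: wrapper is unwrapped first
puts FingerprintSketch.structural(empty)                       # 0: no element children to fold in
puts FingerprintSketch.structural(card_a, depth: 0)            # 0: nothing is visited
```

Two things follow from the listing. The root's own tag and all text nodes are ignored, so `card_a` and `card_b` match. And because `String#hash` is seeded per Ruby process, the numeric values are only comparable within a single run.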
.walk(nlx, depth, &block) ⇒ Object
    # File 'lib/scrapetor/fingerprint.rb', line 19
    def self.walk(nlx, depth, &block)
      # Stop once the depth budget is used up or the node has no children.
      return if depth <= 0
      return unless nlx.respond_to?(:children)
      nlx.children.each do |c|
        # Skip text, comments, and anything else that is not an element.
        next unless c.respond_to?(:element?) && c.element?
        block.call(c.name)
        walk(c, depth - 1, &block)
      end
    end
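The traversal order is easiest to see on a small tree. This is a minimal sketch using a hypothetical `WalkNode` struct in place of a real DOM node; `WalkSketch`, `w_el`, and `w_text` are likewise illustrative names, and the `walk` body is copied from the listing above.

```ruby
# Hypothetical stand-in for a DOM node: it only needs #children, #element? and #name.
WalkNode = Struct.new(:name, :children, :is_element) do
  def element?
    is_element
  end
end

module WalkSketch
  # Same body as the .walk listing above.
  def self.walk(nlx, depth, &block)
    return if depth <= 0
    return unless nlx.respond_to?(:children)
    nlx.children.each do |c|
      next unless c.respond_to?(:element?) && c.element?
      block.call(c.name)
      walk(c, depth - 1, &block)
    end
  end
end

w_el   = ->(name, *kids) { WalkNode.new(name, kids, true) }
w_text = ->(content) { WalkNode.new(content, [], false) }

# <div><ul><li>a</li><li/></ul>x<p/></div>
tree = w_el.call(
  "div",
  w_el.call("ul", w_el.call("li", w_text.call("a")), w_el.call("li")),
  w_text.call("x"),
  w_el.call("p")
)

visited = []
WalkSketch.walk(tree, 4) { |tag| visited << tag }
p visited # ["ul", "li", "li", "p"]

shallow = []
WalkSketch.walk(tree, 1) { |tag| shallow << tag }
p shallow # ["ul", "p"]
```

The walk is pre-order and yields only descendant element names: the starting node's own tag (`div` here) never reaches the block, and `depth` counts levels below it.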