Class: Html2rss::AutoSource::Scraper::SemanticHtml::AnchorSelector
- Inherits: Object
  - Object
  - Html2rss::AutoSource::Scraper::SemanticHtml::AnchorSelector
- Defined in: lib/html2rss/auto_source/scraper/semantic_html/anchor_selector.rb
Overview
Selects the best content-like anchor from a semantic container.
The selector turns raw DOM anchors into ranked facts so semantic scraping can reason about link intent instead of DOM order. It favors heading-aligned article links and suppresses utility links, duplicate destinations, and weak textless affordances.
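To make "ranked facts" concrete, here is a minimal sketch of the idea in plain Ruby. It is not the html2rss implementation: `AnchorFact`, its fields, and the point values are all hypothetical, chosen only to show how scoring lets a heading-aligned link win even when a utility link comes first in DOM order.

```ruby
# Hypothetical stand-in for the facts derived from each anchor.
# None of these names or weights come from html2rss; they only illustrate the idea.
AnchorFact = Struct.new(:href, :text, :in_heading, :utility, keyword_init: true) do
  # Higher scores mean "more likely to be the story link".
  def score
    points = 0
    points += 10 if in_heading            # heading-aligned links are favored
    points += 2 unless text.strip.empty?  # textless affordances are weak
    points -= 20 if utility               # share/login style links are suppressed
    points
  end
end

# Listed in DOM order: the utility link comes first, the headline link last.
facts = [
  AnchorFact.new(href: '/share?u=/story', text: 'Share', in_heading: false, utility: true),
  AnchorFact.new(href: '/story', text: '', in_heading: false, utility: false),
  AnchorFact.new(href: '/story', text: 'Big news today', in_heading: true, utility: false)
]

best = facts.max_by(&:score)
puts best.text # prints "Big news today"
```

Ranking by score rather than taking the first anchor is what keeps a leading "Share" or thumbnail link from being mistaken for the article link.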
Constant Summary
- HEADING_SELECTOR = HtmlExtractor::HEADING_TAGS.join(',').freeze
  Comma-separated heading selector used for heading/anchor matching.
Instance Method Summary
- #initialize(base_url) ⇒ AnchorSelector (constructor)
  A new instance of AnchorSelector.
- #primary_anchor_for(container) ⇒ Nokogiri::XML::Element?
  Chooses the single anchor that best represents the story contained in a semantic block.
Constructor Details
#initialize(base_url) ⇒ AnchorSelector
Returns a new instance of AnchorSelector.
# File 'lib/html2rss/auto_source/scraper/semantic_html/anchor_selector.rb', line 19
def initialize(base_url)
  @link_heuristics = LinkHeuristics.new(base_url)
end
Instance Method Details
#primary_anchor_for(container) ⇒ Nokogiri::XML::Element?
Chooses the single anchor that best represents the story contained in a semantic block.
Ranking is scoped to one container at a time. That keeps the logic local, makes duplicate links to the same destination collapse into one candidate, and avoids page-wide heuristics leaking across cards.
# File 'lib/html2rss/auto_source/scraper/semantic_html/anchor_selector.rb', line 33
def primary_anchor_for(container)
  facts_for(container).max_by(&:score)&.anchor
end
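The `max_by(&:score)&.anchor` pattern above can be explored without Nokogiri. The sketch below is an assumption-laden illustration, not the library's code: `Candidate` and `pick_primary` are hypothetical names, and symbols stand in for DOM anchors. It shows two properties described above: candidates pointing at the same destination collapse into one, and a container with no candidates yields `nil` instead of raising.

```ruby
# Hypothetical candidate type; the real facts object is not shown on this page.
Candidate = Struct.new(:anchor, :destination, :score)

# Picks the best anchor among the candidates of ONE container.
# Duplicate destinations are first collapsed to their highest-scoring candidate.
def pick_primary(candidates)
  candidates
    .group_by(&:destination)
    .map { |_destination, group| group.max_by(&:score) }
    .max_by(&:score)&.anchor # safe navigation: nil when there are no candidates
end

card = [
  Candidate.new(:thumbnail_link, '/a', 1),
  Candidate.new(:headline_link, '/a', 12),
  Candidate.new(:comments_link, '/a#comments', -5)
]
empty_card = []

p pick_primary(card)       # prints :headline_link
p pick_primary(empty_card) # prints nil
```

Because each call only sees one container's candidates, a high-scoring link in one card cannot influence the choice made for a neighbouring card.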