Class: Html2rss::HtmlExtractor::SemanticAnchorCandidates::Context
- Inherits: Object
- Defined in: lib/html2rss/html_extractor/semantic_anchor_candidates.rb
Overview
Shared context for all anchors in one semantic container.
Constant Summary
- UTILITY_LANDMARK_TAGS = %w[nav aside footer menu].freeze
  Ancestor tags that usually indicate navigation/utility regions.
Instance Method Summary
- #destination_facts(anchor) ⇒ Html2rss::AutoSource::Scraper::LinkHeuristics::DestinationFacts?
  Destination facts.
- #heading ⇒ Nokogiri::XML::Node?
  Heading used to identify title anchors.
- #heading_text ⇒ String
  Visible heading text.
- #initialize(container, link_heuristics:) ⇒ Context (constructor)
  A new instance of Context.
- #utility_landmark?(ancestors) ⇒ Boolean
  True when the anchor lives inside navigation chrome.
- #utility_text?(text) ⇒ Boolean
  True when text is utility chrome.
- #visible_text(node) ⇒ String
  Visible text for the node.
Constructor Details
#initialize(container, link_heuristics:) ⇒ Context
Returns a new instance of Context.
    # File 'lib/html2rss/html_extractor/semantic_anchor_candidates.rb', line 39
    def initialize(container, link_heuristics:)
      @container = container
      @link_heuristics = link_heuristics
    end
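Because `link_heuristics:` is a required keyword argument, leaving it out fails at construction time rather than on a later method call. The following is a minimal sketch of that behavior. `ContextSketch` is a hypothetical class that copies only the constructor signature shown above (it is not the library class), and the symbols passed in are placeholders for a real container node and heuristics object.

```ruby
# Hypothetical stand-in that copies only the constructor signature above.
class ContextSketch
  def initialize(container, link_heuristics:)
    @container = container
    @link_heuristics = link_heuristics
  end
end

# Both collaborators are plain placeholder symbols here.
context = ContextSketch.new(:container, link_heuristics: :heuristics)

# Omitting the required keyword raises ArgumentError.
missing_keyword_rejected =
  begin
    ContextSketch.new(:container)
    false
  rescue ArgumentError
    true
  end
```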
Instance Method Details
#destination_facts(anchor) ⇒ Html2rss::AutoSource::Scraper::LinkHeuristics::DestinationFacts?
Returns destination facts.
    # File 'lib/html2rss/html_extractor/semantic_anchor_candidates.rb', line 64
    def destination_facts(anchor)
      @link_heuristics.destination_facts(anchor)
    end
#heading ⇒ Nokogiri::XML::Node?
Returns heading used to identify title anchors.
    # File 'lib/html2rss/html_extractor/semantic_anchor_candidates.rb', line 45
    def heading
      @heading ||= @container.at_css(HtmlExtractor::HEADING_TAGS.join(','))
    end
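The `||=` memoization in `#heading` caches a found node, but a container without any heading is queried again on every call, since `nil` is falsy. The sketch below illustrates that difference under stated assumptions: `CountingContainer` and `HeadingSketch` are hypothetical stand-ins, and the `HEADING_TAGS` value is assumed for illustration rather than taken from the library.

```ruby
# Hypothetical container that counts how often it is queried.
class CountingContainer
  attr_reader :calls

  def initialize(result)
    @result = result
    @calls = 0
  end

  # Mimics the shape of Nokogiri's at_css: returns a node or nil.
  def at_css(_selector)
    @calls += 1
    @result
  end
end

# Mirrors only the #heading method shown above.
class HeadingSketch
  HEADING_TAGS = %w[h1 h2 h3 h4 h5 h6].freeze # assumed value, for illustration

  def initialize(container)
    @container = container
  end

  def heading
    @heading ||= @container.at_css(HEADING_TAGS.join(','))
  end
end

found = CountingContainer.new(:heading_node)
with_heading = HeadingSketch.new(found)
2.times { with_heading.heading }
found.calls # 1: the second call reuses the cached node

empty = CountingContainer.new(nil)
without_heading = HeadingSketch.new(empty)
2.times { without_heading.heading }
empty.calls # 2: nil is falsy, so ||= runs the query again
```

For a single container the repeated lookup is likely cheap, so this is an observation about the idiom rather than a defect report.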
#heading_text ⇒ String
Returns visible heading text.
    # File 'lib/html2rss/html_extractor/semantic_anchor_candidates.rb', line 50
    def heading_text
      @heading_text ||= visible_text(heading)
    end
#utility_landmark?(ancestors) ⇒ Boolean
Returns true when the anchor lives inside navigation chrome.
    # File 'lib/html2rss/html_extractor/semantic_anchor_candidates.rb', line 76
    def utility_landmark?(ancestors)
      ancestors.any? { |node| UTILITY_LANDMARK_TAGS.include?(node.name) }
    end
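The check only needs each ancestor to respond to `#name`, so it can be exercised without parsing any HTML. A minimal sketch follows, in which `FakeNode` is a hypothetical stand-in for the Nokogiri nodes a real caller would pass, and the method is copied to the top level for illustration.

```ruby
# Hypothetical node stand-in; real ancestors are Nokogiri nodes that also respond to #name.
FakeNode = Struct.new(:name)

UTILITY_LANDMARK_TAGS = %w[nav aside footer menu].freeze

def utility_landmark?(ancestors)
  ancestors.any? { |node| UTILITY_LANDMARK_TAGS.include?(node.name) }
end

# Ancestor chains listed from the anchor's parent outward.
in_footer  = %w[li ul footer body].map { |tag| FakeNode.new(tag) }
in_article = %w[h2 article main body].map { |tag| FakeNode.new(tag) }

utility_landmark?(in_footer)  # true: one ancestor is a footer
utility_landmark?(in_article) # false: no landmark tag in the chain
utility_landmark?([])         # false: any? on an empty list is false
```

Note that the match is an exact string comparison on the tag name, so a single landmark anywhere in the chain is enough to flag the anchor.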
#utility_text?(text) ⇒ Boolean
Returns true when text is utility chrome.
    # File 'lib/html2rss/html_extractor/semantic_anchor_candidates.rb', line 70
    def utility_text?(text)
      @link_heuristics.utility_text?(text)
    end
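Both `#utility_text?` and `#destination_facts` forward straight to the injected heuristics object, which means a test double can replace it. The sketch below shows the delegation pattern only: `FakeHeuristics`, its matching rule, and the hash it returns are all made up for illustration and say nothing about how the real `LinkHeuristics` decides.

```ruby
# Hypothetical heuristics double; the rules and return shapes are invented.
class FakeHeuristics
  def utility_text?(text)
    %w[share login].include?(text.downcase) # made-up rule
  end

  def destination_facts(anchor)
    { href: anchor[:href] } # made-up shape, not the real DestinationFacts
  end
end

# Mirrors only the two delegating methods shown in this section.
class DelegatingSketch
  def initialize(link_heuristics:)
    @link_heuristics = link_heuristics
  end

  def utility_text?(text)
    @link_heuristics.utility_text?(text)
  end

  def destination_facts(anchor)
    @link_heuristics.destination_facts(anchor)
  end
end

sketch = DelegatingSketch.new(link_heuristics: FakeHeuristics.new)
sketch.utility_text?('Share')                  # answered by FakeHeuristics
sketch.destination_facts({ href: '/posts/1' }) # answered by FakeHeuristics
```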
#visible_text(node) ⇒ String
Returns visible text for the node.
    # File 'lib/html2rss/html_extractor/semantic_anchor_candidates.rb', line 56
    def visible_text(node)
      return '' unless node

      HtmlExtractor.extract_visible_text(node).to_s.strip
    end
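The guard clause and the `.to_s.strip` chain mean the method always returns a String, which is why `#heading_text` is safe to call on a container that has no heading. A minimal sketch of those three cases, assuming a hypothetical `FakeExtractor` in place of `HtmlExtractor.extract_visible_text` and plain hashes in place of nodes:

```ruby
# Hypothetical stand-in for HtmlExtractor.extract_visible_text.
module FakeExtractor
  def self.extract_visible_text(node)
    node[:text]
  end
end

# Same body as the method above, with the extractor swapped for the stand-in.
def visible_text(node)
  return '' unless node

  FakeExtractor.extract_visible_text(node).to_s.strip
end

visible_text(nil)                           # "": guard clause, extractor never called
visible_text({ text: "  Breaking news\n" }) # "Breaking news": surrounding whitespace stripped
visible_text({ text: nil })                 # "": nil.to_s is an empty string
```

Since an empty String is truthy in Ruby, `@heading_text ||= visible_text(heading)` caches even the empty result.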