Class: Html2rss::HtmlExtractor::SemanticAnchorCandidates::Context
- Inherits: Object
  - Object
  - Html2rss::HtmlExtractor::SemanticAnchorCandidates::Context
- Defined in: lib/html2rss/html_extractor/semantic_anchor_candidates.rb
Overview
Shared context for all anchors in one semantic container.
Constant Summary
- UTILITY_LANDMARK_TAGS = %w[nav aside footer menu].freeze
  Ancestor tags that usually indicate navigation/utility regions.
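This page does not show how the constant is consulted, so the following is only a minimal sketch of one plausible use: checking whether any ancestor tag name of an anchor is in the list. The helper name `inside_utility_landmark?` and the plain array of ancestor names are assumptions made for illustration, not part of html2rss.

```ruby
# Same value as the documented constant, repeated here so the sketch is self-contained.
UTILITY_LANDMARK_TAGS = %w[nav aside footer menu].freeze

# Hypothetical helper (not part of html2rss): takes the tag names of a node's
# ancestors and reports whether any of them is a utility landmark.
def inside_utility_landmark?(ancestor_names)
  ancestor_names.any? { |name| UTILITY_LANDMARK_TAGS.include?(name.to_s.downcase) }
end

inside_utility_landmark?(%w[li ul nav body html])      # anchor nested in <nav>
inside_utility_landmark?(%w[h2 article main body html]) # anchor nested in <article>
```

With a real Nokogiri node, the ancestor names could be collected from the node's ancestors before calling such a helper.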
Instance Attribute Summary
- #container ⇒ Object (readonly)
  Returns the value of attribute container.
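Instance Method Summary
- #destination_facts(anchor) ⇒ Html2rss::AutoSource::Scraper::LinkHeuristics::DestinationFacts?
  Destination facts for the anchor, as reported by the link heuristics; may be nil.
- #heading ⇒ Nokogiri::XML::Node?
  First heading element in the container, used to identify title anchors; nil when the container has none.
- #heading_text ⇒ String
  Visible text of the heading; empty when there is no heading.
- #initialize(container, link_heuristics:) ⇒ Context (constructor)
  A new instance of Context.
- #utility_text?(text) ⇒ Boolean
  True when the link heuristics classify the text as utility chrome.
- #visible_text(node) ⇒ String
  Visible text for the node, stripped and memoized per node; empty for a nil node.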
Constructor Details
#initialize(container, link_heuristics:) ⇒ Context
Returns a new instance of Context.
# File 'lib/html2rss/html_extractor/semantic_anchor_candidates.rb', line 41
def initialize(container, link_heuristics:)
  @container = container
  @link_heuristics = link_heuristics
end
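Because the heuristics object is passed in as a keyword argument, any object that responds to `utility_text?` and `destination_facts` can stand in for it, which is convenient in tests. The sketch below mirrors the constructor and the two delegating methods in a stand-alone class. `SketchContext`, `StubLinkHeuristics`, and the word list are hypothetical names invented for this illustration, and the symbol used as a container is a placeholder for a parsed node.

```ruby
# Hypothetical stub (not the real LinkHeuristics): answers the two calls
# that Context delegates to its heuristics object.
class StubLinkHeuristics
  UTILITY_WORDS = %w[share login subscribe].freeze # assumed word list

  def utility_text?(text)
    UTILITY_WORDS.include?(text.to_s.strip.downcase)
  end

  def destination_facts(_anchor)
    nil # the documented return type is nilable
  end
end

# Stand-alone mirror of the documented constructor and delegators.
class SketchContext
  attr_reader :container

  def initialize(container, link_heuristics:)
    @container = container
    @link_heuristics = link_heuristics
  end

  def utility_text?(text)
    @link_heuristics.utility_text?(text)
  end

  def destination_facts(anchor)
    @link_heuristics.destination_facts(anchor)
  end
end

# :article_node is a placeholder for a parsed container node.
context = SketchContext.new(:article_node, link_heuristics: StubLinkHeuristics.new)
```

Here `context.utility_text?('Share')` is answered entirely by the stub, so the behaviour under test is only the delegation itself.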
Instance Attribute Details
#container ⇒ Object (readonly)
Returns the value of attribute container.
# File 'lib/html2rss/html_extractor/semantic_anchor_candidates.rb', line 34
def container
  @container
end
Instance Method Details
#destination_facts(anchor) ⇒ Html2rss::AutoSource::Scraper::LinkHeuristics::DestinationFacts?
Returns destination facts.
# File 'lib/html2rss/html_extractor/semantic_anchor_candidates.rb', line 66
def destination_facts(anchor)
  @link_heuristics.destination_facts(anchor)
end
#heading ⇒ Nokogiri::XML::Node?
Returns heading used to identify title anchors.
# File 'lib/html2rss/html_extractor/semantic_anchor_candidates.rb', line 47
def heading
  @heading ||= @container.at_css(HtmlExtractor::HEADING_TAGS.join(','))
end
#heading_text ⇒ String
Returns visible heading text.
# File 'lib/html2rss/html_extractor/semantic_anchor_candidates.rb', line 52
def heading_text
  @heading_text ||= visible_text(heading)
end
#utility_text?(text) ⇒ Boolean
Returns true when text is utility chrome.
# File 'lib/html2rss/html_extractor/semantic_anchor_candidates.rb', line 72
def utility_text?(text)
  @link_heuristics.utility_text?(text)
end
#visible_text(node) ⇒ String
Returns visible text for the node.
# File 'lib/html2rss/html_extractor/semantic_anchor_candidates.rb', line 58
def visible_text(node)
  return '' unless node

  (@visible_texts ||= {}.compare_by_identity)[node] ||= HtmlExtractor.extract_visible_text(node).to_s.strip
end
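The cache in `visible_text` is keyed by object identity (`Hash#compare_by_identity`), so two distinct nodes that happen to compare equal still get separate entries. The sketch below illustrates that behaviour with stand-ins: `FakeNode`, `VisibleTextCache`, and the `extract` method are hypothetical and replace `HtmlExtractor.extract_visible_text`, whose implementation is not shown on this page.

```ruby
# Hypothetical stand-in for a parsed node. Struct instances with equal
# fields are == and share a hash value, so an ordinary Hash would merge them.
FakeNode = Struct.new(:text)

class VisibleTextCache
  attr_reader :extractions

  def initialize
    @extractions = 0
  end

  # Same shape as the documented method: nil guard, then an
  # identity-keyed memo that is filled on first access.
  def visible_text(node)
    return '' unless node

    (@visible_texts ||= {}.compare_by_identity)[node] ||= extract(node)
  end

  private

  # Assumed replacement for HtmlExtractor.extract_visible_text; counts calls.
  def extract(node)
    @extractions += 1
    node.text.to_s.strip
  end
end

cache = VisibleTextCache.new
first = FakeNode.new('  Headline  ')
twin  = FakeNode.new('  Headline  ') # equal to first, but a different object

cache.visible_text(first) # extracts and caches
cache.visible_text(first) # served from the cache
cache.visible_text(twin)  # separate entry, so it extracts again
```

Keying by identity also avoids hashing or comparing node contents on each lookup; whether that was the motivation in html2rss is not stated here.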