Class: Html2rss::HtmlExtractor::ListCandidates
- Inherits: Object
- Defined in: lib/html2rss/html_extractor/list_candidates.rb
Overview
Builds repeated-list article container candidates from generic HTML.
Class Method Summary
- .simplify_xpath(xpath) ⇒ String
  Simplify an XPath selector by removing index notation.
Instance Method Summary
- #each_article_tag(anchor_filter:, boundary_condition:) {|article_tag, selected_anchor| ... } ⇒ Enumerator
- #initialize(parsed_body, minimum_selector_frequency:, use_top_selectors:) ⇒ ListCandidates (constructor)
  A new instance of ListCandidates.
Constructor Details
#initialize(parsed_body, minimum_selector_frequency:, use_top_selectors:) ⇒ ListCandidates
Returns a new instance of ListCandidates.
# File 'lib/html2rss/html_extractor/list_candidates.rb', line 20
def initialize(parsed_body, minimum_selector_frequency:, use_top_selectors:)
  @parsed_body = parsed_body
  @minimum_selector_frequency = minimum_selector_frequency
  @use_top_selectors = use_top_selectors
end
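This page does not document what `minimum_selector_frequency` and `use_top_selectors` control. Judging only from their names and from `.simplify_xpath`, one plausible reading is a frequency filter over index-free XPaths followed by a top-N cut. The sketch below illustrates that reading in plain Ruby. The method name `top_repeated_selectors` is hypothetical, and treating `use_top_selectors` as an integer count is an assumption, not confirmed html2rss behavior.

```ruby
# Hypothetical sketch: tally index-free XPaths to find repeated list containers.
# ASSUMPTION: use_top_selectors is an integer "keep the N most frequent" limit;
# the real ListCandidates may interpret these keywords differently.
def top_repeated_selectors(xpaths, minimum_selector_frequency:, use_top_selectors:)
  xpaths
    .map { |xpath| xpath.gsub(/\[\d+\]/, '') }                        # same rule as simplify_xpath
    .tally                                                            # selector => occurrence count
    .select { |_selector, count| count >= minimum_selector_frequency } # drop rare selectors
    .sort_by { |selector, count| [-count, selector] }                 # most frequent first
    .first(use_top_selectors)
    .map(&:first)                                                     # keep only the selector strings
end

anchor_paths = [
  '/html/body/ul/li[1]/a',
  '/html/body/ul/li[2]/a',
  '/html/body/ul/li[3]/a',
  '/html/body/nav/a[1]',
  '/html/body/nav/a[2]',
  '/html/body/footer/a'
]

top_repeated_selectors(anchor_paths, minimum_selector_frequency: 2, use_top_selectors: 1)
# => ["/html/body/ul/li/a"]
```

Sorting by `[-count, selector]` keeps the result deterministic when two selectors occur equally often.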
Class Method Details
.simplify_xpath(xpath) ⇒ String
Simplify an XPath selector by removing positional index predicates such as `[2]`, so that paths to sibling elements collapse to the same selector.
# File 'lib/html2rss/html_extractor/list_candidates.rb', line 13
def self.simplify_xpath(xpath)
  xpath.gsub(/\[\d+\]/, '')
end
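A quick illustration of the regex in isolation: the snippet copies the one-line body into a standalone method so it runs without html2rss loaded. Only purely numeric predicates are stripped; attribute predicates such as `[@id='main']` survive unchanged.

```ruby
# Standalone copy of the ListCandidates.simplify_xpath body, for experimentation.
def simplify_xpath(xpath)
  # Remove every bracketed run of digits, e.g. [1] or [23].
  xpath.gsub(/\[\d+\]/, '')
end

simplify_xpath('/html/body/div[2]/ul/li[3]/a')  # => "/html/body/div/ul/li/a"
simplify_xpath('/html/body/div[2]/ul/li[4]/a')  # => "/html/body/div/ul/li/a"
simplify_xpath("//div[@id='main']/p[10]")       # => "//div[@id='main']/p"
```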
Instance Method Details
#each_article_tag(anchor_filter:, boundary_condition:) {|article_tag, selected_anchor| ... } ⇒ Enumerator
# File 'lib/html2rss/html_extractor/list_candidates.rb', line 32
def each_article_tag(anchor_filter:, boundary_condition:)
  return enum_for(:each_article_tag, anchor_filter:, boundary_condition:) unless block_given?

  (anchor_filter:, boundary_condition:).each { yield _1[:article_tag], _1[:selected_anchor] }
end
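`#each_article_tag` follows the common Ruby idiom of returning an `Enumerator` via `enum_for` when called without a block. The sketch below reproduces that idiom in a hypothetical `CandidateListSketch` class. The candidate hashes, the string stand-ins for tags and anchors, and the use of lambdas for `anchor_filter:` and `boundary_condition:` are all assumptions for illustration; this page does not show how the real class builds or filters its candidates.

```ruby
# Hypothetical stand-in for ListCandidates. Candidates are plain hashes using the
# same :article_tag / :selected_anchor keys that each_article_tag reads.
# ASSUMPTION: both keyword arguments are callables returning true/false.
class CandidateListSketch
  def initialize(candidates)
    @candidates = candidates
  end

  def each_article_tag(anchor_filter:, boundary_condition:)
    # Without a block, return an Enumerator that re-invokes this method when iterated.
    unless block_given?
      return enum_for(:each_article_tag, anchor_filter: anchor_filter, boundary_condition: boundary_condition)
    end

    @candidates.each do |candidate|
      next unless anchor_filter.call(candidate[:selected_anchor])
      next unless boundary_condition.call(candidate[:article_tag])

      yield candidate[:article_tag], candidate[:selected_anchor]
    end
  end
end

sketch = CandidateListSketch.new([
  { article_tag: 'li#1', selected_anchor: '/post/1' },
  { article_tag: 'li#2', selected_anchor: '/about' },
  { article_tag: 'li#3', selected_anchor: '/post/3' }
])

posts_only = ->(href) { href.start_with?('/post/') }
any_tag = ->(_tag) { true }

enum = sketch.each_article_tag(anchor_filter: posts_only, boundary_condition: any_tag)
enum.class # => Enumerator
enum.to_a  # => [["li#1", "/post/1"], ["li#3", "/post/3"]]
```

Because the block-less call returns an `Enumerator`, callers can chain methods such as `map` or `first` onto it instead of passing a block.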