Class: Html2rss::HtmlExtractor::SemanticContainers
- Inherits: Object
  - Object
  - Html2rss::HtmlExtractor::SemanticContainers
- Defined in: lib/html2rss/html_extractor/semantic_containers.rb
Overview
Collects semantic content containers from a parsed HTML document.
Constant Summary
- SELECTORS =
  Candidate selectors used to locate extractable semantic content blocks.
  [
    'article:not(:has(article))',
    'section:not(:has(section))',
    'li:not(:has(li))',
    'tr:not(:has(tr))',
    'div:not(:has(div))'
  ].freeze
Class Method Summary
- .call(parsed_body) ⇒ Array<Nokogiri::XML::Node>
  Candidate semantic containers.
Instance Method Summary
- #call ⇒ Array<Nokogiri::XML::Node>
  Candidate semantic containers.
- #initialize(parsed_body) ⇒ SemanticContainers (constructor)
  A new instance of SemanticContainers.
Constructor Details
#initialize(parsed_body) ⇒ SemanticContainers
Returns a new instance of SemanticContainers.
# File 'lib/html2rss/html_extractor/semantic_containers.rb', line 24
def initialize(parsed_body)
  @parsed_body = parsed_body
end
Class Method Details
.call(parsed_body) ⇒ Array<Nokogiri::XML::Node>
Builds a new instance from parsed_body and returns the result of its #call: the candidate semantic containers.
# File 'lib/html2rss/html_extractor/semantic_containers.rb', line 19
def self.call(parsed_body)
  new(parsed_body).call
end
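The class-level `.call` is a thin wrapper that instantiates the object and invokes the instance-level `#call`. A minimal sketch of the same delegation pattern follows; `WordCounter` is a hypothetical class used only to illustrate the shape and is not part of html2rss.

```ruby
# Hypothetical service object mirroring the .call -> new(...).call shape.
class WordCounter
  # Class-level entry point: build an instance, then run it.
  def self.call(text)
    new(text).call
  end

  def initialize(text)
    @text = text
  end

  # Instance-level work: count whitespace-separated words.
  def call
    @text.split.size
  end
end

word_count = WordCounter.call('one two three')
puts word_count
```

With this shape, callers can use the one-line class method while the instance keeps its input in an instance variable for any helper methods.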
Instance Method Details
#call ⇒ Array<Nokogiri::XML::Node>
Returns candidate semantic containers collected across all SELECTORS, sorted by document order.
# File 'lib/html2rss/html_extractor/semantic_containers.rb', line 29
def call
  containers = SELECTORS.each_with_object([]) do |selector, memo|
    collect_selector_containers(selector, memo)
  end

  containers.sort_by { document_order.fetch(_1) }
end
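Collecting matches selector by selector leaves the results grouped by selector rather than by position, which is why the method ends with a sort. The sketch below illustrates that collect-then-sort step with plain strings standing in for nodes. The helpers `collect_selector_containers` and `document_order` are not shown on this page, so the lookup table here is only an assumption about their behavior (a mapping from each node to its position in the document).

```ruby
# Hypothetical stand-ins for nodes, listed in document order.
document_nodes = %w[article-1 li-1 article-2 li-2]

# Assumed shape of document_order: node => position in the document.
document_order = document_nodes.each_with_index.to_h

# Hypothetical per-selector matches; each selector returns its own hits.
matches_by_selector = {
  'article' => %w[article-1 article-2],
  'li' => %w[li-1 li-2]
}

# Accumulate all matches into one array, grouped by selector.
containers = matches_by_selector.each_with_object([]) do |(_selector, nodes), memo|
  memo.concat(nodes)
end

# Restore document order; fetch raises KeyError for an unknown node.
sorted = containers.sort_by { document_order.fetch(_1) }

puts sorted.inspect
```

Using `fetch` rather than `[]` means a node missing from the lookup fails loudly instead of sorting against `nil`.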