Class: Html2rss::HtmlExtractor::EnclosureExtractor
- Inherits:
- Object
- Defined in:
- lib/html2rss/html_extractor/enclosure_extractor.rb
Overview
Extracts enclosures from HTML tags using various strategies.
Constant Summary
- SELECTOR =
CSS union query covering images, media, PDFs, iframes, and archives.
[
  'img[src]:not([src^="data"])',
  'video source[src]',
  'audio source[src]',
  'audio[src]',
  'a[href$=".pdf"]',
  'iframe[src]',
  'a[href$=".zip"]',
  'a[href$=".tar.gz"]',
  'a[href$=".tgz"]'
].join(',').freeze
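As a rough illustration of how the constant is assembled, the sketch below rebuilds the same fragments under a hypothetical name, `DEMO_SELECTOR` (not part of html2rss), and inspects the result. Joining with a comma produces a CSS selector list, so an element matching any one fragment is selected. The `$=` operator is a suffix match on the attribute value, so a link such as `report.pdf?download=1` would presumably not be picked up.

```ruby
# DEMO_SELECTOR is a hypothetical name used only for this sketch;
# the fragments are copied from the SELECTOR constant documented above.
DEMO_SELECTOR = [
  'img[src]:not([src^="data"])', # images, excluding data: URIs
  'video source[src]',           # <source> children of <video>
  'audio source[src]',           # <source> children of <audio>
  'audio[src]',                  # <audio> with a direct src
  'a[href$=".pdf"]',             # links whose href ends in .pdf
  'iframe[src]',                 # embedded frames
  'a[href$=".zip"]',             # archive links
  'a[href$=".tar.gz"]',
  'a[href$=".tgz"]'
].join(',').freeze

# None of the fragments contains a comma, so splitting recovers them.
fragments = DEMO_SELECTOR.split(',')

puts DEMO_SELECTOR.frozen?
puts fragments.length
```

Freezing the joined string keeps the constant from being mutated by callers.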
Class Method Summary
.call(article_tag, base_url) ⇒ Array<Hash{Symbol => Object}>
Normalized enclosure hashes.
Class Method Details
.call(article_tag, base_url) ⇒ Array<Hash{Symbol => Object}>
Returns normalized enclosure hashes.
# File 'lib/html2rss/html_extractor/enclosure_extractor.rb', line 24
def self.call(article_tag, base_url)
  article_tag.css(SELECTOR).filter_map do |element|
    extract_from_element(element, base_url)
  end
end
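The method relies on `filter_map`, which maps each matched element and drops any `nil` or `false` result in one pass. The sketch below mimics that pattern in plain Ruby. The `extract_stub` helper, the `:url` key, and the sample inputs are all assumptions made for illustration; the real `extract_from_element` is not shown on this page and may return a different hash shape.

```ruby
require 'uri'

# Hypothetical stand-in for the private extract_from_element helper.
# Returns nil for unusable sources so that filter_map discards them.
def extract_stub(src, base_url)
  return nil if src.nil? || src.empty?

  # Resolve relative references against the base URL.
  { url: URI.join(base_url, src).to_s }
end

# Sample attribute values, standing in for the elements matched by SELECTOR.
sources = ['/a.png', nil, 'media/b.mp3', '']

enclosures = sources.filter_map do |src|
  extract_stub(src, 'https://example.com/blog/')
end

enclosures.each { |enclosure| puts enclosure[:url] }
```

Under these assumptions only the two usable sources survive, which is why the documented return value is an array of hashes with no `nil` entries.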