Class: Html2rss::HtmlExtractor::Extractors::Pdf
- Inherits: Object
- Defined in: lib/html2rss/html_extractor/enclosure_extractor.rb
Overview
Extracts PDF enclosures from an HTML tag by collecting every link (`a` element) whose `href` ends in `.pdf`.
Class Method Summary
- .call(article_tag, base_url:) ⇒ Array<Hash{Symbol => Object}>
  PDF enclosure hashes.
Class Method Details
.call(article_tag, base_url:) ⇒ Array<Hash{Symbol => Object}>
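Returns PDF enclosure hashes, one per matching link. Each hash has a `:url` key (the link resolved against `base_url`) and a `:type` key (the content type guessed from that URL).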
    # File 'lib/html2rss/html_extractor/enclosure_extractor.rb', line 67
    def self.call(article_tag, base_url:)
      article_tag.css('a[href$=".pdf"]').filter_map do |link|
        href = link['href'].to_s
        next if href.empty?

        abs_url = Url.from_relative(href, base_url)
        {
          url: abs_url,
          type: RssBuilder::Enclosure.guess_content_type_from_url(abs_url)
        }
      end
    end
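The flow of `.call` can be illustrated with a standard-library-only sketch. Everything in it is a stand-in rather than html2rss code: `StubArticle` and `extract_pdf_enclosures` are hypothetical names, plain hashes play the role of parsed link nodes, `URI.join` substitutes for `Url.from_relative`, and the hard-coded `'application/pdf'` is an assumed result for `RssBuilder::Enclosure.guess_content_type_from_url`.

```ruby
require 'uri'

# Hypothetical stand-in for the article tag. The real method receives a parsed
# HTML node and selects links with the CSS selector a[href$=".pdf"]; this stub
# ignores the selector and approximates it with a suffix check.
class StubArticle
  def initialize(links)
    @links = links
  end

  def css(_selector)
    @links.select { |link| link['href'].to_s.end_with?('.pdf') }
  end
end

# Mirrors the structure of Pdf.call using only the standard library.
def extract_pdf_enclosures(article_tag, base_url:)
  article_tag.css('a[href$=".pdf"]').filter_map do |link|
    href = link['href'].to_s
    next if href.empty? # filter_map drops the nil produced by `next`

    # Assumption: URI.join stands in for Url.from_relative.
    abs_url = URI.join(base_url, href).to_s

    # Assumption: the guessed content type for a .pdf URL is application/pdf.
    { url: abs_url, type: 'application/pdf' }
  end
end

# Plain hashes stand in for link nodes; both respond to link['href'].
article = StubArticle.new([
  { 'href' => 'files/report.pdf' },
  { 'href' => '/about.html' },
  { 'href' => 'https://cdn.example.com/paper.pdf' }
])

enclosures = extract_pdf_enclosures(article, base_url: 'https://example.com/blog/')
enclosures.each { |enclosure| puts "#{enclosure[:type]} #{enclosure[:url]}" }
```

The non-PDF link is skipped by the selector step, the relative link is resolved against `base_url`, and the already-absolute link passes through unchanged.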