Class: Html2rss::HtmlExtractor::Extractors::Archive
- Inherits: Object
  - Object
  - Html2rss::HtmlExtractor::Extractors::Archive
- Defined in: lib/html2rss/html_extractor/enclosure_extractor.rb
Overview
Extracts archive enclosures from anchor tags whose href ends in .zip, .tar.gz, or .tgz. Every match is reported with the type 'application/zip'.
Class Method Summary
- .call(article_tag, base_url:) ⇒ Array<Hash{Symbol => Object}>
  Archive enclosure hashes.
Class Method Details
.call(article_tag, base_url:) ⇒ Array<Hash{Symbol => Object}>
Returns archive enclosure hashes.
# File 'lib/html2rss/html_extractor/enclosure_extractor.rb', line 105

def self.call(article_tag, base_url:)
  article_tag.css('a[href$=".zip"], a[href$=".tar.gz"], a[href$=".tgz"]').filter_map do |link|
    href = link['href'].to_s
    next if href.empty?

    abs_url = Url.from_relative(href, base_url)
    {
      url: abs_url,
      type: 'application/zip'
    }
  end
end
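The selection and resolution steps in .call can be approximated with the standard library alone. The sketch below is not the library's implementation: archive_enclosures and ARCHIVE_SUFFIXES are hypothetical names, plain href strings stand in for the Nokogiri node passed as article_tag, and URI.join stands in for Url.from_relative, whose exact behavior is not shown in this page. It only illustrates the suffix match, the empty-href guard, and the fixed 'application/zip' type.

```ruby
require 'uri'

# Hypothetical stand-in for the CSS suffix selectors
# a[href$=".zip"], a[href$=".tar.gz"], a[href$=".tgz"].
ARCHIVE_SUFFIXES = ['.zip', '.tar.gz', '.tgz'].freeze

# Hypothetical helper: takes raw href values instead of a parsed HTML node.
def archive_enclosures(hrefs, base_url:)
  hrefs.filter_map do |href|
    href = href.to_s
    next if href.empty?                            # same guard as in .call
    next unless href.end_with?(*ARCHIVE_SUFFIXES)  # case-sensitive suffix match

    # Assumption: URI.join approximates Url.from_relative for simple cases.
    { url: URI.join(base_url, href).to_s, type: 'application/zip' }
  end
end

enclosures = archive_enclosures(
  ['/files/a.zip', 'b.tar.gz', '', nil, 'page.html'],
  base_url: 'https://example.com/posts/1'
)
enclosures.each { |enclosure| puts enclosure.inspect }
```

Because the match is on the end of the href, a link such as 'a.zip?download=1' is not selected, and .tar.gz and .tgz links still receive the 'application/zip' type, mirroring the source listing above.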