Class: Html2rss::HtmlExtractor::ImageExtractor
- Inherits: Object
- Inheritance chain: Object → Html2rss::HtmlExtractor::ImageExtractor
- Defined in: lib/html2rss/html_extractor/image_extractor.rb
Overview
ImageExtractor is responsible for extracting image URLs from the article_tag.
Class Method Summary
- .call(article_tag, base_url:) ⇒ Html2rss::Url?
  Best candidate image URL.
- .from_img(article_tag) ⇒ String?
  Src attribute from first matching image tag.
- .from_source(article_tag) ⇒ String?
  Extracts the largest image source from the srcset attribute of an img tag or a source tag inside a picture tag.
- .from_style(article_tag) ⇒ String?
  Best style-based background image URL.
Class Method Details
.call(article_tag, base_url:) ⇒ Html2rss::Url?
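Returns the best candidate image URL, resolved against base_url, or nil when no candidate is found. Candidates are tried in order: from_source, then from_img, then from_style.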
# File 'lib/html2rss/html_extractor/image_extractor.rb', line 11
def self.call(article_tag, base_url:)
  img_src = from_source(article_tag) ||
            from_img(article_tag) ||
            from_style(article_tag)

  Url.from_relative(img_src, base_url) if img_src
end
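The fallback order in .call can be tried out without the gem. What follows is a minimal sketch, not the library's code: pick_image is a hypothetical helper, its three positional arguments stand in for the results of from_source, from_img and from_style, and the standard library's URI.join is assumed here as a rough substitute for Url.from_relative, whose exact behavior is not shown on this page.

```ruby
require 'uri'

# Hypothetical helper mirroring the `a || b || c` fallback chain of .call.
# The first non-nil candidate wins; nil is returned when all are nil.
def pick_image(from_source, from_img, from_style, base_url:)
  img_src = from_source || from_img || from_style

  # Assumption: URI.join stands in for Url.from_relative (illustration only).
  URI.join(base_url, img_src).to_s if img_src
end

puts pick_image(nil, '/a.jpg', 'b.png', base_url: 'https://example.com/blog/')
```

Because the chain uses `||`, a srcset candidate always takes precedence over a plain src attribute, and an inline style is only consulted when both of the others yield nothing.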
.from_img(article_tag) ⇒ String?
Returns the src attribute of the first img tag whose src does not start with "data", or nil if there is none.
# File 'lib/html2rss/html_extractor/image_extractor.rb', line 21
def self.from_img(article_tag)
  article_tag.at_css('img[src]:not([src^="data"])')&.[]('src')
end
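The selector's filtering rule can be mimicked on plain strings. The sketch below is an illustration under assumptions, not part of the library: first_usable_src is a hypothetical helper, and the array stands in for the src values of the img tags in document order (nil for a tag without src).

```ruby
# Hypothetical helper: picks the first src that is present and does not
# start with "data", mirroring img[src]:not([src^="data"]).
def first_usable_src(srcs)
  srcs.find { |src| src && !src.start_with?('data') }
end

puts first_usable_src([nil, 'data:image/gif;base64,R0lG', 'photo.jpg', 'other.jpg'])
```

Note that the prefix test is on "data" rather than "data:", so a relative path such as data/cover.jpg is skipped as well.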
.from_source(article_tag) ⇒ String?
Extracts the largest image source from the srcset attribute of an img tag or a source tag inside a picture tag.
# File 'lib/html2rss/html_extractor/image_extractor.rb', line 34
def self.from_source(article_tag) # rubocop:disable Metrics/AbcSize
  hash = article_tag.css('img[srcset], picture > source[srcset]').flat_map do |source|
    source['srcset'].to_s.scan(/(\S+)\s+(\d+w|\d+h)[\s,]?/).map do |url, width|
      next if url.nil? || url.start_with?('data:')

      width_value = width.to_i.zero? ? 0 : width.scan(/\d+/).first.to_i

      [width_value, url.strip]
    end
  end.compact.to_h

  hash[hash.keys.max]
end
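To see how the srcset parsing behaves, the same regular expression can be applied to a bare string. This is a simplified sketch rather than the library's implementation: largest_srcset_url and SRCSET_CANDIDATE are hypothetical names, and it selects the winner with max_by instead of building a hash keyed by size.

```ruby
# Same pattern as in from_source: a URL followed by a "w" or "h" descriptor.
SRCSET_CANDIDATE = /(\S+)\s+(\d+w|\d+h)[\s,]?/

# Hypothetical helper: returns the URL with the largest numeric descriptor,
# ignoring data: URLs, or nil when nothing matches.
def largest_srcset_url(srcset)
  candidates = srcset.to_s.scan(SRCSET_CANDIDATE).filter_map do |url, descriptor|
    next if url.start_with?('data:')

    # "640w".to_i reads the leading digits, giving 640.
    [descriptor.to_i, url]
  end

  candidates.max_by(&:first)&.last
end

puts largest_srcset_url('small.jpg 320w, medium.jpg 640w, large.jpg 1280w')
```

Since the pattern only accepts `w` and `h` descriptors, a srcset that uses pixel-density descriptors only (for example `a.jpg 1x, b.jpg 2x`) produces no candidates in this sketch.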
.from_style(article_tag) ⇒ String?
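Returns the best style-based background image URL: among all url(...) values found in inline style attributes, data: URLs are rejected and the longest remaining URL string is returned (nil if none).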
# File 'lib/html2rss/html_extractor/image_extractor.rb', line 50
def self.from_style(article_tag)
  article_tag.css('[style*="url"]')
             .filter_map { |tag| tag['style'][/url\(['"]?(.*?)['"]?\)/, 1] }
             .reject { |src| src.start_with?('data:') }
             .max_by(&:size)
end
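The style-based selection can likewise be reproduced on plain strings. The following sketch assumes the style attribute values have already been collected into an array; best_style_url and STYLE_URL are hypothetical names used only for illustration.

```ruby
# Same pattern as in from_style: captures the content of url(...),
# with or without surrounding quotes.
STYLE_URL = /url\(['"]?(.*?)['"]?\)/

# Hypothetical helper: extracts one URL per style string, drops data: URLs
# and returns the longest remaining URL (nil when none is left).
def best_style_url(styles)
  styles.filter_map { |style| style[STYLE_URL, 1] }
        .reject { |src| src.start_with?('data:') }
        .max_by(&:size)
end

styles = [
  "background: url('a.png')",
  'background-image: url("/images/hero-large.jpg")',
  'color: red'
]
puts best_style_url(styles)
```

"Best" here means longest by string length, which is a heuristic: it says nothing about the actual dimensions of the image behind the URL.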