Class: Html2rss::HtmlExtractor::ImageExtractor

Inherits:

Object

Defined in: lib/html2rss/html_extractor/image_extractor.rb

Overview

ImageExtractor is responsible for extracting image URLs from the article_tag.

Class Method Summary

.call(article_tag, base_url:) ⇒ Html2rss::Url?

Best candidate image URL.

.from_img(article_tag) ⇒ String?

Src attribute from first matching image tag.

.from_source(article_tag) ⇒ String?

Extracts the largest image source from the srcset attribute of an img tag or a source tag inside a picture tag.

.from_style(article_tag) ⇒ String?

Best style-based background image URL.

Class Method Details

.call(article_tag, base_url:) ⇒ `Html2rss::Url`?

Returns best candidate image URL.

Parameters:

article_tag (Nokogiri::XML::Element): article container node
base_url (String, Html2rss::Url): base URL for relative image URLs

Returns:

(Html2rss::Url, nil): best candidate image URL

# File 'lib/html2rss/html_extractor/image_extractor.rb', line 11

def self.call(article_tag, base_url:)
  img_src = from_source(article_tag) ||
            from_img(article_tag) ||
            from_style(article_tag)

  Url.from_relative(img_src, base_url) if img_src
end
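
The fallback order above (srcset first, then a plain img src, then an inline style) can be illustrated with a minimal sketch that uses only the standard library. The `pick_image` helper and its hash of candidates are hypothetical stand-ins for the three extractor methods, and `URI.join` is only assumed to approximate what `Url.from_relative` does; the real method may normalize URLs differently.

```ruby
require 'uri'

# Hypothetical stand-in for ImageExtractor.call: each candidate is the
# String (or nil) that the corresponding extractor would have returned.
def pick_image(candidates, base_url)
  # `||` short-circuits, so the first non-nil candidate wins in priority order.
  src = candidates[:source] || candidates[:img] || candidates[:style]
  return nil unless src

  # Assumption: URI.join approximates Url.from_relative for simple cases.
  URI.join(base_url, src).to_s
end

base = 'https://example.com/blog/post'

# No srcset candidate, so the img src is used and resolved against the base URL.
resolved = pick_image({ source: nil, img: '/images/cover.jpg', style: 'bg.png' }, base)

# No candidate at all yields nil, mirroring the `if img_src` guard.
missing = pick_image({ source: nil, img: nil, style: nil }, base)
```

Here `resolved` is `"https://example.com/images/cover.jpg"` and `missing` is `nil`.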

.from_img(article_tag) ⇒ `String`?

Returns src attribute from first matching image tag.

Parameters:

article_tag (Nokogiri::XML::Element): article container node

Returns:

(String, nil): src attribute from first matching image tag



# File 'lib/html2rss/html_extractor/image_extractor.rb', line 21

def self.from_img(article_tag)
  article_tag.at_css('img[src]:not([src^="data"])')&.[]('src')
end

.from_source(article_tag) ⇒ `String`?

Extracts the largest image source from the srcset attribute of an img tag or a source tag inside a picture tag.

Parameters:

article_tag (Nokogiri::XML::Element): article container node

Returns:

(String, nil): largest srcset URL candidate
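
The source of `from_source` is not shown here, so the following is only a sketch of how a "largest srcset candidate" could be chosen from a raw `srcset` string. The helper name `largest_srcset_candidate` is hypothetical, and the sketch assumes that all candidates in one attribute use the same kind of descriptor (all widths such as `800w`, or all densities such as `2x`).

```ruby
# Hypothetical helper, not the library's implementation.
# Returns the URL whose descriptor has the largest numeric value, or nil.
def largest_srcset_candidate(srcset)
  # Naive split on commas: URLs that themselves contain commas are not handled.
  candidates = srcset.to_s.split(',').filter_map do |entry|
    url, descriptor = entry.strip.split(/\s+/, 2)
    next if url.nil? || url.empty?

    # "800w".to_f is 800.0 and "2x".to_f is 2.0; a missing descriptor counts as 1x.
    [url, descriptor ? descriptor.to_f : 1.0]
  end

  candidates.max_by { |_url, size| size }&.first
end

widest = largest_srcset_candidate('small.jpg 480w, large.jpg 1200w, medium.jpg 800w')
densest = largest_srcset_candidate('a.png 1x, b.png 2x')
```

With these inputs, `widest` is `"large.jpg"` and `densest` is `"b.png"`; an empty or `nil` attribute returns `nil`.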

.from_style(article_tag) ⇒ `String`?

Returns best style-based background image URL.

Parameters:

article_tag (Nokogiri::XML::Element): article container node

Returns:

(String, nil): best style-based background image URL

# File 'lib/html2rss/html_extractor/image_extractor.rb', line 50

def self.from_style(article_tag)
  article_tag.css('[style*="url"]')
             .filter_map { |tag| tag['style'][/url\(['"]?(.*?)['"]?\)/, 1] }
             .reject { |src| src.start_with?('data:') }
             .max_by(&:size)
end
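
To make the selection rule concrete, the sketch below applies the same regular expression and the same `reject` / `max_by(&:size)` chain to plain style strings instead of Nokogiri nodes. The `longest_style_url` helper and the sample styles are hypothetical. Note that "best" here means the longest URL string by character count, not the largest image dimensions.

```ruby
# Same pattern as in from_style: captures the URL inside url(...),
# with optional single or double quotes around it.
STYLE_URL = /url\(['"]?(.*?)['"]?\)/

# Hypothetical helper operating on raw style attribute values.
def longest_style_url(styles)
  styles.filter_map { |style| style[STYLE_URL, 1] } # nil when no url(...) is present
        .reject { |src| src.start_with?('data:') }  # skip inline data URIs
        .max_by(&:size)                             # longest URL string wins
end

styles = [
  "background-image: url('/img/a.png')",
  'background: url("https://example.com/images/hero-large.jpg") no-repeat',
  'background: url(data:image/png;base64,AAAA)',
  'color: red'
]

best = longest_style_url(styles)
```

Here `best` is `"https://example.com/images/hero-large.jpg"`: the data URI is rejected, the style without `url(...)` is dropped by `filter_map`, and the remaining longer string is chosen.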
