Class: Html2rss::HtmlExtractor::Extractors::Image

Inherits:
Object
Defined in:
lib/html2rss/html_extractor/enclosure_extractor.rb

Overview

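Extracts image enclosures from an article's HTML. Finds every img element whose src attribute is present and does not start with "data", resolves each source against the base URL, and returns one hash per image (url and guessed content type) in a format suitable for RSS enclosures.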

Class Method Details

.call(article_tag, base_url:) ⇒ Array<Hash{Symbol => Object}>

Returns image enclosure hashes.

Parameters:

  • article_tag (Nokogiri::XML::Element)

    article container node

  • base_url (String, Html2rss::Url)

    base URL for relative image sources

Returns:

  • (Array<Hash{Symbol => Object}>)

    image enclosure hashes



# File 'lib/html2rss/html_extractor/enclosure_extractor.rb', line 30

def self.call(article_tag, base_url:)
  article_tag.css('img[src]:not([src^="data"])').filter_map do |img|
    src = img['src'].to_s
    next if src.empty?

    abs_url = Url.from_relative(src, base_url)
    {
      url: abs_url,
      type: RssBuilder::Enclosure.guess_content_type_from_url(abs_url, default: 'image/jpeg')
    }
  end
end
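
The filtering and URL resolution in the method above can be approximated with the standard library alone. The following is a minimal sketch, not the library's implementation: `image_enclosures`, `guess_type`, and `CONTENT_TYPES` are hypothetical names, `URI.join` stands in for `Url.from_relative`, and a small extension table stands in for `RssBuilder::Enclosure.guess_content_type_from_url`, whose actual lookup rules are not shown on this page. It takes plain `src` strings instead of a Nokogiri node so that it runs without any gems.

```ruby
require 'uri'

# Hypothetical extension table; the real content-type lookup may differ.
CONTENT_TYPES = {
  '.png' => 'image/png',
  '.gif' => 'image/gif',
  '.webp' => 'image/webp',
  '.jpg' => 'image/jpeg',
  '.jpeg' => 'image/jpeg'
}.freeze

# Guess a MIME type from the URL path's extension, falling back to the default.
def guess_type(url, default: 'image/jpeg')
  CONTENT_TYPES.fetch(File.extname(URI(url).path).downcase, default)
end

# Assumed stand-in for Image.call: skips empty and data-prefixed sources,
# resolves the rest against base_url, and returns one hash per image.
def image_enclosures(srcs, base_url:)
  srcs.filter_map do |src|
    src = src.to_s
    next if src.empty? || src.start_with?('data')

    abs_url = URI.join(base_url, src).to_s
    { url: abs_url, type: guess_type(abs_url) }
  end
end

result = image_enclosures(
  ['/img/a.png', 'b', 'data:image/png;base64,AAAA', ''],
  base_url: 'https://example.com/posts/1'
)
```

Here the absolute path and the relative path are both resolved against the base URL, the extensionless source falls back to the default type, and the data-prefixed and empty sources are dropped, mirroring the selector and the `next if src.empty?` guard in the original method.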