Class: Html2rss::HtmlExtractor::EnclosureExtractor

Inherits:
Object
Defined in:
lib/html2rss/html_extractor/enclosure_extractor.rb

Overview

Extracts enclosures (images, audio and video sources, PDFs, iframes, and archives) from HTML tags using various strategies.

Constant Summary

SELECTOR =

CSS union query covering images, media, PDFs, iframes, and archives.

[
  'img[src]:not([src^="data"])',
  'video source[src]',
  'audio source[src]',
  'audio[src]',
  'a[href$=".pdf"]',
  'iframe[src]',
  'a[href$=".zip"]',
  'a[href$=".tar.gz"]',
  'a[href$=".tgz"]'
].join(',').freeze
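The attribute predicates in SELECTOR follow standard CSS semantics: `[attr^="x"]` is a prefix match and `[attr$="x"]` is a suffix match, and both compare `src` and `href` values case-sensitively (so `report.PDF` would not match `a[href$=".pdf"]`). The plain-Ruby sketch below mimics a few of these predicates with `String#start_with?` and `String#end_with?`. The helpers `img_src_selected?` and `link_href_selected?` are hypothetical illustrations and are not part of html2rss. The sketch also shows one consequence of the selector as written: the `data` prefix test is literal, so it skips `data:` URIs but would also skip a relative path such as `data/photo.jpg`.

```ruby
# Hypothetical helpers (not part of html2rss) that mimic some attribute
# predicates used in SELECTOR with plain String methods:
#   [attr^="x"]  ->  value.start_with?("x")
#   [attr$="x"]  ->  value.end_with?("x")
LINK_SUFFIXES = %w[.pdf .zip .tar.gz .tgz].freeze

# Mirrors 'img[src]:not([src^="data"])'. The attribute must be present,
# and the prefix check is the literal string "data".
def img_src_selected?(src)
  !src.nil? && !src.start_with?('data')
end

# Mirrors the a[href$=".pdf"], a[href$=".zip"], a[href$=".tar.gz"],
# and a[href$=".tgz"] selectors (case-sensitive suffix match).
def link_href_selected?(href)
  !href.nil? && href.end_with?(*LINK_SUFFIXES)
end

puts img_src_selected?('https://example.com/photo.jpg') # true
puts img_src_selected?('data:image/png;base64,iVBOR')   # false
puts img_src_selected?('data/photo.jpg')                # false
puts link_href_selected?('/files/report.pdf')           # true
puts link_href_selected?('/files/report.PDF')           # false
```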

Class Method Summary

Class Method Details

.call(article_tag, base_url) ⇒ Array<Hash{Symbol => Object}>

Returns normalized enclosure hashes.

Parameters:

  • article_tag (Nokogiri::XML::Element)

    article container node

  • base_url (String, Html2rss::Url)

    base URL for relative enclosure links

Returns:

  • (Array<Hash{Symbol => Object}>)

    normalized enclosure hashes
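The `base_url` parameter exists so that relative enclosure links can be made absolute. How html2rss performs that step (for instance via `Html2rss::Url`) is not shown on this page. The following is a minimal sketch of one way to do the resolution with Ruby's standard `URI.join`, assuming a string base URL. `resolve_enclosure_url` is a hypothetical helper, not a library method.

```ruby
require 'uri'

# Hypothetical helper (not part of html2rss): resolves an enclosure's
# src/href against the page URL using the standard library's
# relative-reference resolution. Accepts anything that responds to #to_s.
def resolve_enclosure_url(raw, base_url)
  URI.join(base_url.to_s, raw.to_s.strip).to_s
rescue URI::Error
  nil # unparseable references are dropped instead of raising
end

base = 'https://example.com/blog/post.html'
puts resolve_enclosure_url('/media/clip.mp4', base)
# https://example.com/media/clip.mp4
puts resolve_enclosure_url('img/photo.jpg', base)
# https://example.com/blog/img/photo.jpg
puts resolve_enclosure_url('https://cdn.example.org/a.pdf', base)
# https://cdn.example.org/a.pdf
```

Absolute URLs pass through unchanged, root-relative paths replace the base path, and plain relative paths resolve against the base URL's directory.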



# File 'lib/html2rss/html_extractor/enclosure_extractor.rb', line 24

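def self.call(article_tag, base_url)
  # Select every candidate node matching SELECTOR, then map each one through
  # extract_from_element; filter_map drops any nil/false result.
  article_tag.css(SELECTOR).filter_map do |element|
    extract_from_element(element, base_url)
  end
end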
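`.call` relies on `Enumerable#filter_map`, which maps each matched element and keeps only truthy results. Any element for which `extract_from_element` returns `nil` or `false` is therefore omitted from the returned array. The following is a minimal sketch of that pattern, assuming plain hashes in place of Nokogiri elements. `extract_stub` is a hypothetical stand-in for `extract_from_element`, and the `:url`/`:type` keys are illustrative rather than html2rss's actual enclosure hash layout.

```ruby
require 'uri'

# Hypothetical stand-in for extract_from_element: returns a hash for
# elements it can use and nil for everything else.
# The :url and :type keys are illustrative only.
def extract_stub(element, base_url)
  ref = element[:src] || element[:href]
  return nil if ref.nil? || ref.empty?

  { url: URI.join(base_url, ref).to_s, type: element[:tag] }
end

# Plain hashes standing in for the nodes matched by SELECTOR.
elements = [
  { tag: 'img', src: '/a.jpg' },
  { tag: 'a', href: '' }, # nothing usable, so extract_stub returns nil
  { tag: 'iframe', src: '/embed' }
]

enclosures = elements.filter_map { |el| extract_stub(el, 'https://example.com') }
puts enclosures.length # 2
puts enclosures.map { |e| e[:url] }
# https://example.com/a.jpg
# https://example.com/embed
```

Compared with `map` followed by `compact`, `filter_map` does the same work in one pass and also drops `false` results.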