Class: Archaeo::AssetExtractor
- Inherits:
- Object
- Object
- Archaeo::AssetExtractor
- Defined in:
- lib/archaeo/asset_extractor.rb
Overview
Extracts resource URLs from archived HTML content using Nokogiri.
Parses the HTML DOM to find CSS, JavaScript, images, fonts, and media resources referenced by the page. Optionally resolves relative URLs against a base URL.
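The optional base-URL resolution mentioned above can be illustrated with Ruby's standard `URI` library. This is only a sketch under assumptions: `resolve_asset_url` is a hypothetical helper, not a method of `Archaeo::AssetExtractor`, and this page does not show how the class resolves URLs internally.

```ruby
require "uri"

# Hypothetical helper (not part of Archaeo): resolves a possibly relative
# reference against a base URL. Without a base URL, or when either value
# cannot be handled as a URI, the reference is returned unchanged.
def resolve_asset_url(ref, base_url = nil)
  return ref if base_url.nil?

  URI.join(base_url, ref).to_s
rescue URI::Error
  ref
end

# Example values are made up for illustration.
base = "https://example.com/blog/post.html"
refs = ["/css/site.css", "img/logo.png", "https://cdn.example.net/app.js"]

resolved = refs.map { |ref| resolve_asset_url(ref, base) }
puts resolved
```

Root-relative references resolve against the host, path-relative ones against the directory of the base document, and absolute URLs pass through unchanged. Leaving `base_url` as `nil` mirrors the constructor's default and keeps references as they appear in the HTML.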
Instance Method Summary
- #extract ⇒ Object
- #initialize(html, base_url: nil) ⇒ AssetExtractor (constructor)
  A new instance of AssetExtractor.
Constructor Details
#initialize(html, base_url: nil) ⇒ AssetExtractor
Returns a new instance of AssetExtractor.
    # File 'lib/archaeo/asset_extractor.rb', line 13
    def initialize(html, base_url: nil)
      @doc = Nokogiri::HTML(html.to_s)
      @base_url = base_url
    end