Class: Archaeo::Fetcher
- Inherits: Object
  - Object
  - Archaeo::Fetcher
- Defined in: lib/archaeo/fetcher.rb
Overview
Downloads archived content from the Wayback Machine.
Constructs the appropriate archive URL, follows redirects, and returns a Page model with content and metadata.
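The first step in the overview, constructing the archive URL, can be illustrated with a minimal sketch. It assumes the public Wayback Machine replay format `BASE/web/<timestamp>/<url>` and assumes that `identity: true` maps to the `id_` modifier, which requests the original bytes without the replay toolbar. The helper name `wayback_url` is hypothetical and is not part of Archaeo; the library's own `ArchiveUrl` class may behave differently.

```ruby
# Hypothetical helper for illustration only; not the Archaeo implementation.
BASE = "https://web.archive.org"

def wayback_url(url, timestamp:, identity: false)
  # Wayback timestamps are digit strings of up to 14 digits (YYYYMMDDhhmmss).
  unless timestamp.to_s.match?(/\A\d{1,14}\z/)
    raise ArgumentError, "invalid timestamp: #{timestamp.inspect}"
  end

  # Assumption: identity mode corresponds to the "id_" URL modifier.
  modifier = identity ? "id_" : ""
  "#{BASE}/web/#{timestamp}#{modifier}/#{url}"
end

puts wayback_url("http://example.com/", timestamp: "20200101000000")
puts wayback_url("http://example.com/", timestamp: "20200101000000", identity: true)
```

The timestamp is validated before interpolation so that a malformed value fails early instead of producing a URL that silently resolves to a different capture.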
Constant Summary
- MAX_REDIRECTS = 5
- BASE = "https://web.archive.org"
Instance Method Summary
- #fetch(url, timestamp:, identity: false, snapshot: nil) ⇒ Object
- #fetch_page_with_assets(url, timestamp:) ⇒ Object
- #initialize(client: HttpClient.new) ⇒ Fetcher (constructor)
  A new instance of Fetcher.
Constructor Details
#initialize(client: HttpClient.new) ⇒ Fetcher
Returns a new instance of Fetcher.
    # File 'lib/archaeo/fetcher.rb', line 14
    def initialize(client: HttpClient.new)
      @client = client
    end
Instance Method Details
#fetch(url, timestamp:, identity: false, snapshot: nil) ⇒ Object
    # File 'lib/archaeo/fetcher.rb', line 18
    def fetch(url, timestamp:, identity: false, snapshot: nil)
      url = UrlNormalizer.normalize(url)
      ts = Timestamp.coerce(timestamp)
      archive_url = ArchiveUrl.new(url, timestamp: ts, identity: identity)
      response = follow_redirects(archive_url.to_s)
      verify_integrity!(response, snapshot) if snapshot
      build_page(response, archive_url.to_s, url, ts)
    end
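The private `follow_redirects` helper called by `#fetch` is not shown on this page. As a rough sketch of how a redirect loop bounded by `MAX_REDIRECTS` might look, the example below uses a hypothetical `StubClient` and `StubResponse` in place of `HttpClient`, whose real interface is not documented here. The `TooManyRedirects` error class and the `get` method name are likewise assumptions made for illustration.

```ruby
require "uri"

MAX_REDIRECTS = 5

# Hypothetical response and client types; HttpClient's real API may differ.
StubResponse = Struct.new(:status, :headers, :body)

class StubClient
  def initialize(routes)
    @routes = routes
  end

  # Returns the canned response registered for the URL.
  def get(url)
    @routes.fetch(url)
  end
end

class TooManyRedirects < StandardError; end

def follow_redirects(client, url, limit: MAX_REDIRECTS)
  current = url
  # One initial request plus at most `limit` redirected requests.
  (limit + 1).times do
    response = client.get(current)
    return response unless (300..399).cover?(response.status)

    location = response.headers["location"]
    raise ArgumentError, "redirect without Location from #{current}" unless location

    # Resolve relative Location values against the URL just requested.
    current = URI.join(current, location).to_s
  end
  raise TooManyRedirects, "gave up after #{limit} redirects"
end

routes = {
  "https://web.archive.org/a" => StubResponse.new(302, { "location" => "/b" }, ""),
  "https://web.archive.org/b" => StubResponse.new(200, {}, "archived body")
}
puts follow_redirects(StubClient.new(routes), "https://web.archive.org/a").body
```

Capping the loop matters for the Wayback Machine in particular, because a request for one timestamp is commonly redirected to the nearest available capture, and an unbounded loop could cycle indefinitely on a misbehaving response.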
#fetch_page_with_assets(url, timestamp:) ⇒ Object
    # File 'lib/archaeo/fetcher.rb', line 28
    def fetch_page_with_assets(url, timestamp:)
      page = fetch(url, timestamp: timestamp)
      assets = AssetExtractor.new(page.content, base_url: page.archive_url).extract
      PageBundle.new(page: page, assets: assets)
    end
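`AssetExtractor` is defined elsewhere, so its behavior is not visible here. To show why `base_url:` is passed along with the page content, the following sketch resolves asset references against a base URL using the standard library's `URI.join`. The `extract_assets` function and its regular expression are hypothetical stand-ins, and a regex scan is a simplification; a real extractor would more likely use an HTML parser.

```ruby
require "uri"

# Hypothetical stand-in for AssetExtractor: matches src on <img>/<script>
# and href on <link>. Not robust against all HTML; illustration only.
ASSET_ATTR = /<(?:img|script)\b[^>]*?\bsrc=["']([^"']+)["']|<link\b[^>]*?\bhref=["']([^"']+)["']/i

def extract_assets(html, base_url:)
  html.scan(ASSET_ATTR).map do |src, href|
    # Exactly one capture group is non-nil per match.
    URI.join(base_url, src || href).to_s
  end.uniq
end

html = <<~HTML
  <link rel="stylesheet" href="/style.css">
  <img src="img/logo.png">
  <script src="https://cdn.example.com/app.js"></script>
HTML

puts extract_assets(html, base_url: "https://example.com/blog/post.html")
```

Relative references such as `img/logo.png` only become fetchable once joined with the URL the page was served from, which is presumably why `#fetch_page_with_assets` passes `page.archive_url` instead of the original URL.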