Class: Archaeo::Fetcher
- Inherits: Object (Object → Archaeo::Fetcher)
- Defined in: lib/archaeo/fetcher.rb
Overview
Downloads archived content from the Wayback Machine.
Constructs the appropriate archive URL, follows redirects, and returns a Page model with content and metadata.
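The URL construction step can be illustrated with a minimal sketch. It assumes the public Wayback Machine layout of `/web/<timestamp>/<url>`, and that `identity: true` maps to the archive's `id_` flag (which requests the capture without rewritten links or the toolbar). The helper name `sketch_archive_url` and the constant `SKETCH_BASE` are hypothetical; the real `ArchiveUrl` class may be implemented differently.

```ruby
# Hypothetical stand-in for ArchiveUrl; not the library's actual implementation.
SKETCH_BASE = "https://web.archive.org"

# Builds a Wayback Machine snapshot URL.
#   timestamp: a 14-digit string in YYYYMMDDhhmmss form
#   identity:  when true, appends the "id_" flag to the timestamp
#              (assumed mapping, see lead-in)
def sketch_archive_url(url, timestamp:, identity: false)
  flag = identity ? "id_" : ""
  "#{SKETCH_BASE}/web/#{timestamp}#{flag}/#{url}"
end

puts sketch_archive_url("http://example.com/", timestamp: "20200101000000")
puts sketch_archive_url("http://example.com/", timestamp: "20200101000000", identity: true)
```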
Constant Summary
- MAX_REDIRECTS = 5
- BASE = "https://web.archive.org"
Instance Method Summary
- #fetch(url, timestamp:, identity: false) ⇒ Object
- #fetch_page_with_assets(url, timestamp:) ⇒ Object
- #initialize(client: HttpClient.new) ⇒ Fetcher (constructor): A new instance of Fetcher.
Constructor Details
#initialize(client: HttpClient.new) ⇒ Fetcher
Returns a new instance of Fetcher.
# File 'lib/archaeo/fetcher.rb', line 12
def initialize(client: HttpClient.new)
  @client = client
end
Instance Method Details
#fetch(url, timestamp:, identity: false) ⇒ Object
# File 'lib/archaeo/fetcher.rb', line 16
def fetch(url, timestamp:, identity: false)
  url = UrlNormalizer.normalize(url)
  ts = Timestamp.coerce(timestamp)
  archive_url = ArchiveUrl.new(url, timestamp: ts, identity: identity)
  response = follow_redirects(archive_url.to_s)
  build_page(response, archive_url.to_s, url, ts)
end
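The private `follow_redirects` helper is not shown on this page. The sketch below is one plausible shape for it, assuming a client that responds to `#get(url)` and returns an object with `status`, `headers`, and `body`, and a cap of five redirects to match `MAX_REDIRECTS`. Every name prefixed with `Sketch` or `sketch_` is hypothetical, and the stub client exists only to make the example runnable without network access.

```ruby
require "uri"

# Hypothetical response shape; the real HttpClient may return something else.
SketchResponse = Struct.new(:status, :headers, :body)

class SketchTooManyRedirects < StandardError; end

# Stub client for illustration: maps URLs to canned responses.
class SketchStubClient
  def initialize(routes)
    @routes = routes
  end

  def get(url)
    @routes.fetch(url)
  end
end

# Follows 3xx responses that carry a Location header, at most `limit` times.
# Relative Location values are resolved against the current URL.
def sketch_follow_redirects(client, url, limit: 5)
  redirects = 0
  loop do
    response = client.get(url)
    location = response.headers["location"]
    return response unless (300..399).cover?(response.status) && location

    redirects += 1
    raise SketchTooManyRedirects, "gave up after #{limit} redirects" if redirects > limit

    url = URI.join(url, location).to_s
  end
end

routes = {
  "https://example.test/a" => SketchResponse.new(302, { "location" => "/b" }, ""),
  "https://example.test/b" => SketchResponse.new(200, {}, "archived body")
}
result = sketch_follow_redirects(SketchStubClient.new(routes), "https://example.test/a")
puts result.body
```

Because the constructor accepts `client:`, a stub of this kind could in principle be passed to `Fetcher.new` in tests, provided it matches whatever interface `HttpClient` actually exposes.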
#fetch_page_with_assets(url, timestamp:) ⇒ Object
# File 'lib/archaeo/fetcher.rb', line 25
def fetch_page_with_assets(url, timestamp:)
  page = fetch(url, timestamp: timestamp)
  assets = AssetExtractor.new(page.content, base_url: page.archive_url).extract
  PageBundle.new(page: page, assets: assets)
end
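`AssetExtractor` is not documented on this page. As a rough illustration of why a `base_url` is passed in, the sketch below collects `src` and `href` values with a regular expression and resolves each one against a base URL. This is an assumption about the general technique, not the library's implementation: a real extractor would more likely use an HTML parser and would filter out ordinary hyperlinks. The method name `sketch_extract_assets` and the sample URLs are hypothetical.

```ruby
require "uri"

# Hypothetical stand-in for AssetExtractor#extract.
# Scans for src/href attribute values and resolves them against base_url.
# Note: this naive pattern also picks up <a href> links, which are not assets.
def sketch_extract_assets(html, base_url:)
  html.scan(/(?:src|href)\s*=\s*["']([^"']+)["']/i)
      .flatten
      .map { |ref| URI.join(base_url, ref).to_s }
      .uniq
end

html = '<link href="/css/site.css"><img src="img/logo.png">' \
       '<script src="https://cdn.example.test/app.js"></script>'
puts sketch_extract_assets(html, base_url: "http://example.com/blog/post.html")
```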