Class: Archaeo::BulkDownloader
- Inherits: Object
- Inheritance chain: Object, Archaeo::BulkDownloader
- Defined in: lib/archaeo/bulk_downloader.rb
Overview
Downloads all archived snapshots of a URL with resume support.
Queries the CDX API for matching snapshots, fetches each page, and saves the content to disk. Progress is tracked in a state file so that an interrupted download can be resumed.
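The resume behavior described above can be illustrated with a minimal sketch. It is not Archaeo's implementation: `SketchState`, `run_sketch`, the `.sketch_state.json` file name, and the use of snapshot timestamps as keys are all hypothetical, chosen only to show how a state file lets a second run skip work that already finished.

```ruby
require "json"
require "set"
require "tmpdir"

# Hypothetical stand-in for a resume state file; NOT Archaeo's DownloadState.
class SketchState
  def initialize(dir)
    @path = File.join(dir, ".sketch_state.json")
    @done = File.exist?(@path) ? Set.new(JSON.parse(File.read(@path))) : Set.new
  end

  def done?(key)
    @done.include?(key)
  end

  # Record a finished snapshot and persist immediately, so an interrupted
  # run loses at most the item that was in flight.
  def mark_done(key)
    @done << key
    File.write(@path, JSON.generate(@done.to_a))
  end
end

# Download-loop skeleton: skip already completed keys when resuming.
# Returns the keys that were actually processed in this run.
def run_sketch(dir, keys, resume:)
  state = SketchState.new(dir)
  fetched = []
  keys.each do |key|
    next if resume && state.done?(key)

    fetched << key # a real run would fetch and save the page here
    state.mark_done(key)
  end
  fetched
end

dir = Dir.mktmpdir
run_sketch(dir, %w[20200101 20200102], resume: false)
# A later run with resume: true only processes the new key.
run_sketch(dir, %w[20200101 20200102 20200103], resume: true)
```

Writing the state after every item trades some I/O for a smaller recovery window; a real implementation might batch writes or use an append-only log instead.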
Instance Method Summary
- #download(url, from: nil, to: nil, resume: false, &block) ⇒ Object
- #initialize(client: HttpClient.new, output_dir: "archive", cdx_api: nil, concurrency: 1) ⇒ BulkDownloader (constructor)
  A new instance of BulkDownloader.
Constructor Details
#initialize(client: HttpClient.new, output_dir: "archive", cdx_api: nil, concurrency: 1) ⇒ BulkDownloader
Returns a new instance of BulkDownloader.
# File 'lib/archaeo/bulk_downloader.rb', line 12
def initialize(client: HttpClient.new, output_dir: "archive", cdx_api: nil, concurrency: 1)
  @client = client
  @output_dir = output_dir
  @cdx_api = cdx_api
  @concurrency = [1, concurrency.to_i].max
end
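The constructor normalizes `concurrency` with `[1, concurrency.to_i].max`, so any value that coerces to less than one falls back to a single worker. The snippet below isolates that expression in a helper named `clamp_concurrency`, which is a hypothetical name used only for illustration.

```ruby
# Mirrors the expression used in the constructor: coerce to Integer,
# then enforce a floor of 1. The helper name is hypothetical.
def clamp_concurrency(value)
  [1, value.to_i].max
end

clamp_concurrency(4)    # => 4
clamp_concurrency("8")  # => 8, String#to_i parses the leading digits
clamp_concurrency(0)    # => 1
clamp_concurrency(-3)   # => 1
clamp_concurrency(nil)  # => 1, because nil.to_i is 0
```

One consequence is that a non-numeric string such as `"many"` silently becomes 1 rather than raising, since `"many".to_i` is 0.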
Instance Method Details
#download(url, from: nil, to: nil, resume: false, &block) ⇒ Object
# File 'lib/archaeo/bulk_downloader.rb', line 20
def download(url, from: nil, to: nil, resume: false, &block)
  url = UrlNormalizer.normalize(url)
  FileUtils.mkdir_p(@output_dir)
  state = DownloadState.new(@output_dir)
  snapshots = fetch_snapshots(url, from: from, to: to)
  total = snapshots.size
  progress = block

  if @concurrency == 1
    download_sequential(snapshots, total, state, resume, progress)
  else
    download_concurrent(snapshots, total, state, resume, progress)
  end
end
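The source of `download_concurrent` is not shown on this page, so the following is only a sketch of one common way to fan a snapshot list out over a fixed number of threads while reporting progress. The helper name `each_with_pool` and the `(completed, total)` arguments passed to the progress callback are assumptions, not Archaeo's documented behavior.

```ruby
# Hypothetical worker pool; Archaeo's download_concurrent may differ.
# Runs `work` for every item on `concurrency` threads and calls the optional
# `progress` callable with (completed, total) after each item finishes.
def each_with_pool(items, concurrency, progress = nil, &work)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # once drained, a closed queue makes pop return nil

  total = items.size
  completed = 0
  lock = Mutex.new

  workers = Array.new(concurrency) do
    Thread.new do
      # Items must be non-nil, since nil signals the end of the queue.
      while (item = queue.pop)
        work.call(item) # stand-in for fetching and saving one snapshot
        lock.synchronize do
          completed += 1
          progress&.call(completed, total)
        end
      end
    end
  end

  workers.each(&:join)
  completed
end

report = ->(done, total) { puts "#{done}/#{total}" }
each_with_pool(%w[20200101 20200102 20200103], 2, report) do |timestamp|
  timestamp.length # a real worker would download the snapshot here
end
```

Invoking the progress callback inside the mutex keeps the reported counts strictly increasing, at the cost of serializing the callback across workers.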