Class: Archaeo::ParallelCdx
- Inherits: Object
- Defined in: lib/archaeo/parallel_cdx.rb
Overview
Fetches CDX pages in parallel for faster bulk queries.
Wraps CdxApi and uses a thread pool to fetch multiple CDX result pages simultaneously, then merges results in order.
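The fetch-then-merge idea described above can be sketched roughly as follows. This is a minimal illustration and not the library's implementation: `FakePagedSource`, `fetch_page`, and `fetch_pages_in_order` are hypothetical names invented for the example, and the real CdxApi interface may differ.

```ruby
# Hypothetical stand-in for a paged CDX client; NOT the real CdxApi.
class FakePagedSource
  def initialize(pages)
    @pages = pages
  end

  def num_pages
    @pages.length
  end

  def fetch_page(index)
    sleep(rand * 0.01) # simulate variable network latency
    @pages.fetch(index)
  end
end

# Fetch every page with at most `concurrency` worker threads, then
# return the rows in page order regardless of completion order.
def fetch_pages_in_order(source, concurrency: 4)
  jobs = Queue.new
  source.num_pages.times { |i| jobs << i }
  jobs.close # a closed, empty queue makes pop return nil

  results = Array.new(source.num_pages)
  workers = Array.new([concurrency.to_i, 1].max) do
    Thread.new do
      while (index = jobs.pop)
        results[index] = source.fetch_page(index) # each page has its own slot
      end
    end
  end
  workers.each(&:join)

  results.flatten(1) # merge pages into one list, preserving page order
end
```

Storing each page at its own index is what keeps the merged output ordered even when later pages finish first.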
Constant Summary
- DEFAULT_CONCURRENCY = 4
Instance Method Summary
- #initialize(cdx_api: CdxApi.new, concurrency: DEFAULT_CONCURRENCY) ⇒ ParallelCdx (constructor)
  A new instance of ParallelCdx.
- #snapshots(url, **options) ⇒ Object
Constructor Details
#initialize(cdx_api: CdxApi.new, concurrency: DEFAULT_CONCURRENCY) ⇒ ParallelCdx
Returns a new instance of ParallelCdx. The concurrency argument is converted with to_i and clamped to a minimum of 1.
# File 'lib/archaeo/parallel_cdx.rb', line 11
def initialize(cdx_api: CdxApi.new, concurrency: DEFAULT_CONCURRENCY)
  @cdx = cdx_api
  @concurrency = [concurrency.to_i, 1].max
end
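To show what the `[concurrency.to_i, 1].max` expression in the constructor does with unusual inputs, here is a small standalone sketch. The helper name `effective_concurrency` is hypothetical and exists only for this illustration; the library stores the result in an instance variable instead.

```ruby
# Hypothetical helper that reproduces the constructor's clamp expression.
def effective_concurrency(value)
  [value.to_i, 1].max
end

effective_concurrency(8)   # => 8
effective_concurrency("3") # => 3 (strings are coerced with to_i)
effective_concurrency(0)   # => 1 (never fewer than one worker)
effective_concurrency(-2)  # => 1
effective_concurrency(nil) # => 1 (nil.to_i is 0)
```

Zero, negative, nil, or non-numeric values therefore all fall back to a single worker rather than raising an error.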
Instance Method Details
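#snapshots(url, **options) ⇒ Object
Asks the wrapped API for the number of result pages. With one page or fewer it delegates directly to the wrapped API's snapshots; otherwise it fetches the pages in parallel.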
# File 'lib/archaeo/parallel_cdx.rb', line 16
def snapshots(url, **options)
  pages = @cdx.num_pages(url, **options)
  return @cdx.snapshots(url, **options) if pages <= 1

  fetch_parallel(url, options, pages)
end
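The single-page short-circuit can be observed with a recording stub. The sketch below mirrors only the dispatch logic shown above; `RecordingCdx`, `dispatch`, the `limit:` option, and the `:parallel_path` placeholder are all hypothetical, and the private `fetch_parallel` method is not reproduced because its body is not part of this page.

```ruby
# Hypothetical stub that records calls; the real CdxApi is not used here.
class RecordingCdx
  attr_reader :calls

  def initialize(pages)
    @pages = pages
    @calls = []
  end

  def num_pages(_url, **_options)
    @calls << :num_pages
    @pages
  end

  def snapshots(_url, **options)
    @calls << [:snapshots, options]
    ["snapshot"]
  end
end

# Mirrors the documented dispatch only; the multi-page branch is a placeholder.
def dispatch(cdx, url, **options)
  pages = cdx.num_pages(url, **options)
  return cdx.snapshots(url, **options) if pages <= 1

  :parallel_path # stands in for the private fetch_parallel call
end

single = RecordingCdx.new(1)
dispatch(single, "example.com", limit: 5) # delegates to the stub's snapshots

multi = RecordingCdx.new(3)
dispatch(multi, "example.com") # takes the parallel branch instead
```

Because the wrapped object is injected through the `cdx_api:` keyword, any object responding to `num_pages` and `snapshots` in this way could presumably stand in for CdxApi in tests.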