Class: Archaeo::ParallelCdx

Inherits:
Object
Defined in:
lib/archaeo/parallel_cdx.rb

Overview

Fetches CDX pages in parallel for faster bulk queries.

Wraps CdxApi and uses a thread pool to fetch multiple CDX result pages simultaneously, then merges results in order.
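
The following is a minimal sketch of the general technique described above (worker threads pulling page numbers from a queue, with results merged in page order). It is not the library's actual implementation: the method name `fetch_pages_in_order` and the block that stands in for a single-page fetch are assumptions made for illustration.

```ruby
# Hypothetical helper: fetches `page_count` pages with up to `concurrency`
# worker threads and returns all rows concatenated in page order.
# The block stands in for whatever call retrieves one CDX page.
def fetch_pages_in_order(page_count, concurrency: 4)
  jobs = Queue.new
  page_count.times { |page| jobs << page }
  jobs.close # once drained, a closed queue makes pop return nil

  workers = Array.new([concurrency, page_count].min) do
    Thread.new do
      fetched = []
      # Each worker keeps its own list, so no state is shared between threads.
      while (page = jobs.pop)
        fetched << [page, yield(page)]
      end
      fetched
    end
  end

  # Thread#value joins each worker; sorting by page index restores page order.
  workers.flat_map(&:value).sort_by(&:first).flat_map(&:last)
end

rows = fetch_pages_in_order(3, concurrency: 2) { |page| ["row-#{page}-a", "row-#{page}-b"] }
```

Pages may finish in any order, so each result is tagged with its page index and sorted before merging rather than relying on completion order.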

Constant Summary

DEFAULT_CONCURRENCY = 4

Instance Method Summary

Constructor Details

#initialize(cdx_api: CdxApi.new, concurrency: DEFAULT_CONCURRENCY) ⇒ ParallelCdx

Returns a new instance of ParallelCdx.

# File 'lib/archaeo/parallel_cdx.rb', line 11

def initialize(cdx_api: CdxApi.new, concurrency: DEFAULT_CONCURRENCY)
  @cdx = cdx_api
  @concurrency = [concurrency.to_i, 1].max
end
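
The constructor normalizes `concurrency` with `[concurrency.to_i, 1].max`, so the stored value is always an integer of at least 1. The snippet below isolates that expression in a lambda (the name `clamp` is only for illustration) to show how a few inputs are treated.

```ruby
# Same expression as in the constructor, extracted for illustration.
clamp = ->(value) { [value.to_i, 1].max }

clamp.call(8)     # => 8
clamp.call("8")   # => 8  (String#to_i parses leading digits)
clamp.call(2.9)   # => 2  (Float#to_i truncates)
clamp.call(0)     # => 1
clamp.call(-3)    # => 1
clamp.call(nil)   # => 1  (nil.to_i is 0)
clamp.call("abc") # => 1  (a non-numeric string converts to 0)
```

A consequence is that invalid or missing values silently fall back to one worker rather than raising an error.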

Instance Method Details

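#snapshots(url, **options) ⇒ Object

Returns the snapshots for url. When the query spans at most one CDX page, the call is delegated directly to CdxApi#snapshots; otherwise the pages are fetched in parallel.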

# File 'lib/archaeo/parallel_cdx.rb', line 16

def snapshots(url, **options)
  pages = @cdx.num_pages(url, **options)
  return @cdx.snapshots(url, **options) if pages <= 1

  fetch_parallel(url, options, pages)
end
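
To make the dispatch above concrete, here is a self-contained sketch that exercises both branches with a stub. `FakeCdx` and `DispatchSketch` are hypothetical names: the stub only mimics the two CdxApi methods called in the source, its return values are invented, and `fetch_parallel` is a placeholder because its real body is not shown on this page.

```ruby
# Stand-in for CdxApi. The method names come from the source above;
# the return values are made up for illustration.
class FakeCdx
  def initialize(pages)
    @pages = pages
  end

  def num_pages(_url, **_options)
    @pages
  end

  def snapshots(_url, **_options)
    ["single-page result"]
  end
end

# Mirrors the branching in #snapshots; not the real ParallelCdx class.
class DispatchSketch
  def initialize(cdx_api:)
    @cdx = cdx_api
  end

  def snapshots(url, **options)
    pages = @cdx.num_pages(url, **options)
    return @cdx.snapshots(url, **options) if pages <= 1

    fetch_parallel(url, options, pages)
  end

  private

  # Placeholder only: the real method would fetch the pages concurrently.
  def fetch_parallel(_url, _options, pages)
    ["parallel result for #{pages} pages"]
  end
end

single = DispatchSketch.new(cdx_api: FakeCdx.new(1)).snapshots("example.com")
multi  = DispatchSketch.new(cdx_api: FakeCdx.new(3)).snapshots("example.com")
```

With one page or fewer, no threads are started, so small queries avoid the overhead of the parallel path.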