Class: Wurk::IterableJob::CsvEnumerator

Inherits:
Object
Defined in:
lib/wurk/iterable_job/csv_enumerator.rb

Overview

Cursor-resumable CSV iteration helper for IterableJob#build_enumerator. Its behavior matches Sidekiq's `Sidekiq::Job::Iterable::CsvEnumerator` byte for byte: the cursor is the integer row (or batch) index, and resuming skips that many rows (or batches). The host application must load `csv` itself; Wurk does not force the dependency.
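Here is a minimal sketch of the cursor contract that runs without Wurk. The hypothetical helper `rows_from` (not part of Wurk) reproduces the `rows` pipeline on an in-memory CSV. It stops after two rows as if the job were interrupted, then resumes on a fresh CSV object over the same data. The sketch assumes the job stores the index of the next unprocessed row (last index + 1). This page does not say how Wurk actually persists the cursor.

```ruby
require "csv"

# Hypothetical helper mirroring CsvEnumerator#rows: a lazy enumerator of
# [row, index] pairs that skips the first `cursor` rows (nil means none).
def rows_from(csv, cursor)
  csv.lazy.each_with_index.drop(cursor || 0)
end

data = "alice,1\nbob,2\ncarol,3\ndave,4\n"

# First run: take two rows, then stop as if the job were interrupted.
first_run = rows_from(CSV.new(data), nil).first(2)

# Assumption: the stored cursor is the index of the first unprocessed row.
next_cursor = first_run.last[1] + 1

# Second run: a fresh CSV object over the same data, resumed at next_cursor.
second_run = rows_from(CSV.new(data), next_cursor).to_a

processed = (first_run + second_run).map { |row, _index| row[0] }
p processed # prints ["alice", "bob", "carol", "dave"]
```

Because the index travels with each row, no row is skipped or repeated across the interruption, provided the cursor is saved as described above.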

Spec: docs/target/sidekiq-free.md §6.4; Sidekiq wiki Iteration.


Constructor Details

#initialize(csv) ⇒ CsvEnumerator

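Returns a new instance of CsvEnumerator wrapping `csv`, which must be a `::CSV` instance (for example, one returned by `CSV.new` or by `CSV.open` without a block).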

Raises:

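  • (ArgumentError) if `::CSV` is not defined, or if `csv` is not exactly a `::CSV` instance. The check uses `instance_of?`, so subclasses are rejected too.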


# File 'lib/wurk/iterable_job/csv_enumerator.rb', line 13

def initialize(csv)
  # csv is an optional dependency, so confirm ::CSV is loaded before type-checking.
  raise ArgumentError, 'CsvEnumerator.new takes CSV object' unless defined?(::CSV) && csv.instance_of?(::CSV)

  @csv = csv
end
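The guard accepts only a genuine `CSV` object. One easy mistake is passing the already-parsed result of `CSV.parse`, which is a plain `Array`. The sketch below copies the guard into a hypothetical stand-in class, `CsvGuardSketch`, so the check can be exercised without loading Wurk.

```ruby
require "csv"
require "stringio"

# Hypothetical stand-in reproducing CsvEnumerator#initialize's guard.
class CsvGuardSketch
  def initialize(csv)
    unless defined?(::CSV) && csv.instance_of?(::CSV)
      raise ArgumentError, "CsvEnumerator.new takes CSV object"
    end
    @csv = csv
  end
end

accepted = CsvGuardSketch.new(CSV.new("a,b\n")) # a CSV instance is accepted

rejected = []
[
  CSV.parse("a,b\n"),    # CSV.parse returns an Array of rows, not a CSV
  StringIO.new("a,b\n"), # a raw IO is not wrapped automatically
  "a,b\n"                # neither is a String
].each do |candidate|
  CsvGuardSketch.new(candidate)
rescue ArgumentError
  rejected << candidate.class
end

p rejected # prints [Array, StringIO, String]
```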

Instance Method Details

#batches(cursor:, batch_size: 100) ⇒ Object

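Lazy enumerator of `[rows_batch, batch_index]` pairs that skips the first `cursor` batches (a `nil` cursor starts at batch 0). Each `rows_batch` is an array of up to `batch_size` rows. `batch_index` is the batch's zero-based position, so resuming with that index as the cursor restarts at that batch. The enumerator's `size` is the total number of batches, ceil(row count / `batch_size`) based on `count_of_rows_in_file`, regardless of `cursor`.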



# File 'lib/wurk/iterable_job/csv_enumerator.rb', line 29

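def batches(cursor:, batch_size: 100)
  # Stream rows lazily, group them into arrays of up to batch_size, pair each
  # batch with its zero-based index, skip the first `cursor` batches (nil means
  # none), and report the total batch count as the enumerator's size.
  @csv.lazy
      .each_slice(batch_size)
      .with_index
      .drop(cursor || 0)
      .to_enum { (count_of_rows_in_file.to_f / batch_size).ceil }
end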
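The sketch below reproduces the `batches` pipeline on five rows with `batch_size: 2` and `cursor: 1`. The helper `batches_from` and its `row_count:` keyword are hypothetical. `row_count` stands in for the private `count_of_rows_in_file`, which this page does not document. The example also shows that `size` reports the total batch count even when batches are skipped.

```ruby
require "csv"

# Hypothetical helper mirroring the CsvEnumerator#batches pipeline.
# row_count replaces the undocumented count_of_rows_in_file helper.
def batches_from(csv, cursor:, row_count:, batch_size: 100)
  csv.lazy
     .each_slice(batch_size)
     .with_index
     .drop(cursor || 0)
     .to_enum { (row_count.to_f / batch_size).ceil }
end

data = "r1\nr2\nr3\nr4\nr5\n"
enum = batches_from(CSV.new(data), cursor: 1, row_count: 5, batch_size: 2)

total = enum.size # ceil(5 / 2.0) = 3, computed without iterating

# Each CSV row is an array like ["r3"]; flatten the batch for readability.
result = enum.map { |rows, batch_index| [rows.flatten, batch_index] }.to_a
p result # prints [[["r3", "r4"], 1], [["r5"], 2]]
```

Batch 0 (`r1`, `r2`) is skipped, and the indices of the remaining batches are unchanged. For that reason a stored batch index can be fed straight back in as the cursor.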

#rows(cursor:) ⇒ Object

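Lazy enumerator of `[row, index]` pairs that skips the first `cursor` rows (a `nil` cursor starts at row 0). `index` is the zero-based position of the row among those the CSV object yields, so resuming with it as the cursor restarts at that row. The enumerator's `size` is the total row count from `count_of_rows_in_file`, regardless of `cursor`.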



# File 'lib/wurk/iterable_job/csv_enumerator.rb', line 20

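def rows(cursor:)
  # Stream rows lazily, pair each with its zero-based index, skip the first
  # `cursor` rows (nil means none), and report the total row count as the size.
  @csv.lazy
      .each_with_index
      .drop(cursor || 0)
      .to_enum { count_of_rows_in_file }
end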
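The index counts the rows the CSV object yields. When the CSV is opened with `headers: true`, the header line is consumed by the parser and is not counted. In this sketch the first data row has index 0 and each `row` is a `CSV::Row`. The helper `rows_from` is hypothetical and mirrors the `rows` pipeline.

```ruby
require "csv"

# Hypothetical helper mirroring the CsvEnumerator#rows pipeline.
def rows_from(csv, cursor)
  csv.lazy.each_with_index.drop(cursor || 0)
end

data = "name,qty\nwidget,3\ngadget,5\ngizmo,7\n"
csv = CSV.new(data, headers: true)

# Resume at cursor 1: "widget" (index 0) is skipped; rows are CSV::Row objects.
resumed = rows_from(csv, 1).map { |row, index| [row["name"], index] }.to_a
p resumed # prints [["gadget", 1], ["gizmo", 2]]
```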