Class: Philiprehberger::CsvKit::Processor

Inherits:
Object
Includes:
Callbacks, ErrorHandler
Defined in:
lib/philiprehberger/csv_kit/processor.rb

Overview

Streaming CSV processor with a DSL for transforms, validations, and filtering.

Constant Summary

TYPE_COERCIONS =
{
  integer: ->(v, _opts) { Integer(v) },
  float: ->(v, _opts) { Float(v) },
  string: ->(v, _opts) { v.to_s },
  date: lambda { |v, opts|
    if opts[:format]
      Date.strptime(v, opts[:format])
    else
      Date.parse(v)
    end
  },
  datetime: lambda { |v, opts|
    if opts[:format]
      Time.strptime(v, opts[:format])
    else
      Time.parse(v)
    end
  }
}.freeze
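Each coercion is a two-argument lambda: the raw cell and an options hash (only `:format` is read, by `:date` and `:datetime`). The standalone sketch below copies the table under a different constant name, `COERCIONS`, so it runs without the gem. It shows typical return values and that the numeric coercions are strict about malformed input.

```ruby
require 'date'
require 'time' # Time.strptime and Time.parse come from the 'time' stdlib

# Standalone copy of TYPE_COERCIONS above, for illustration only.
COERCIONS = {
  integer: ->(v, _opts) { Integer(v) },
  float: ->(v, _opts) { Float(v) },
  string: ->(v, _opts) { v.to_s },
  date: lambda { |v, opts|
    opts[:format] ? Date.strptime(v, opts[:format]) : Date.parse(v)
  },
  datetime: lambda { |v, opts|
    opts[:format] ? Time.strptime(v, opts[:format]) : Time.parse(v)
  }
}.freeze

int_val = COERCIONS[:integer].call('42', {})
float_val = COERCIONS[:float].call('3.5', {})
date_val = COERCIONS[:date].call('03/15/2024', { format: '%m/%d/%Y' })

# Kernel#Integer raises on malformed input instead of returning 0.
strict_error =
  begin
    COERCIONS[:integer].call('12abc', {})
    nil
  rescue ArgumentError => e
    e
  end
```

One consequence of using `Kernel#Integer` is that a leading `0` is read as an octal prefix, so a zero-padded cell such as "010" becomes 8 and "08" raises `ArgumentError`.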

Instance Method Summary

Methods included from Callbacks

#after_each, #rename

Methods included from ErrorHandler

#errors, #max_errors, #on_error

Constructor Details

#initialize(path_or_io, dialect: nil) ⇒ Processor

Returns a new instance of Processor.



# File 'lib/philiprehberger/csv_kit/processor.rb', line 30

def initialize(path_or_io, dialect: nil)
  @path_or_io = path_or_io
  @dialect = dialect ? Dialect.new(dialect) : nil
  @transforms = {}
  @defaults = {}
  @validations = {}
  @reject_block = nil
  @each_block = nil
  @header_names = nil
  @skip_count = nil
  @limit_count = nil
  init_error_handler
  init_callbacks
end
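The parameter name `path_or_io` suggests the constructor accepts either a file path or an already-open IO. The processor's private `open_csv` is not shown on this page, so the following is only a sketch of one way to support both with Ruby's standard `csv` library. `open_csv_source` is a hypothetical name, and the real helper may pass different CSV options.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper: yields a CSV reader for either a path or an IO.
def open_csv_source(path_or_io, &block)
  if path_or_io.respond_to?(:read)
    # Wrap an existing IO (File, StringIO, ...) without closing it.
    block.call(CSV.new(path_or_io, headers: true))
  else
    # CSV.open closes the file when the block returns and returns its value.
    CSV.open(path_or_io, headers: true, &block)
  end
end

io = StringIO.new("name,age\nAda,36\nAlan,41\n")
names = open_csv_source(io) { |csv| csv.map { |row| row['name'] } }
```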

Instance Method Details

#default(key, value) ⇒ self

Register a default value for a column.

Cells where the value is `nil` or an empty string are replaced with the provided default during transform. Defaults run BEFORE `type` coercions and `transform` blocks, so callers can default a missing cell to a string and then coerce it (e.g. default to "0", then cast to `:integer`).

Parameters:

  • key (Symbol)

    column name

  • value (Object)

    value to use when the cell is nil or empty

Returns:

  • (self)


# File 'lib/philiprehberger/csv_kit/processor.rb', line 78

def default(key, value)
  @defaults[key] = value
  self
end
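The documented ordering matters because `Integer('')` raises `ArgumentError` and `Integer(nil)` raises `TypeError`. The sketch below models the per-cell order (default first, then coercion) with hypothetical helpers `apply_default` and `process_cell`; it is not the gem's internal code.

```ruby
# Hypothetical: replace nil or "" with the default, as documented.
def apply_default(value, default)
  (value.nil? || value == '') ? default : value
end

# Hypothetical per-cell pipeline: default first, then the coercion.
def process_cell(raw, default:, coerce:)
  coerce.call(apply_default(raw, default))
end

to_int = ->(v) { Integer(v) }

present = process_cell('7', default: '0', coerce: to_int)
blank = process_cell('', default: '0', coerce: to_int)
missing = process_cell(nil, default: '0', coerce: to_int)
```

Under the documented rule, a whitespace-only cell such as " " is neither `nil` nor empty, so it would not receive the default and would reach the coercion unchanged.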

#each(&block) ⇒ Object

Register a callback for each processed row.



# File 'lib/philiprehberger/csv_kit/processor.rb', line 110

def each(&block)
  @each_block = block
end
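Because `#each` assigns `@each_block` instead of appending to a list, only the most recently registered callback is kept. The hypothetical `CallbackHolder` class below reproduces just that storage pattern; when the gem invokes the callback relative to transforms and rejection is not shown on this page.

```ruby
# Hypothetical stand-in showing the single-slot storage used by #each.
class CallbackHolder
  def each(&block)
    @each_block = block # a later call overwrites the earlier block
  end

  # Invoke the registered callback, if any, for every row.
  def run(rows)
    rows.each { |row| @each_block&.call(row) }
    rows
  end
end

seen = []
holder = CallbackHolder.new
holder.each { |row| seen << [:first, row] }
holder.each { |row| seen << [:second, row] } # replaces the first block
holder.run([{ id: 1 }, { id: 2 }])
```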

#headers(*names) ⇒ Object

Override header names used for symbolized keys.



# File 'lib/philiprehberger/csv_kit/processor.rb', line 46

def headers(*names)
  @header_names = names.map(&:to_sym)
end
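`#headers` symbolizes the given names, which suggests they become the row's hash keys in column order. The sketch below pairs names with fields positionally using the stdlib `csv` parser. `rows_with_headers` is a hypothetical helper. This page does not say whether the file's own header line is then skipped, so the sketch assumes it is.

```ruby
require 'csv'

# Hypothetical: zip override names with each row's fields by position.
def rows_with_headers(text, names)
  keys = names.map(&:to_sym) # same symbolization as Processor#headers
  # Assumption: the file's original header line is discarded.
  CSV.parse(text).drop(1).map { |fields| keys.zip(fields).to_h }
end

rows = rows_with_headers("First Name,Years\nAda,36\n", %w[name age])
```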

#limit(n) ⇒ void

This method returns an undefined value.

Stop after processing N rows.

Parameters:

  • n (Integer)

    maximum rows to collect



# File 'lib/philiprehberger/csv_kit/processor.rb', line 100

def limit(n)
  @limit_count = n
end
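In a streaming reader, a limit lets processing stop before the rest of the input is read. The sketch below uses an `Enumerator` that counts how many rows it has produced, plus a hypothetical `take_rows` collector. It assumes that rows dropped by a reject predicate do not count toward the limit, which matches the wording "maximum rows to collect" but is not confirmed on this page.

```ruby
# Hypothetical source that counts how many rows have been pulled from it.
reads = 0
source = Enumerator.new do |yielder|
  (1..1_000).each do |i|
    reads += 1
    yielder << { id: i }
  end
end

# Hypothetical collector: stop pulling rows once `limit` have been kept.
def take_rows(rows, limit, reject: ->(_row) { false })
  collected = []
  return collected if limit <= 0

  rows.each do |row|
    next if reject.call(row) # assumption: rejected rows are not counted

    collected << row
    break if collected.size >= limit # the source is not read any further
  end
  collected
end

first_three = take_rows(source, 3)
reads_after_limit = reads
```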

#reject(&block) ⇒ Object

Register a reject predicate.



# File 'lib/philiprehberger/csv_kit/processor.rb', line 105

def reject(&block)
  @reject_block = block
end
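The name suggests `Enumerable#reject` semantics, where rows for which the block returns a truthy value are dropped. That reading is an assumption. Since `@reject_block` holds a single block, several conditions have to be combined into one predicate, as in this sketch (the row fields are illustrative).

```ruby
inactive = ->(row) { row[:status] == 'inactive' }
no_email = ->(row) { row[:email].to_s.empty? }

# Only one reject block is stored, so combine conditions in one predicate.
reject_block = ->(row) { inactive.call(row) || no_email.call(row) }

rows = [
  { id: 1, status: 'active', email: 'a@example.com' },
  { id: 2, status: 'inactive', email: 'b@example.com' },
  { id: 3, status: 'active', email: '' },
  { id: 4, status: 'active', email: 'd@example.com' }
]

# Assumed semantics: drop rows for which the predicate is truthy.
kept = rows.reject { |row| reject_block.call(row) }
```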

#run ⇒ Array<Row>

Execute the processor, reading the input row by row, and return the collected rows.

Returns:

  • (Array<Row>)

    collected rows



# File 'lib/philiprehberger/csv_kit/processor.rb', line 117

def run
  @collected_errors = []
  open_csv { |csv| process_rows(csv) }
end
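The following end-to-end sketch shows what a streaming `run` could look like with the stdlib `csv` reader: rows are read one at a time, only kept rows are held in memory, and reading stops at the limit. `sketch_run` and its keyword names are hypothetical. Stage order (skip, defaults, transforms, reject, callback, limit) is also an assumption; this page documents only that defaults run before transforms, and validations are left out.

```ruby
require 'csv'
require 'stringio'

# Hypothetical end-to-end pipeline; stage order is an assumption.
def sketch_run(io, defaults: {}, transforms: {}, reject: nil, on_row: nil, skip: 0, limit: nil)
  collected = []
  csv = CSV.new(io, headers: true, header_converters: :symbol)
  csv.each_with_index do |csv_row, index|
    next if index < skip # skip counts raw data rows here

    row = csv_row.to_h
    defaults.each { |key, value| row[key] = value if row[key].nil? || row[key] == '' }
    transforms.each { |key, fn| row[key] = fn.call(row[key]) if row.key?(key) }
    next if reject&.call(row)

    on_row&.call(row)
    collected << row
    break if limit && collected.size >= limit # stop reading the input
  end
  collected
end

seen = []
result = sketch_run(
  StringIO.new("name,age\nAda,36\nBob,\nCy,17\nDee,52\nEve,29\n"),
  defaults: { age: '0' },
  transforms: { age: ->(v) { Integer(v) } },
  reject: ->(row) { row[:age] < 18 },
  on_row: ->(row) { seen << row[:name] },
  skip: 1,
  limit: 2
)
```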

#skip(n) ⇒ void

This method returns an undefined value.

Skip the first N data rows during processing.

Parameters:

  • n (Integer)

    number of rows to skip



# File 'lib/philiprehberger/csv_kit/processor.rb', line 92

def skip(n)
  @skip_count = n
end
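"Data rows" means the header line is not counted. With the stdlib reader and `headers: true`, the header is consumed by the parser, so dropping N rows skips N data rows. This page does not say whether rejected rows count toward the skip. The sketch below, using a hypothetical `ids_after_skip` helper, skips raw data rows lazily.

```ruby
require 'csv'

# Hypothetical: skip the first n data rows while reading lazily.
def ids_after_skip(text, n)
  rows = CSV.new(text, headers: true).lazy # header line is consumed here
  rows.drop(n).map { |row| row['id'] }.to_a
end

ids = ids_after_skip("id,name\n1,a\n2,b\n3,c\n", 2)
```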

#transform(key, &block) ⇒ Object

Register a transform for a specific column.



# File 'lib/philiprehberger/csv_kit/processor.rb', line 51

def transform(key, &block)
  @transforms[key] = block
end
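Transforms are stored one per column and, judging from how `#type` wraps its coercion (`->(v) { ... }`), are called with the cell value alone. Registering a second block for the same key overwrites the first, so multi-step cleanup belongs in a single block. The sketch below models that table with a plain hash; the column names are illustrative.

```ruby
# Hypothetical per-column transform table keyed by column name.
transforms = {}
transforms[:email] = ->(v) { v.strip }
transforms[:email] = ->(v) { v.strip.downcase } # replaces the previous block
transforms[:name] = ->(v) { v.split.map(&:capitalize).join(' ') }

row = { email: '  Ada@Example.COM ', name: 'ada lovelace', age: '36' }

# Apply each column's transform; columns without one pass through unchanged.
out = row.map { |key, value| [key, transforms.key?(key) ? transforms[key].call(value) : value] }.to_h
```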

#type(key, type_name, **opts) ⇒ Object

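Register a built-in type coercion for a column. The coercion is stored as the column's transform, so it replaces any block previously registered with `transform` for the same key, and a later `transform` call replaces it in turn.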

Parameters:

  • key (Symbol)

    column name

  • type_name (Symbol)

    one of :integer, :float, :string, :date, :datetime

  • opts (Hash)

    additional options (e.g. `format: '%Y-%m-%d'`)

Raises:

  • (ArgumentError)

    if type_name is not a key of TYPE_COERCIONS


# File 'lib/philiprehberger/csv_kit/processor.rb', line 60

def type(key, type_name, **opts)
  coercion = TYPE_COERCIONS[type_name]
  raise ArgumentError, "Unknown type: #{type_name}" unless coercion

  @transforms[key] = ->(v) { coercion.call(v, opts) }
end
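The registration logic above can be exercised without the gem. `TypeRegistry` below is a hypothetical stand-in that copies the bodies of `#transform` and `#type` with a reduced coercion table. It shows that `type` overwrites an earlier `transform` on the same key, that `opts` is captured once and reused on every call, and that an unknown type name raises before anything is stored.

```ruby
require 'date'

# Hypothetical stand-in reproducing #transform and #type from above.
class TypeRegistry
  COERCIONS = {
    integer: ->(v, _opts) { Integer(v) },
    date: ->(v, opts) { opts[:format] ? Date.strptime(v, opts[:format]) : Date.parse(v) }
  }.freeze

  attr_reader :transforms

  def initialize
    @transforms = {}
  end

  def transform(key, &block)
    @transforms[key] = block
  end

  def type(key, type_name, **opts)
    coercion = COERCIONS[type_name]
    raise ArgumentError, "Unknown type: #{type_name}" unless coercion

    # opts is captured by the lambda and passed on every call.
    @transforms[key] = ->(v) { coercion.call(v, opts) }
  end
end

reg = TypeRegistry.new
reg.transform(:joined) { |v| v.strip }
reg.type(:joined, :date, format: '%d.%m.%Y') # overwrites the strip transform
joined = reg.transforms[:joined].call('01.02.2024')

unknown =
  begin
    reg.type(:age, :decimal)
  rescue ArgumentError => e
    e.message
  end
```

To both clean and coerce a column, one option is to do the cleanup inside a single `transform` block and call the conversion there, since the two registrations cannot coexist on one key.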

#validate(key, &block) ⇒ Object

Register a validation for a specific column.



# File 'lib/philiprehberger/csv_kit/processor.rb', line 84

def validate(key, &block)
  @validations[key] = block
end
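This page does not say what a validation block must return or what happens when it fails. The sketch below assumes a block returns truthy for a valid cell and that failures are recorded rather than raised, in the spirit of the included `ErrorHandler#errors`. Both assumptions, the error-record shape, and the choice to drop failing rows are hypothetical.

```ruby
# Hypothetical per-column validations: truthy means the cell is valid.
validations = {
  age: ->(v) { v.is_a?(Integer) && v >= 0 },
  email: ->(v) { v.to_s.include?('@') }
}

rows = [
  { age: 36, email: 'ada@example.com' },
  { age: -1, email: 'bob.example.com' }
]

errors = []
valid_rows = []
rows.each_with_index do |row, index|
  # Columns whose check returned a falsy value.
  failed = validations.reject { |key, check| check.call(row[key]) }.keys
  failed.each { |key| errors << { row: index, column: key } }
  valid_rows << row if failed.empty? # assumption: failing rows are dropped
end
```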