Class: Philiprehberger::CsvKit::Processor

Inherits:
Object
Includes:
Callbacks, ErrorHandler
Defined in:
lib/philiprehberger/csv_kit/processor.rb

Overview

Streaming CSV processor with a DSL for transforms, validations, and filtering.

Constant Summary

TYPE_COERCIONS =
{
  integer: ->(v, _opts) { Integer(v) },
  float: ->(v, _opts) { Float(v) },
  string: ->(v, _opts) { v.to_s },
  date: lambda { |v, opts|
    if opts[:format]
      Date.strptime(v, opts[:format])
    else
      Date.parse(v)
    end
  },
  datetime: lambda { |v, opts|
    if opts[:format]
      Time.strptime(v, opts[:format])
    else
      Time.parse(v)
    end
  }
}.freeze
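
To make the behavior of these lambdas concrete, the sketch below calls a local copy of the table directly. The name COERCIONS and the sample inputs are illustrative assumptions, and the copy exists only so the snippet runs without the gem. It relies on the standard library's date and time extensions, which provide Date.strptime, Time.strptime, and Time.parse.

```ruby
require 'date'
require 'time'

# Local copy of the table above (hypothetical name), so the snippet is self-contained.
COERCIONS = {
  integer: ->(v, _opts) { Integer(v) },
  float: ->(v, _opts) { Float(v) },
  string: ->(v, _opts) { v.to_s },
  date: ->(v, opts) { opts[:format] ? Date.strptime(v, opts[:format]) : Date.parse(v) },
  datetime: ->(v, opts) { opts[:format] ? Time.strptime(v, opts[:format]) : Time.parse(v) }
}.freeze

# Plain conversions; the second argument is the options hash.
int = COERCIONS[:integer].call('42', {})
date = COERCIONS[:date].call('31/12/2024', { format: '%d/%m/%Y' })
stamp = COERCIONS[:datetime].call('2024-12-31 23:59', { format: '%Y-%m-%d %H:%M' })

# Kernel#Integer honors radix prefixes, so a leading zero is read as octal.
octal = COERCIONS[:integer].call('010', {})

# Integer() is strict: input that is not entirely numeric raises ArgumentError.
bad = begin
  COERCIONS[:integer].call('4x', {})
rescue ArgumentError => e
  e.class
end
```

Because Integer() and Float() raise instead of returning nil, a column with blank or malformed cells will surface errors at coercion time. Zero-padded identifiers are probably better left as strings, given the octal interpretation shown above.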

Instance Method Summary

Methods included from Callbacks

#after_each, #rename

Methods included from ErrorHandler

#errors, #max_errors, #on_error

Constructor Details

#initialize(path_or_io, dialect: nil) ⇒ Processor

Returns a new instance of Processor.



# File 'lib/philiprehberger/csv_kit/processor.rb', line 30

def initialize(path_or_io, dialect: nil)
  @path_or_io = path_or_io
  @dialect = dialect ? Dialect.new(dialect) : nil
  @transforms = {}
  @validations = {}
  @reject_block = nil
  @each_block = nil
  @header_names = nil
  @skip_count = nil
  @limit_count = nil
  init_error_handler
  init_callbacks
end

Instance Method Details

#each(&block) ⇒ Object

Register a callback for each processed row.



# File 'lib/philiprehberger/csv_kit/processor.rb', line 93

def each(&block)
  @each_block = block
end

#headers(*names) ⇒ Object

Override header names used for symbolized keys.



# File 'lib/philiprehberger/csv_kit/processor.rb', line 45

def headers(*names)
  @header_names = names.map(&:to_sym)
end

#limit(n) ⇒ void

This method returns an undefined value.

Stop after processing N rows.

Parameters:

  • n (Integer)

    maximum rows to collect



# File 'lib/philiprehberger/csv_kit/processor.rb', line 83

def limit(n)
  @limit_count = n
end

#reject(&block) ⇒ Object

Register a reject predicate.



# File 'lib/philiprehberger/csv_kit/processor.rb', line 88

def reject(&block)
  @reject_block = block
end

#run ⇒ Array<Row>

Execute the processor, streaming row by row.

Returns:

  • (Array<Row>)

    collected rows



# File 'lib/philiprehberger/csv_kit/processor.rb', line 100

def run
  @collected_errors = []
  open_csv { |csv| process_rows(csv) }
end
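
The bodies of open_csv and process_rows are not shown on this page, so the following is only a rough sketch of how a row loop could combine the registered transforms, validations, reject predicate, and per-row callback. The method name process, its keyword arguments, the error record shape, and the ordering of the steps are all assumptions made for illustration. Rows are modeled as plain hashes instead of Row objects.

```ruby
# Hypothetical stand-in for the row loop; not the library's implementation.
def process(rows, transforms: {}, validations: {}, reject: nil, on_row: nil)
  collected = []
  errors = []

  rows.each_with_index do |row, index|
    # Apply per-column transforms, leaving other columns untouched.
    row = row.to_h do |key, value|
      [key, transforms.key?(key) ? transforms[key].call(value) : value]
    end

    # Collect the columns whose validation block returned a falsy value.
    failed = validations.reject { |key, check| check.call(row[key]) }.keys
    unless failed.empty?
      errors << { row: index, columns: failed }
      next
    end

    # Drop rows matched by the reject predicate.
    next if reject&.call(row)

    on_row&.call(row)
    collected << row
  end

  [collected, errors]
end

rows = [
  { name: 'Ada', age: '36' },
  { name: 'Bob', age: '-1' },
  { name: 'Cy', age: '17' }
]

collected, errors = process(
  rows,
  transforms: { age: ->(v) { Integer(v) } },
  validations: { age: ->(v) { v >= 0 } },
  reject: ->(row) { row[:age] < 18 }
)
```

In this sketch only the first row is collected: the second fails validation and is recorded as an error, and the third is dropped by the reject predicate. Whether the real processor validates before or after transforming is not stated here and should be checked against the source.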

#skip(n) ⇒ void

This method returns an undefined value.

Skip the first N data rows during processing.

Parameters:

  • n (Integer)

    number of rows to skip



# File 'lib/philiprehberger/csv_kit/processor.rb', line 75

def skip(n)
  @skip_count = n
end
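
How @skip_count and @limit_count are consumed is not shown on this page. One plausible reading, sketched below with Ruby's lazy enumerators, is that skipped rows are dropped first and the limit then caps what remains. The helper name window and its keyword arguments are hypothetical.

```ruby
# Hypothetical helper; illustrates skip-then-limit over a row stream.
def window(rows, skip_count: nil, limit_count: nil)
  stream = rows.lazy
  stream = stream.drop(skip_count) if skip_count   # ignore the first N rows
  stream = stream.take(limit_count) if limit_count # stop after N further rows
  stream.to_a
end

page = window(%w[a b c d e], skip_count: 1, limit_count: 2)
everything = window(%w[a b c], skip_count: nil, limit_count: nil)

# Laziness means an unbounded source still terminates once the limit is reached.
from_infinite = window(1..Float::INFINITY, skip_count: 2, limit_count: 3)
```

Using lazy evaluation keeps the sketch consistent with a streaming design: rows past the limit are never read.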

#transform(key, &block) ⇒ Object

Register a transform for a specific column.



# File 'lib/philiprehberger/csv_kit/processor.rb', line 50

def transform(key, &block)
  @transforms[key] = block
end

#type(key, type_name, **opts) ⇒ Object

Register a built-in type coercion for a column.

Parameters:

  • key (Symbol)

    column name

  • type_name (Symbol)

    one of :integer, :float, :string, :date, :datetime

  • opts (Hash)

    additional options (e.g. format: '%Y-%m-%d')

Raises:

  • (ArgumentError)


# File 'lib/philiprehberger/csv_kit/processor.rb', line 59

def type(key, type_name, **opts)
  coercion = TYPE_COERCIONS[type_name]
  raise ArgumentError, "Unknown type: #{type_name}" unless coercion

  @transforms[key] = ->(v) { coercion.call(v, opts) }
end
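
Two consequences of the listing above are worth illustrating: an unknown type name raises ArgumentError at registration time, and #type writes to the same @transforms hash as #transform, so whichever is registered last for a column wins. The sketch below uses a hypothetical stand-in class, MiniRegistry, that copies only those two methods and a one-entry coercion table. It is not the real Processor.

```ruby
# Hypothetical stand-in mirroring only #transform and #type from the listing.
class MiniRegistry
  COERCIONS = { integer: ->(v, _opts) { Integer(v) } }.freeze

  attr_reader :transforms

  def initialize
    @transforms = {}
  end

  def transform(key, &block)
    @transforms[key] = block
  end

  def type(key, type_name, **opts)
    coercion = COERCIONS[type_name]
    raise ArgumentError, "Unknown type: #{type_name}" unless coercion

    # Wrap the two-argument coercion in a one-argument lambda that closes over opts.
    @transforms[key] = ->(v) { coercion.call(v, opts) }
  end
end

registry = MiniRegistry.new
registry.transform(:age) { |v| v.strip }
registry.type(:age, :integer) # replaces the block registered on the previous line
age = registry.transforms[:age].call('41')

# An unsupported type name fails immediately and leaves the registry untouched.
unknown_message = begin
  registry.type(:age, :uuid)
rescue ArgumentError => e
  e.message
end
```

If a column needs both cleanup and coercion, a single #transform block that does both is the safer choice under this reading, since the two registrations do not compose.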

#validate(key, &block) ⇒ Object

Register a validation for a specific column.



# File 'lib/philiprehberger/csv_kit/processor.rb', line 67

def validate(key, &block)
  @validations[key] = block
end