Module: Philiprehberger::CsvKit

Defined in:
lib/philiprehberger/csv_kit.rb,
lib/philiprehberger/csv_kit/row.rb,
lib/philiprehberger/csv_kit/writer.rb,
lib/philiprehberger/csv_kit/dialect.rb,
lib/philiprehberger/csv_kit/version.rb,
lib/philiprehberger/csv_kit/detector.rb,
lib/philiprehberger/csv_kit/callbacks.rb,
lib/philiprehberger/csv_kit/processor.rb,
lib/philiprehberger/csv_kit/error_handler.rb

Defined Under Namespace

Modules: Callbacks, ErrorHandler

Classes: Detector, Dialect, Error, Processor, Row, Writer

Constant Summary

VERSION = '0.8.0'

Class Method Details

.count(path_or_io, dialect: nil) ⇒ Integer

Count data rows (the header row is excluded) by streaming, without loading them all into memory.

Parameters:

  • path_or_io (String, IO)

    file path or IO object

  • dialect (Symbol, Hash, nil) (defaults to: nil)

    CSV dialect preset or custom options

Returns:

  • (Integer)


# File 'lib/philiprehberger/csv_kit.rb', line 97

def self.count(path_or_io, dialect: nil)
  n = 0
  foreach_row(path_or_io, headers: true, dialect: dialect) { |_| n += 1 }
  n
end
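
As a rough illustration of the streaming count above, the sketch below does the same thing with only the standard library. `streaming_count` is a hypothetical helper, not part of CsvKit, and it ignores the dialect option.

```ruby
require 'csv'
require 'stringio'

# Hypothetical stand-in for the streaming count, built on the standard
# library only. It keeps a single counter and discards each row after
# reading it, so memory use does not grow with file size.
def streaming_count(io)
  n = 0
  CSV.new(io, headers: true).each { |_row| n += 1 }
  n
end

csv_text = "name,age\nAda,36\nGrace,45\n"
puts streaming_count(StringIO.new(csv_text)) # the header line is not counted
```

Because `headers: true` consumes the first line as the header, a file containing only a header row counts as zero.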

.each_hash(path_or_io, dialect: nil) {|Hash{Symbol => String}| ... } ⇒ Enumerator?

Stream rows one at a time as hashes with symbolized keys, using constant memory. Returns an Enumerator if no block is given.

Parameters:

  • path_or_io (String, IO)

    file path or IO object

  • dialect (Symbol, Hash, nil) (defaults to: nil)

    CSV dialect preset or custom options

Yields:

  • (Hash{Symbol => String})

    each row

Returns:

  • (Enumerator, nil)


# File 'lib/philiprehberger/csv_kit.rb', line 110

def self.each_hash(path_or_io, dialect: nil, &block)
  enum = Enumerator.new do |yielder|
    foreach_row(path_or_io, headers: true, dialect: dialect) do |row|
      yielder.yield(row.to_h.transform_keys(&:to_sym))
    end
  end

  block ? enum.each(&block) : enum
end
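
The Enumerator pattern used above can be sketched with the standard library as follows. `each_symbolized_row` is a hypothetical name chosen for this example, and dialect handling is left out.

```ruby
require 'csv'
require 'stringio'

# Hypothetical stand-in for the block-or-Enumerator pattern: rows are
# produced lazily, so a consumer that stops early never parses the rest.
def each_symbolized_row(io, &block)
  enum = Enumerator.new do |yielder|
    CSV.new(io, headers: true).each do |row|
      yielder << row.to_h.transform_keys(&:to_sym)
    end
  end
  block ? enum.each(&block) : enum
end

io = StringIO.new("id,name\n1,Ada\n2,Grace\n3,Edsger\n")
first_two = each_symbolized_row(io).first(2) # parsing stops after two rows
p first_two
```

Values stay as strings; only the keys are converted to symbols.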

.filter(path_or_io, dialect: nil) {|Hash{Symbol => String}| ... } ⇒ String

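Filter rows and return the matching rows as a CSV string. The whole file is loaded with to_hashes before the block is applied, so this method does not stream.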

Parameters:

  • path_or_io (String, IO)

    file path or IO object

  • dialect (Symbol, Hash, nil) (defaults to: nil)

    CSV dialect preset or custom options

Yields:

  • (Hash{Symbol => String})

    each row as a symbolized hash

Returns:

  • (String)

    CSV string with a header row, or an empty string if no row matches



# File 'lib/philiprehberger/csv_kit.rb', line 166

def self.filter(path_or_io, dialect: nil, &)
  rows = to_hashes(path_or_io, dialect: dialect).select(&)
  return '' if rows.empty?

  headers = rows.first.keys
  CSV.generate do |csv|
    csv << headers
    rows.each { |row| csv << headers.map { |k| row[k] } }
  end
end
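
A minimal sketch of the same select-then-serialize approach, using only the standard library. `filter_to_csv` is a hypothetical helper that takes CSV text directly rather than a path or IO.

```ruby
require 'csv'

# Hypothetical stand-in: parse everything, keep rows the block accepts,
# then write them back out under the header taken from the first kept row.
def filter_to_csv(csv_text)
  rows = CSV.parse(csv_text, headers: true).map { |r| r.to_h.transform_keys(&:to_sym) }
  kept = rows.select { |row| yield(row) }
  return '' if kept.empty? # no header is emitted when nothing matches

  headers = kept.first.keys
  CSV.generate do |csv|
    csv << headers
    kept.each { |row| csv << headers.map { |k| row[k] } }
  end
end

adults = filter_to_csv("name,age\nAda,36\nTim,12\n") { |row| row[:age].to_i >= 18 }
puts adults
```

Note that field values arrive as strings, so numeric comparisons need an explicit conversion such as `to_i`.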

.find(path_or_io, dialect: nil) {|Hash{Symbol => String}| ... } ⇒ Hash{Symbol => String}?

Find the first row matching a predicate. Rows are streamed, and reading stops as soon as a match is found.

Parameters:

  • path_or_io (String, IO)

    file path or IO object

  • dialect (Symbol, Hash, nil) (defaults to: nil)

    CSV dialect preset or custom options

Yields:

  • (Hash{Symbol => String})

    each row as a symbolized hash

Returns:

  • (Hash{Symbol => String}, nil)

    the first matching row or nil



# File 'lib/philiprehberger/csv_kit.rb', line 152

def self.find(path_or_io, dialect: nil, &block)
  foreach_row(path_or_io, headers: true, dialect: dialect) do |row|
    hash = row.to_h.transform_keys(&:to_sym)
    return hash if block.call(hash)
  end
  nil
end
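
The early exit can be observed with a small standard-library sketch. `find_row` is a hypothetical helper for this example; the counter only exists to show how many rows the predicate saw.

```ruby
require 'csv'
require 'stringio'

# Hypothetical stand-in: returns from inside the iteration, so rows after
# the first match are never parsed or passed to the predicate.
def find_row(io)
  CSV.new(io, headers: true).each do |row|
    hash = row.to_h.transform_keys(&:to_sym)
    return hash if yield(hash)
  end
  nil
end

seen = 0
io = StringIO.new("id,name\n1,Ada\n2,Grace\n3,Edsger\n")
match = find_row(io) do |row|
  seen += 1
  row[:name] == 'Grace'
end
puts "matched id #{match[:id]} after #{seen} rows"
```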

.headers(path_or_io, dialect: nil) ⇒ Array<Symbol>

Return the header row as an array of symbols.

Parameters:

  • path_or_io (String, IO)

    file path or IO object

  • dialect (Symbol, Hash, nil) (defaults to: nil)

    CSV dialect preset or custom options

Returns:

  • (Array<Symbol>)


# File 'lib/philiprehberger/csv_kit.rb', line 80

def self.headers(path_or_io, dialect: nil)
  csv_opts = {}
  csv_opts = Dialect.new(dialect).merge_into(csv_opts) if dialect
  row = nil
  with_csv(path_or_io, csv_opts) do |csv|
    row = csv.shift
  end
  return [] unless row

  row.map(&:to_sym)
end
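
For illustration, reading only the first line can be sketched with the standard library like this. `header_symbols` is a hypothetical helper and does not apply any dialect.

```ruby
require 'csv'
require 'stringio'

# Hypothetical stand-in: without the headers option, the first call to
# shift returns the first line as a plain array of strings.
def header_symbols(io)
  row = CSV.new(io).shift
  row ? row.map(&:to_sym) : []
end

p header_symbols(StringIO.new("id,full name\n1,Ada\n"))
```

Header text is converted as-is, so a column titled `full name` becomes the symbol `:"full name"`. Empty input yields an empty array.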

.pluck(path_or_io, *keys, dialect: nil) ⇒ Array<Hash{Symbol => String}>

Extract specific columns from a CSV. The whole file is loaded with to_hashes, and each row is reduced to the requested keys.

Parameters:

  • path_or_io (String, IO)

    file path or IO object

  • keys (Array<Symbol>)

    column names to extract

  • dialect (Symbol, Hash, nil) (defaults to: nil)

    CSV dialect preset or custom options

Returns:

  • (Array<Hash{Symbol => String}>)


# File 'lib/philiprehberger/csv_kit.rb', line 71

def self.pluck(path_or_io, *keys, dialect: nil)
  to_hashes(path_or_io, dialect: dialect).map { |h| h.slice(*keys) }
end
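
The column selection relies on `Hash#slice`, which can be tried on its own. `pluck_columns` is a hypothetical helper operating on already-parsed rows.

```ruby
# Hypothetical stand-in working on rows that are already hashes.
# Hash#slice keeps only the listed keys and silently skips keys
# that a row does not have.
def pluck_columns(rows, *keys)
  rows.map { |h| h.slice(*keys) }
end

rows = [
  { id: '1', name: 'Ada', email: 'ada@example.com' },
  { id: '2', name: 'Grace', email: 'grace@example.com' }
]
p pluck_columns(rows, :name, :email)
```

One consequence is that a misspelled or missing column name produces hashes without that key rather than an error.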

.process(path_or_io, dialect: nil) {|Processor| ... } ⇒ Array<Row>

Streaming DSL: yields a Processor for configuration, then runs it and returns the collected rows.

Parameters:

  • path_or_io (String, IO)

    file path or IO object

  • dialect (Symbol, Hash, nil) (defaults to: nil)

    CSV dialect preset or custom options

Yields:

  • (Processor)

    processor to configure transforms and validations

Returns:

  • (Array<Row>)

    collected rows



# File 'lib/philiprehberger/csv_kit.rb', line 26

def self.process(path_or_io, dialect: nil, &block)
  processor = Processor.new(path_or_io, dialect: dialect)
  block.call(processor)
  processor.run
end

.sample(path_or_io, n, dialect: nil) ⇒ Array<Hash{Symbol => String}>

Return n randomly sampled rows using reservoir sampling (Algorithm R). Memory usage is O(n) regardless of file size. If the file has fewer than n rows, all rows are returned.

Parameters:

  • path_or_io (String, IO)

    file path or IO object

  • n (Integer)

    number of rows to sample

  • dialect (Symbol, Hash, nil) (defaults to: nil)

    CSV dialect preset or custom options

Returns:

  • (Array<Hash{Symbol => String}>)


# File 'lib/philiprehberger/csv_kit.rb', line 128

def self.sample(path_or_io, n, dialect: nil)
  reservoir = []
  index = 0

  foreach_row(path_or_io, headers: true, dialect: dialect) do |row|
    hash = row.to_h.transform_keys(&:to_sym)
    if index < n
      reservoir << hash
    else
      j = rand(index + 1)
      reservoir[j] = hash if j < n
    end
    index += 1
  end

  reservoir
end
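
The reservoir step above can be isolated from CSV parsing. The sketch below applies the same Algorithm R logic to any enumerable; `reservoir_sample` and its `rng:` keyword are hypothetical and added here only so the example can be seeded.

```ruby
# Hypothetical stand-in for the sampling loop (Algorithm R).
# The first n items fill the reservoir. Item number index (0-based) then
# replaces a random slot with probability n / (index + 1).
def reservoir_sample(items, n, rng: Random.new)
  reservoir = []
  items.each_with_index do |item, index|
    if index < n
      reservoir << item
    else
      j = rng.rand(index + 1) # uniform integer in 0..index
      reservoir[j] = item if j < n
    end
  end
  reservoir
end

sample = reservoir_sample(1..1_000, 5, rng: Random.new(42))
puts sample.size
```

Only the reservoir is held in memory, so at most n items are retained at any time. The result is not shuffled: when the input has n items or fewer, they come back in their original order.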

.to_csv(rows, headers: nil, dialect: nil) ⇒ String

Serialize an array of hashes to a CSV string.

If headers is omitted, the keys of the first hash are used. Empty input with no explicit headers returns an empty string. Dialect options are passed through to the writer.

Parameters:

  • rows (Array<Hash>)

    data rows

  • headers (Array<Symbol, String>, nil) (defaults to: nil)

    explicit column order (optional)

  • dialect (Symbol, Hash, nil) (defaults to: nil)

    CSV dialect preset or custom options

Returns:

  • (String)

    CSV string with header row



# File 'lib/philiprehberger/csv_kit.rb', line 54

def self.to_csv(rows, headers: nil, dialect: nil)
  return '' if rows.empty? && headers.nil?

  resolved_headers = (headers || rows.first.keys).map(&:to_sym)
  io = StringIO.new
  Writer.stream(io, headers: resolved_headers, dialect: dialect) do |w|
    rows.each { |row| w << (row.is_a?(Hash) ? row.transform_keys(&:to_sym) : row) }
  end
  io.string
end
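
The header resolution above can be sketched with the standard library alone. `hashes_to_csv` is a hypothetical helper, and `CSV.generate` stands in for the gem's Writer, whose exact output (quoting, line endings, behavior when rows is empty but headers are given) is assumed here rather than confirmed.

```ruby
require 'csv'

# Hypothetical stand-in: explicit headers win, otherwise the keys of the
# first row set the column order. String and symbol keys are both accepted.
def hashes_to_csv(rows, headers: nil)
  return '' if rows.empty? && headers.nil?

  columns = (headers || rows.first.keys).map(&:to_sym)
  CSV.generate do |csv|
    csv << columns
    rows.each do |row|
      normalized = row.transform_keys(&:to_sym)
      csv << columns.map { |c| normalized[c] }
    end
  end
end

puts hashes_to_csv([{ 'name' => 'Ada', age: 36 }], headers: %w[age name])
```

Passing headers explicitly is the way to control column order or to drop columns, since keys not listed in headers are never written.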

.to_hashes(path_or_io, dialect: nil) ⇒ Array<Hash{Symbol => String}>

Load an entire CSV into an array of symbolized hashes.

Parameters:

  • path_or_io (String, IO)

    file path or IO object

  • dialect (Symbol, Hash, nil) (defaults to: nil)

    CSV dialect preset or custom options

Returns:

  • (Array<Hash{Symbol => String}>)


# File 'lib/philiprehberger/csv_kit.rb', line 37

def self.to_hashes(path_or_io, dialect: nil)
  rows = []
  foreach_row(path_or_io, headers: true, dialect: dialect) do |row|
    rows << row.to_h.transform_keys(&:to_sym)
  end
  rows
end