philiprehberger-csv_kit
Streaming CSV processor with type coercion and validation
Requirements
- Ruby >= 3.1
Installation
Add to your Gemfile:
gem "philiprehberger-csv_kit"
Or install directly:
gem install philiprehberger-csv_kit
Usage
require "philiprehberger/csv_kit"
rows = Philiprehberger::CsvKit.to_hashes("data.csv")
# => [{name: "Alice", age: "30"}, ...]
Pluck Columns
names = Philiprehberger::CsvKit.pluck("data.csv", :name, :city)
# => [{name: "Alice", city: "Berlin"}, ...]
Inspect Headers
Philiprehberger::CsvKit.headers("data.csv")
# => [:name, :age, :city]
Count Rows
Philiprehberger::CsvKit.count("data.csv")
# => 1000
Streaming Row-by-Row
Iterate rows with constant memory. Returns an Enumerator if no block is given:
Philiprehberger::CsvKit.each_hash("large.csv") do |row|
puts row[:name]
end
# Or compose with Enumerator methods:
adults = Philiprehberger::CsvKit.each_hash("data.csv")
.select { |r| r[:age].to_i >= 18 }
.first(10)
Find First Match
Return the first row that matches a predicate, streaming and stopping on the first hit:
user = Philiprehberger::CsvKit.find("users.csv") { |row| row[:email] == "a@b.com" }
# => {email: "a@b.com", name: "Alice"} or nil
Filter Rows
csv_string = Philiprehberger::CsvKit.filter("data.csv") do |row|
row[:age].to_i >= 30
end
Streaming Processor
rows = Philiprehberger::CsvKit.process("data.csv") do |p|
p.transform(:age) { |v| v.to_i }
p.validate(:age) { |v| v.to_i.positive? }
p.reject { |row| row[:city] == "Unknown" }
p.each { |row| puts row[:name] }
end
Date/Time Type Coercions
rows = Philiprehberger::CsvKit.process("data.csv") do |p|
p.type(:birthday, :date)
p.type(:created_at, :datetime, format: "%Y-%m-%dT%H:%M:%S")
end
CSV Dialects
rows = Philiprehberger::CsvKit.to_hashes("data.csv", dialect: :excel)
rows = Philiprehberger::CsvKit.process("data.csv", dialect: { delimiter: ";", quote: "'" }) do |p|
p.transform(:age, &:to_i)
end
Writing CSV
writer = Philiprehberger::CsvKit::Writer.new(headers: [:name, :age])
csv_string = writer.write([{ name: "Alice", age: 30 }, { name: "Bob", age: 25 }])
File.open("output.csv", "w") do |f|
writer.write_to([{ name: "Alice", age: 30 }], f)
end
Streaming Writer
File.open("output.csv", "w") do |f|
Philiprehberger::CsvKit::Writer.stream(f, headers: [:name, :age]) do |w|
w << { name: "Alice", age: 30 }
w << { name: "Bob", age: 25 }
end
end
Error Recovery
rows = Philiprehberger::CsvKit.process("data.csv") do |p|
p.on_error { |row, err| :skip }
p.transform(:age) { |v| Integer(v) }
end
Skip and Limit
rows = Philiprehberger::CsvKit.process("data.csv") do |p|
p.skip(10) # skip first 10 rows
p.limit(50) # stop after 50 rows
end
Column Aliasing
rows = Philiprehberger::CsvKit.process("data.csv") do |p|
p.rename(:raw_col, :clean_col)
end
Delimiter Detection
delimiter = Philiprehberger::CsvKit::Detector.detect("data.tsv")
# => "\t"
API
| Method / Class | Description |
|---|---|
CsvKit.to_hashes(path, dialect:) |
Load CSV into array of symbolized hashes |
CsvKit.pluck(path, *keys, dialect:) |
Extract specific columns |
CsvKit.filter(path, dialect:, &block) |
Filter rows, return CSV string |
CsvKit.find(path, dialect:, &block) |
Return the first row matching the predicate, or nil |
CsvKit.headers(path, dialect:) |
Return header row as array of symbols |
CsvKit.count(path, dialect:) |
Count data rows without loading into memory |
CsvKit.each_hash(path, dialect:, &block) |
Stream rows as symbolized hashes; returns Enumerator if no block |
CsvKit.process(path_or_io, dialect:, &block) |
Streaming DSL with transforms and validations |
Processor#headers(*names) |
Override header names |
Processor#transform(key, &block) |
Register column transform |
Processor#type(key, type, **opts) |
Register built-in type coercion (:integer, :float, :string, :date, :datetime) |
Processor#validate(key, &block) |
Register column validation (skip invalid) |
Processor#skip(n) |
Skip the first N data rows |
Processor#limit(n) |
Stop after processing N rows |
Processor#reject(&block) |
Reject rows matching predicate |
Processor#each(&block) |
Callback for each processed row |
Processor#on_error(&block) |
Per-row error handler (return :skip or :abort) |
Processor#max_errors(n) |
Stop after N errors |
Processor#errors |
Collected errors from last run |
Processor#rename(from, to) |
Rename column during processing |
Processor#after_each(&block) |
Callback after each row is fully processed |
Writer.new(headers:) |
Create a CSV writer with given headers |
Writer#write(rows) |
Generate CSV string from rows |
Writer#write_to(rows, io) |
Write CSV to an IO object |
Writer.stream(io, headers:, dialect:) |
Stream CSV rows incrementally to an IO |
Dialect.new(name_or_hash) |
Create a dialect from preset or custom hash |
Detector.detect(path_or_io) |
Auto-detect CSV delimiter |
Row#[](key) |
Access value by symbol key |
Row#keys |
Column names as array of symbols |
Row#values |
Column values as array |
Row#size |
Number of columns |
| `Row#each { \ | k, v\ |
Row#merge(other) |
Return new Row with merged data |
Row#to_h |
Convert row to plain hash |
Development
bundle install
bundle exec rspec
bundle exec rubocop
Support
If you find this project useful: