philiprehberger-csv_kit

Streaming CSV processor with type coercion and validation

Requirements

  • Ruby >= 3.1

Installation

Add to your Gemfile:

gem "philiprehberger-csv_kit"

Or install directly:

gem install philiprehberger-csv_kit

Usage

require "philiprehberger/csv_kit"

rows = Philiprehberger::CsvKit.to_hashes("data.csv")
# => [{name: "Alice", age: "30"}, ...]

Pluck Columns

names = Philiprehberger::CsvKit.pluck("data.csv", :name, :city)
# => [{name: "Alice", city: "Berlin"}, ...]

Inspect Headers

Philiprehberger::CsvKit.headers("data.csv")
# => [:name, :age, :city]

Count Rows

Philiprehberger::CsvKit.count("data.csv")
# => 1000

Streaming Row-by-Row

Iterate rows with constant memory. Returns an Enumerator if no block is given:

Philiprehberger::CsvKit.each_hash("large.csv") do |row|
  puts row[:name]
end

# Or compose with Enumerator methods. Adding .lazy stops reading
# once 10 matches are found instead of scanning the whole file:
adults = Philiprehberger::CsvKit.each_hash("data.csv")
  .lazy
  .select { |r| r[:age].to_i >= 18 }
  .first(10)
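
The block-or-Enumerator behaviour described above is a common Ruby idiom. The following is a minimal sketch of that idiom built only on Ruby's standard `csv` library. It is not csv_kit's implementation, and the `StreamSketch` module and its `each_row` method are hypothetical names used for illustration:

```ruby
require "csv"
require "tempfile"

# Hypothetical module for illustration; not part of csv_kit.
module StreamSketch
  # Yields each data row as a symbol-keyed Hash, or returns an Enumerator
  # when called without a block.
  def self.each_row(path)
    return enum_for(:each_row, path) unless block_given?

    # CSV.foreach parses one row at a time instead of loading the whole file.
    CSV.foreach(path, headers: true, header_converters: :symbol) do |row|
      yield row.to_h
    end
  end
end

# Write a small sample file so the sketch runs on its own.
sample = Tempfile.new(["people", ".csv"])
sample.write("name,age\nAlice,30\nBob,12\nCara,45\n")
sample.close

# Block form: rows arrive one by one.
StreamSketch.each_row(sample.path) { |row| puts row[:name] }

# Enumerator form: .lazy stops reading once one match has been collected.
first_adult = StreamSketch.each_row(sample.path)
  .lazy
  .select { |row| row[:age].to_i >= 18 }
  .first(1)
p first_adult
```

Without `.lazy`, `select` would read every row and build a full array before `first` is applied.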

Find First Match

Return the first row that matches a predicate. The file is streamed, and reading stops at the first hit:

user = Philiprehberger::CsvKit.find("users.csv") { |row| row[:email] == "a@b.com" }
# => {email: "a@b.com", name: "Alice"} or nil
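
To make the "stops on the first hit" behaviour concrete, here is a rough sketch using the standard `csv` library. The `FindSketch.first_match` helper is hypothetical and only illustrates the early-exit idea; csv_kit's internals may differ:

```ruby
require "csv"
require "stringio"

# Hypothetical module for illustration; not part of csv_kit.
module FindSketch
  # Returns the first row (as a symbol-keyed Hash) for which the block is
  # truthy, or nil when nothing matches. Rows after the match are not parsed.
  def self.first_match(io)
    csv = CSV.new(io, headers: true, header_converters: :symbol)
    csv.each do |row|
      hash = row.to_h
      return hash if yield(hash)
    end
    nil
  end
end

data = "email,name\na@b.com,Alice\nc@d.com,Bob\n"
rows_seen = 0
user = FindSketch.first_match(StringIO.new(data)) do |row|
  rows_seen += 1
  row[:email] == "a@b.com"
end
p user
p rows_seen
```

Counting the rows passed to the predicate shows that the second row is never inspected once the first one matches.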

Filter Rows

csv_string = Philiprehberger::CsvKit.filter("data.csv") do |row|
  row[:age].to_i >= 30
end

Streaming Processor

rows = Philiprehberger::CsvKit.process("data.csv") do |p|
  p.transform(:age) { |v| v.to_i }
  p.validate(:age) { |v| v.to_i.positive? }
  p.reject { |row| row[:city] == "Unknown" }
  p.each { |row| puts row[:name] }
end

Date/Time Type Coercions

rows = Philiprehberger::CsvKit.process("data.csv") do |p|
  p.type(:birthday, :date)
  p.type(:created_at, :datetime, format: "%Y-%m-%dT%H:%M:%S")
end
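
The `format:` string above uses the `strptime` directives from Ruby's standard `date` library. The sketch below shows what such a coercion could reduce to. The `CoercionSketch.coerce` helper is hypothetical, and both the ISO 8601 default and the returned `Date`/`DateTime` classes are assumptions rather than documented csv_kit behaviour:

```ruby
require "date"

# Hypothetical module for illustration; not part of csv_kit.
module CoercionSketch
  # Parses a string into a Date or DateTime. With no format given, the value
  # is assumed to be ISO 8601.
  def self.coerce(value, type, format: nil)
    case type
    when :date
      format ? Date.strptime(value, format) : Date.iso8601(value)
    when :datetime
      format ? DateTime.strptime(value, format) : DateTime.iso8601(value)
    else
      raise ArgumentError, "unsupported type: #{type.inspect}"
    end
  end
end

birthday = CoercionSketch.coerce("1990-05-17", :date)
created_at = CoercionSketch.coerce("2024-03-01T09:30:00", :datetime,
                                   format: "%Y-%m-%dT%H:%M:%S")
puts birthday.year
puts created_at.hour
```

A value that does not match the format raises `Date::Error`, a subclass of `ArgumentError`, so malformed dates surface as errors rather than silently becoming `nil`.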

CSV Dialects

rows = Philiprehberger::CsvKit.to_hashes("data.csv", dialect: :excel)
rows = Philiprehberger::CsvKit.process("data.csv", dialect: { delimiter: ";", quote: "'" }) do |p|
  p.transform(:age, &:to_i)
end
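
For readers familiar with Ruby's standard `csv` library, a custom dialect hash like the one above plausibly corresponds to the `col_sep` and `quote_char` options. That mapping is an assumption, and `DialectSketch.parse` is a hypothetical helper written only to show the effect of the two settings:

```ruby
require "csv"

# Hypothetical module for illustration; not part of csv_kit.
module DialectSketch
  # Parses CSV text with a custom delimiter and quote character and returns
  # an array of symbol-keyed hashes.
  def self.parse(text, delimiter: ",", quote: '"')
    CSV.parse(text, headers: true, header_converters: :symbol,
                    col_sep: delimiter, quote_char: quote).map(&:to_h)
  end
end

# The quoted field contains the delimiter, so it must stay in one column.
rows = DialectSketch.parse("name;age\n'Smith; Alice';30\n",
                           delimiter: ";", quote: "'")
p rows
```

The quote character matters as soon as a field contains the delimiter itself, as the sample row shows.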

Writing CSV

writer = Philiprehberger::CsvKit::Writer.new(headers: [:name, :age])
csv_string = writer.write([{ name: "Alice", age: 30 }, { name: "Bob", age: 25 }])

File.open("output.csv", "w") do |f|
  writer.write_to([{ name: "Alice", age: 30 }], f)
end

Streaming Writer

File.open("output.csv", "w") do |f|
  Philiprehberger::CsvKit::Writer.stream(f, headers: [:name, :age]) do |w|
    w << { name: "Alice", age: 30 }
    w << { name: "Bob", age: 25 }
  end
end

Error Recovery

rows = Philiprehberger::CsvKit.process("data.csv") do |p|
  p.on_error { |row, err| :skip }
  p.transform(:age) { |v| Integer(v) }
end
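
`Integer(v)` raises on non-numeric input, which is why an error handler is useful in the example above. The following sketch shows one way `:skip` and `:abort` could be interpreted. `ErrorRecoverySketch.convert_ages` is hypothetical, and the exact semantics inside csv_kit are an assumption:

```ruby
# Hypothetical module for illustration; not part of csv_kit.
module ErrorRecoverySketch
  # Converts each row's :age to an Integer. On failure the block decides:
  # :abort stops processing, any other value skips the bad row.
  def self.convert_ages(rows)
    kept = []
    errors = []
    rows.each do |row|
      kept << row.merge(age: Integer(row[:age]))
    rescue ArgumentError, TypeError => e
      errors << e
      break if yield(row, e) == :abort
    end
    [kept, errors]
  end
end

input = [{ age: "30" }, { age: "abc" }, { age: "25" }]

kept, errors = ErrorRecoverySketch.convert_ages(input) { |_row, _err| :skip }
p kept
p errors.size
```

Returning `:abort` from the block instead would stop at the second row and keep only the rows converted so far.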

Skip and Limit

rows = Philiprehberger::CsvKit.process("data.csv") do |p|
  p.skip(10)   # skip first 10 rows
  p.limit(50)  # stop after 50 rows
end

Column Aliasing

rows = Philiprehberger::CsvKit.process("data.csv") do |p|
  p.rename(:raw_col, :clean_col)
end

Delimiter Detection

delimiter = Philiprehberger::CsvKit::Detector.detect("data.tsv")
# => "\t"
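
How the detector decides is not documented here. As an illustration only, a simple heuristic is to count candidate delimiters in the first line and pick the most frequent one. The `DelimiterSketch` module and its candidate list are hypothetical and may not match csv_kit's algorithm:

```ruby
# Hypothetical module for illustration; not part of csv_kit.
module DelimiterSketch
  CANDIDATES = [",", ";", "\t", "|"].freeze

  # Returns the candidate that occurs most often in the first line.
  # Ties, including a line with no candidates at all, fall back to the
  # earliest entry in CANDIDATES, which is the comma.
  def self.detect(text)
    first_line = text.each_line.first.to_s
    CANDIDATES.max_by { |delimiter| first_line.count(delimiter) }
  end
end

p DelimiterSketch.detect("name\tage\tcity\nAlice\t30\tBerlin\n")
p DelimiterSketch.detect("name;age\nAlice;30\n")
```

A heuristic this simple can be fooled by delimiters inside quoted fields, so a production detector would typically sample several lines and check that the column count is consistent.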

API

| Method / Class | Description |
|---|---|
| CsvKit.to_hashes(path, dialect:) | Load CSV into array of symbolized hashes |
| CsvKit.pluck(path, *keys, dialect:) | Extract specific columns |
| CsvKit.filter(path, dialect:, &block) | Filter rows, return CSV string |
| CsvKit.find(path, dialect:, &block) | Return the first row matching the predicate, or nil |
| CsvKit.headers(path, dialect:) | Return header row as array of symbols |
| CsvKit.count(path, dialect:) | Count data rows without loading into memory |
| CsvKit.each_hash(path, dialect:, &block) | Stream rows as symbolized hashes; returns Enumerator if no block |
| CsvKit.process(path_or_io, dialect:, &block) | Streaming DSL with transforms and validations |
| Processor#headers(*names) | Override header names |
| Processor#transform(key, &block) | Register column transform |
| Processor#type(key, type, **opts) | Register built-in type coercion (:integer, :float, :string, :date, :datetime) |
| Processor#validate(key, &block) | Register column validation (skip invalid) |
| Processor#skip(n) | Skip the first N data rows |
| Processor#limit(n) | Stop after processing N rows |
| Processor#reject(&block) | Reject rows matching predicate |
| Processor#each(&block) | Callback for each processed row |
| Processor#on_error(&block) | Per-row error handler (return :skip or :abort) |
| Processor#max_errors(n) | Stop after N errors |
| Processor#errors | Collected errors from last run |
| Processor#rename(from, to) | Rename column during processing |
| Processor#after_each(&block) | Callback after each row is fully processed |
| Writer.new(headers:) | Create a CSV writer with given headers |
| Writer#write(rows) | Generate CSV string from rows |
| Writer#write_to(rows, io) | Write CSV to an IO object |
| Writer.stream(io, headers:, dialect:) | Stream CSV rows incrementally to an IO |
| Dialect.new(name_or_hash) | Create a dialect from preset or custom hash |
| Detector.detect(path_or_io) | Auto-detect CSV delimiter |
| Row#[](key) | Access value by symbol key |
| Row#keys | Column names as array of symbols |
| Row#values | Column values as array |
| Row#size | Number of columns |
| Row#each { \|k, v\| ... } | Iterate over key-value pairs |
| Row#merge(other) | Return new Row with merged data |
| Row#to_h | Convert row to plain hash |

Development

bundle install
bundle exec rspec
bundle exec rubocop

Support

If you find this project useful:

⭐ Star the repo

🐛 Report issues

💡 Suggest features

❤️ Sponsor development

🌐 All Open Source Projects

💻 GitHub Profile

🔗 LinkedIn Profile

License

MIT