Ruby Polars
🔥 Blazingly fast DataFrames for Ruby, powered by Polars
Installation
Add this line to your application’s Gemfile:
gem "polars-df"
Getting Started
This library follows the Polars Python API.
Polars.scan_csv("iris.csv")
.filter(Polars.col("sepal_length") > 5)
.group_by("species")
.agg(Polars.all.sum)
.collect
You can follow Polars tutorials and convert the code to Ruby in many cases. Feel free to open an issue if you run into problems.
Reference
Creating DataFrames
From a CSV
Polars.read_csv("file.csv")
# or lazily with
Polars.scan_csv("file.csv")
From Parquet
Polars.read_parquet("file.parquet")
# or lazily with
Polars.scan_parquet("file.parquet")
From Active Record
Polars.read_database(User.all)
# or
Polars.read_database("SELECT * FROM users")
From JSON
Polars.read_json("file.json")
# or
Polars.read_ndjson("file.ndjson")
# or lazily with
Polars.scan_ndjson("file.ndjson")
From Feather / Arrow IPC
Polars.read_ipc("file.arrow")
# or lazily with
Polars.scan_ipc("file.arrow")
From Avro
Polars.read_avro("file.avro")
From Iceberg (experimental, requires iceberg)
Polars.scan_iceberg(table)
From Delta Lake (experimental, requires deltalake-rb)
Polars.read_delta("./table")
# or lazily with
Polars.scan_delta("./table")
From a hash
Polars::DataFrame.new({
a: [1, 2, 3],
b: ["one", "two", "three"]
})
From an array of hashes
Polars::DataFrame.new([
{a: 1, b: "one"},
{a: 2, b: "two"},
{a: 3, b: "three"}
])
From an array of series
Polars::DataFrame.new([
Polars::Series.new("a", [1, 2, 3]),
Polars::Series.new("b", ["one", "two", "three"])
])
Attributes
Get number of rows
df.height
Get column names
df.columns
Check if a column exists
df.include?(name)
Selecting Data
Select a column
df["a"]
Select multiple columns
df[["a", "b"]]
Select first rows
df.head
Select last rows
df.tail
Filtering
Filter on a condition
df.filter(Polars.col("a") == 2)
df.filter(Polars.col("a") != 2)
df.filter(Polars.col("a") > 2)
df.filter(Polars.col("a") >= 2)
df.filter(Polars.col("a") < 2)
df.filter(Polars.col("a") <= 2)
And, or, and exclusive or
df.filter((Polars.col("a") > 1) & (Polars.col("b") == "two")) # and
df.filter((Polars.col("a") > 1) | (Polars.col("b") == "two")) # or
df.filter((Polars.col("a") > 1) ^ (Polars.col("b") == "two")) # xor
Operations
Basic operations
df["a"] + 5
df["a"] - 5
df["a"] * 5
df["a"] / 5
df["a"] % 5
df["a"] ** 2
df["a"].sqrt
df["a"].abs
Rounding
df["a"].round(2)
df["a"].ceil
df["a"].floor
Logarithm
df["a"].log # natural log
df["a"].log(10)
Exponentiation
df["a"].exp
Trigonometric functions
df["a"].sin
df["a"].cos
df["a"].tan
df["a"].arcsin
df["a"].arccos
df["a"].arctan
Hyperbolic functions
df["a"].sinh
df["a"].cosh
df["a"].tanh
df["a"].arcsinh
df["a"].arccosh
df["a"].arctanh
Summary statistics
df["a"].sum
df["a"].mean
df["a"].median
df["a"].quantile(0.90)
df["a"].min
df["a"].max
df["a"].std
df["a"].var
Grouping
Group
df.group_by("a").count
Works with all summary statistics
df.group_by("a").max
Multiple groups
df.group_by(["a", "b"]).count
Combining Data Frames
Add rows
df.vstack(other_df)
Add columns
df.hstack(other_df)
Inner join
df.join(other_df, on: "a")
Left join
df.join(other_df, on: "a", how: "left")
Encoding
One-hot encoding
df.to_dummies
Conversion
Array of hashes
df.to_a
Hash of series
df.to_h
CSV
df.to_csv
# or
df.write_csv("file.csv")
Parquet
df.write_parquet("file.parquet")
JSON
df.write_json("file.json")
# or
df.write_ndjson("file.ndjson")
Feather / Arrow IPC
df.write_ipc("file.arrow")
Avro
df.write_avro("file.avro")
Iceberg (experimental)
df.write_iceberg(table, mode: "append")
Delta Lake (experimental)
df.write_delta("./table")
Numo array
df.to_numo
Types
You can specify column types when creating a data frame
Polars::DataFrame.new(data, schema: {"a" => Polars::Int32, "b" => Polars::Float32})
Supported types are:
- boolean - Boolean
- decimal - Decimal
- float - Float16, Float32, Float64
- integer - Int8, Int16, Int32, Int64, Int128
- unsigned integer - UInt8, UInt16, UInt32, UInt64, UInt128
- string - String, Categorical, Enum
- temporal - Date, Datetime, Duration, Time
- nested - Array, List, Struct
- other - Binary, Object, Null, Unknown
Get column types
df.schema
For a specific column
df["a"].dtype
Cast a column
df["a"].cast(Polars::Int32)
Visualization
Add Vega to your application’s Gemfile:
gem "vega"
And use:
df.plot.line("a", "b")
Supports line, pie, column, bar, area, and scatter plots
Group data
df.plot.line("a", "b", color: "c")
Stacked columns or bars
df.plot.column("a", "b", color: "c", stacked: true)
Plot a series
df["a"].plot.hist
Supports hist, kde, and line plots
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/ruby-polars.git
cd ruby-polars
bundle install
bundle exec rake compile
bundle exec rake test
bundle exec rake test:docs