Class: Polars::LazyGroupBy
- Inherits: Object
- Inheritance chain: Object, Polars::LazyGroupBy
- Defined in: lib/polars/lazy_group_by.rb
Overview
Created by df.lazy.group_by("foo").
Instance Method Summary
-
#agg(*aggs, **named_aggs) ⇒ LazyFrame
Compute aggregations for each group of a group by operation.
-
#all ⇒ LazyFrame
Aggregate the groups into Series.
-
#first(ignore_nulls: false) ⇒ LazyFrame
Aggregate the first values in the group.
-
#having(*predicates) ⇒ LazyGroupBy
Filter groups with a list of predicates after aggregation.
-
#head(n = 5) ⇒ LazyFrame
Get the first n rows of each group.
-
#last(ignore_nulls: false) ⇒ LazyFrame
Aggregate the last values in the group.
-
#len(name: nil) ⇒ LazyFrame
Return the number of rows in each group.
-
#map_groups(schema, &function) ⇒ LazyFrame
Apply a custom/user-defined function (UDF) over the groups as a new DataFrame.
-
#max ⇒ LazyFrame
Reduce the groups to the maximal value.
-
#mean ⇒ LazyFrame
Reduce the groups to the mean values.
-
#median ⇒ LazyFrame
Return the median per group.
-
#min ⇒ LazyFrame
Reduce the groups to the minimal value.
-
#n_unique ⇒ LazyFrame
Count the unique values per group.
-
#quantile(quantile, interpolation: "nearest") ⇒ LazyFrame
Compute the quantile per group.
-
#sum ⇒ LazyFrame
Reduce the groups to the sum.
-
#tail(n = 5) ⇒ LazyFrame
Get the last n rows of each group.
Instance Method Details
#agg(*aggs, **named_aggs) ⇒ LazyFrame
Compute aggregations for each group of a group by operation.
# File 'lib/polars/lazy_group_by.rb', line 148
def agg(*aggs, **named_aggs)
  rbexprs = Utils.parse_into_list_of_expressions(*aggs, **named_aggs)
  Utils.wrap_ldf(@lgb.agg(rbexprs))
end
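As a rough illustration of what a grouped aggregation with named outputs produces, the sketch below models it in plain Ruby rather than calling Polars. The helper `agg_model` and the sample rows are hypothetical; each keyword names an output column, and its lambda reduces one group's rows to a single value.

```ruby
# Plain-Ruby model of a named group-by aggregation; not the Polars implementation.
SALES = [
  {region: "north", amount: 10},
  {region: "south", amount: 4},
  {region: "north", amount: 6},
  {region: "south", amount: 1}
].freeze

# Group rows by `key`, then build one output row per group.
# Each keyword argument names an output column; its lambda reduces the group.
def agg_model(rows, key, **named_aggs)
  rows.group_by { |row| row[key] }.map do |group_key, group|
    named_aggs.each_with_object({key => group_key}) do |(name, reducer), out|
      out[name] = reducer.call(group)
    end
  end
end

AGG_RESULT = agg_model(
  SALES, :region,
  total: ->(g) { g.sum { |row| row[:amount] } },
  largest: ->(g) { g.map { |row| row[:amount] }.max }
)
```

In this model every group collapses to exactly one row, which matches the usual case where each aggregation expression reduces a group to a scalar.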
#all ⇒ LazyFrame
Aggregate the groups into Series.
# File 'lib/polars/lazy_group_by.rb', line 298
def all
  agg(F.all)
end
#first(ignore_nulls: false) ⇒ LazyFrame
Aggregate the first values in the group.
# File 'lib/polars/lazy_group_by.rb', line 373
def first(ignore_nulls: false)
  agg(F.all.first(ignore_nulls: ignore_nulls))
end
#having(*predicates) ⇒ LazyGroupBy
Filter groups with a list of predicates after aggregation.
Using this method is equivalent to adding the predicates to the aggregation and filtering afterwards.
This method can be chained and all conditions will be combined using &.
# File 'lib/polars/lazy_group_by.rb', line 42
def having(*predicates)
  rbexprs = Utils.parse_into_list_of_expressions(*predicates)
  @lgb = @lgb.having(rbexprs)
  self
end
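To make the stated equivalence concrete, here is a plain-Ruby model that does not call Polars at all. Rows are hashes, and the helper names (`aggregate`, `having_model`) are hypothetical. Each predicate is a lambda over an aggregated row, and multiple predicates are combined with logical AND, mirroring the `&` behavior described above.

```ruby
# Plain-Ruby model of "aggregate, then filter groups"; not the Polars implementation.
ROWS = [
  {key: "a", value: 1},
  {key: "a", value: 5},
  {key: "b", value: 2},
  {key: "c", value: 10},
  {key: "c", value: 20}
].freeze

# Aggregate each group into one summary row (sum and row count).
def aggregate(rows)
  rows.group_by { |row| row[:key] }.map do |key, group|
    {key: key, sum: group.sum { |row| row[:value] }, len: group.size}
  end
end

# Keep only the groups for which every predicate holds (predicates are ANDed).
def having_model(rows, *predicates)
  aggregate(rows).select do |agg_row|
    predicates.all? { |predicate| predicate.call(agg_row) }
  end
end

big_sum = ->(group) { group[:sum] > 5 }
multi_row = ->(group) { group[:len] > 1 }

HAVING_RESULT = having_model(ROWS, big_sum, multi_row)
```

Group "b" is dropped because it fails both conditions, while "a" and "c" pass both and are kept.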
#head(n = 5) ⇒ LazyFrame
Get the first n rows of each group.
# File 'lib/polars/lazy_group_by.rb', line 240
def head(n = 5)
  Utils.wrap_ldf(@lgb.head(n))
end
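The per-group semantics of `head` (and its counterpart `tail`) can be sketched in plain Ruby without Polars. The helpers `head_per_group` and `tail_per_group` are hypothetical, and this model lists groups in order of first appearance, which the real engine may not guarantee.

```ruby
# Plain-Ruby model of per-group head/tail; not the Polars implementation.
LETTER_ROWS = [
  ["a", 1], ["b", 2], ["a", 3], ["a", 4], ["b", 5], ["c", 6]
].freeze

# First n rows of each group, keeping the original row order within a group.
def head_per_group(rows, n = 5)
  rows.group_by(&:first).flat_map { |_key, group| group.first(n) }
end

# Last n rows of each group.
def tail_per_group(rows, n = 5)
  rows.group_by(&:first).flat_map { |_key, group| group.last(n) }
end

HEAD_TWO = head_per_group(LETTER_ROWS, 2)
TAIL_ONE = tail_per_group(LETTER_ROWS, 1)
```

Unlike an aggregation, the result can contain several rows per group, and groups with fewer than n rows are returned whole.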
#last(ignore_nulls: false) ⇒ LazyFrame
Aggregate the last values in the group.
# File 'lib/polars/lazy_group_by.rb', line 408
def last(ignore_nulls: false)
  agg(F.all.last(ignore_nulls: ignore_nulls))
end
#len(name: nil) ⇒ LazyFrame
Return the number of rows in each group.
# File 'lib/polars/lazy_group_by.rb', line 335
def len(name: nil)
  len_expr = F.len
  if !name.nil?
    len_expr = len_expr.alias(name)
  end
  agg(len_expr)
end
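The effect of the optional `name:` argument can be sketched with a plain-Ruby model that does not call Polars. The helper `len_model` is hypothetical, and the default column name "len" used here is an assumption rather than something the source above states.

```ruby
# Plain-Ruby model of per-group row counts; not the Polars implementation.
EVENTS = %w[click view click click view scroll].freeze

# Count rows per group. When no name is given, the count column is called
# "len" (an assumed default); otherwise it takes the supplied name.
def len_model(values, name: nil)
  column = name.nil? ? "len" : name
  values.group_by(&:itself).map do |key, group|
    {"key" => key, column => group.size}
  end
end

LEN_DEFAULT = len_model(EVENTS)
LEN_NAMED = len_model(EVENTS, name: "n")
```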
#map_groups(schema, &function) ⇒ LazyFrame
Note: This method is much slower than the native expressions API. Only use it if you cannot implement your logic otherwise.
Apply a custom/user-defined function (UDF) over the groups as a new DataFrame.
Using this is considered an anti-pattern because it will be very slow:
- it forces the engine to materialize the whole DataFrames for the groups
- it is not parallelized
- it blocks optimizations, as the passed Ruby function is opaque to the optimizer
The idiomatic way to apply custom functions over multiple columns is using:
Polars.struct([my_columns]).apply { |struct_series| ... }
# File 'lib/polars/lazy_group_by.rb', line 203
def map_groups(
  schema,
  &function
)
  Utils.wrap_ldf(
    @lgb.map_groups(->(df) { function.(Utils.wrap_df(df))._df }, schema)
  )
end
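The shape of this operation, and the reason it is costly, can be seen in a plain-Ruby model that does not call Polars. The helper `map_groups_model` is hypothetical: it builds every group in full, passes each one to the block in turn, and concatenates the returned rows.

```ruby
# Plain-Ruby model of map_groups; not the Polars implementation.
SCORES = [
  {team: "x", score: 3},
  {team: "y", score: 9},
  {team: "x", score: 7},
  {team: "y", score: 2}
].freeze

# Materialize each group in full, hand it to the block, and concatenate
# whatever rows the block returns. Groups are processed one after another.
def map_groups_model(rows, key)
  rows.group_by { |row| row[key] }.flat_map { |_group_key, group| yield(group) }
end

# Keep only the highest-scoring row of each team.
TOP_ROWS = map_groups_model(SCORES, :team) do |group|
  [group.max_by { |row| row[:score] }]
end
```

Because the block is arbitrary Ruby, nothing outside it can reason about what it does, which is the opacity the list above refers to.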
#max ⇒ LazyFrame
Reduce the groups to the maximal value.
# File 'lib/polars/lazy_group_by.rb', line 437
def max
  agg(F.all.max)
end
#mean ⇒ LazyFrame
Reduce the groups to the mean values.
# File 'lib/polars/lazy_group_by.rb', line 466
def mean
  agg(F.all.mean)
end
#median ⇒ LazyFrame
Return the median per group.
# File 'lib/polars/lazy_group_by.rb', line 493
def median
  agg(F.all.median)
end
#min ⇒ LazyFrame
Reduce the groups to the minimal value.
# File 'lib/polars/lazy_group_by.rb', line 522
def min
  agg(F.all.min)
end
#n_unique ⇒ LazyFrame
Count the unique values per group.
# File 'lib/polars/lazy_group_by.rb', line 549
def n_unique
  agg(F.all.n_unique)
end
#quantile(quantile, interpolation: "nearest") ⇒ LazyFrame
Compute the quantile per group.
# File 'lib/polars/lazy_group_by.rb', line 582
def quantile(quantile, interpolation: "nearest")
  agg(F.all.quantile(quantile, interpolation: interpolation))
end
#sum ⇒ LazyFrame
Reduce the groups to the sum.
# File 'lib/polars/lazy_group_by.rb', line 611
def sum
  agg(F.all.sum)
end
#tail(n = 5) ⇒ LazyFrame
Get the last n rows of each group.
# File 'lib/polars/lazy_group_by.rb', line 272
def tail(n = 5)
  Utils.wrap_ldf(@lgb.tail(n))
end