Class: Polars::RollingGroupBy

Inherits:
Object
Defined in:
lib/polars/rolling_group_by.rb

Overview

A rolling grouper.

It provides an agg method that lets you run any Polars expression in a group by context.

Instance Method Details

#agg(*aggs, **named_aggs) ⇒ DataFrame

Compute aggregations for each group of a group by operation.

Parameters:

  • aggs (Array)

    Aggregations to compute for each group of the group by operation, specified as positional arguments. Accepts expression input. Strings are parsed as column names.

  • named_aggs (Hash)

    Additional aggregations, specified as keyword arguments. The resulting columns will be renamed to the keyword used.

Returns:

  • (DataFrame)

# File 'lib/polars/rolling_group_by.rb', line 65

def agg(*aggs, **named_aggs)
  group_by =
    @df.lazy.rolling(
      index_column: @time_column, period: @period, offset: @offset, closed: @closed, group_by: @group_by
    )

  if @predicates&.any?
    group_by = group_by.having(@predicates)
  end

  group_by.agg(*aggs, **named_aggs).collect(
    optimizations: QueryOptFlags.none
  )
end

#having(*predicates) ⇒ RollingGroupBy

Filter groups with a list of predicates after aggregation.

Using this method is equivalent to adding the predicates to the aggregation and filtering afterwards.

This method can be chained and all conditions will be combined using &.

Parameters:

  • predicates (Array)

    Expressions that evaluate to a boolean value for each group. Typically, this requires the use of an aggregation function. Multiple predicates are combined using &.

Returns:

  • (RollingGroupBy)

# File 'lib/polars/rolling_group_by.rb', line 42

def having(*predicates)
  RollingGroupBy.new(
    @df,
    @time_column,
    @period,
    @offset,
    @closed,
    @group_by,
    Utils._chain_predicates(@predicates, predicates)
  )
end

#map_groups(schema, &function) ⇒ DataFrame

Apply a custom/user-defined function (UDF) over the groups as a new DataFrame.

Using this is considered an anti-pattern because it is very slow:

  • it forces the engine to materialize a whole DataFrame for each group.
  • it is not parallelized.
  • it blocks optimizations, as the passed Ruby block is opaque to the optimizer.

The idiomatic way to apply custom functions over multiple columns is using:

Polars.struct([my_columns]).map_elements { |struct_series| ... }

Parameters:

  • schema (Object)

    Schema of the function's output. This has to be known statically. If the given schema is incorrect, this is a bug in the caller's query and may lead to errors. If set to nil, Polars assumes the schema is unchanged.

Returns:

  • (DataFrame)

# File 'lib/polars/rolling_group_by.rb', line 99

def map_groups(
  schema,
  &function
)
  if @predicates&.any?
    msg = "cannot call `map_groups` when filtering groups with `having`"
    raise TypeError, msg
  end

  @df.lazy
    .rolling(
      index_column: @time_column,
      period: @period,
      offset: @offset,
      closed: @closed,
      group_by: @group_by
    )
    .map_groups(schema, &function)
    .collect(optimizations: QueryOptFlags.none)
end