Class: DuckDB::AggregateFunction
- Inherits:
-
Object
- Object
- DuckDB::AggregateFunction
- Includes:
- FunctionTypeValidation
- Defined in:
- lib/duckdb/aggregate_function.rb,
ext/duckdb/aggregate_function.c
Overview
The default set_combine keeps the source state and discards the target, which is only correct for single-threaded (single-partition) execution. If DuckDB runs the aggregate in parallel it will produce wrong results. Always supply an explicit set_combine when the aggregate must be parallel-safe.
DuckDB::AggregateFunction lets you register a custom aggregate function written in Ruby and call it from SQL.
An aggregate function folds many rows into a single value. You define its behaviour with four callbacks:
-
set_init— called once per group; returns the initial state. -
set_update— called once per row; receives the current state and theinput value(s), returns the new state. -
set_combine— merges two partial states (required for parallelexecution); receives source and target states, returns the merged state. -
set_finalize— converts the final state into the SQL result value.
Only set_init is required. The other three have sensible defaults:
-
set_updatedefaults to { |state, *| state } (ignore inputs) -
set_combinedefaults to { |s1, _s2| s1 } (keep source state) -
set_finalizedefaults to { |x| x } (return state as-is)
Basic example: custom SUM
af = DuckDB::AggregateFunction.new
af.name = 'my_sum'
af.return_type = DuckDB::LogicalType::BIGINT
af.add_parameter(DuckDB::LogicalType::BIGINT)
af.set_init { 0 }
af.set_update { |state, value| state + value }
af.set_combine { |s1, s2| s1 + s2 }
con.register_aggregate_function(af)
con.query('SELECT my_sum(i) FROM range(100) t(i)').first.first # => 4950
Example: weighted average with Hash state
af = DuckDB::AggregateFunction.new
af.name = 'weighted_avg'
af.return_type = DuckDB::LogicalType::DOUBLE
af.add_parameter(DuckDB::LogicalType::DOUBLE) # value
af.add_parameter(DuckDB::LogicalType::DOUBLE) # weight
af.set_init { { sum: 0.0, weight: 0.0 } }
af.set_update { |state, value, weight| { sum: state[:sum] + value * weight, weight: state[:weight] + weight } }
af.set_combine { |s1, s2| { sum: s1[:sum] + s2[:sum], weight: s1[:weight] + s2[:weight] } }
af.set_finalize { |state| state[:weight].zero? ? nil : state[:sum] / state[:weight] }
con.register_aggregate_function(af)
Constant Summary
Constants included from FunctionTypeValidation
FunctionTypeValidation::SUPPORTED_TYPES
Class Method Summary collapse
-
._state_registry_size ⇒ Object
Returns the number of Ruby states currently tracked in the registry.
-
.create(name:, return_type:, params: [], init:, update: ->(state, *_inputs) { state }, combine: ->(state, _other_state) { state }, finalize: ->(state) { state }, null_handling: false) ⇒ DuckDB::AggregateFunction
Creates a new AggregateFunction in a single call.
Instance Method Summary collapse
-
#add_parameter(logical_type) ⇒ DuckDB::AggregateFunction
Adds a parameter to the aggregate function.
- #initialize ⇒ Object constructor
- #name=(name) ⇒ Object
-
#return_type=(logical_type) ⇒ DuckDB::AggregateFunction
Sets the return type for the aggregate function.
-
#set_combine ⇒ DuckDB::AggregateFunction
Sets the block that merges two partial states during parallel execution.
-
#set_finalize ⇒ DuckDB::AggregateFunction
Sets the block that converts the final state into the SQL result value.
-
#set_init ⇒ DuckDB::AggregateFunction
Sets the block that initialises the per-group state.
- #set_name(name) ⇒ Object
-
#set_special_handling ⇒ DuckDB::AggregateFunction
Sets special NULL handling for the aggregate function.
-
#set_update ⇒ DuckDB::AggregateFunction
Sets the block that accumulates one row into the state.
Constructor Details
#initialize ⇒ Object
96 97 98 99 100 101 102 103 104 105 106 |
# File 'ext/duckdb/aggregate_function.c', line 96
static VALUE aggregate_function_initialize(VALUE self) {
rubyDuckDBAggregateFunction *p;
TypedData_Get_Struct(self, rubyDuckDBAggregateFunction, &aggregate_function_data_type, p);
p->aggregate_function = duckdb_create_aggregate_function();
p->init_proc = Qnil;
p->update_proc = Qnil;
p->combine_proc = Qnil;
p->finalize_proc = Qnil;
p->special_handling = false;
return self;
}
|
Class Method Details
._state_registry_size ⇒ Object
Returns the number of Ruby states currently tracked in the registry.
727 728 729 730 |
# File 'ext/duckdb/aggregate_function.c', line 727
static VALUE aggregate_function_s__state_registry_size(VALUE klass) {
(void)klass;
return LONG2NUM((long)RHASH_SIZE(g_aggregate_state_registry));
}
|
.create(name:, return_type:, params: [], init:, update: ->(state, *_inputs) { state }, combine: ->(state, _other_state) { state }, finalize: ->(state) { state }, null_handling: false) ⇒ DuckDB::AggregateFunction
Creates a new AggregateFunction in a single call.
This is a convenience factory that builds and configures an AggregateFunction without requiring you to set each attribute separately.
Example: custom SUM
af = DuckDB::AggregateFunction.create(
name: 'my_sum',
return_type: :bigint,
params: [:bigint],
init: -> { 0 },
update: ->(state, value) { state + value },
combine: ->(state, other) { state + other }
)
con.register_aggregate_function(af)
con.query('SELECT my_sum(i) FROM range(100) t(i)').first.first # => 4950
Example: count including NULL values
af = DuckDB::AggregateFunction.create(
name: 'count_with_nulls',
return_type: :bigint,
params: [:bigint],
init: -> { 0 },
update: ->(state, _value) { state + 1 },
combine: ->(state, other) { state + other },
null_handling: true
)
121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 |
# File 'lib/duckdb/aggregate_function.rb', line 121 def create( # rubocop:disable Metrics/MethodLength, Metrics/ParameterLists, Metrics/AbcSize name:, return_type:, params: [], # rubocop:disable Style/KeywordParametersOrder init:, update: ->(state, *_inputs) { state }, combine: ->(state, _other_state) { state }, finalize: ->(state) { state }, null_handling: false ) callable!(:init, init) callable!(:update, update) callable!(:combine, combine) callable!(:finalize, finalize) af = AggregateFunction.new af.name = name af.return_type = return_type params.each do |param| af.add_parameter(param) end af.set_init { init.call } af.set_update { |state, *inputs| update.call(state, *inputs) } af.set_combine { |state, other_state| combine.call(state, other_state) } af.set_finalize { |state| finalize.call(state) } af.set_special_handling if null_handling af end |
Instance Method Details
#add_parameter(logical_type) ⇒ DuckDB::AggregateFunction
Adds a parameter to the aggregate function.
173 174 175 176 177 |
# File 'lib/duckdb/aggregate_function.rb', line 173 def add_parameter(logical_type) logical_type = check_supported_type!(logical_type) _add_parameter(logical_type) end |
#name=(name) ⇒ Object
108 109 110 |
# File 'ext/duckdb/aggregate_function.c', line 108 def name=(value) set_name(value.to_s) end |
#return_type=(logical_type) ⇒ DuckDB::AggregateFunction
Sets the return type for the aggregate function.
162 163 164 165 166 |
# File 'lib/duckdb/aggregate_function.rb', line 162 def return_type=(logical_type) logical_type = check_supported_type!(logical_type) _set_return_type(logical_type) end |
#set_combine ⇒ DuckDB::AggregateFunction
The default { |s1, _s2| s1 } is only correct for single-threaded execution. Supply an explicit combine block for parallel-safe aggregates.
Sets the block that merges two partial states during parallel execution. The block receives the source and target states and must return the merged state. May be called after set_init to override the injected default.
221 222 223 224 |
# File 'lib/duckdb/aggregate_function.rb', line 221 def set_combine(&) @combine_set = true _set_combine(&) end |
#set_finalize ⇒ DuckDB::AggregateFunction
Sets the block that converts the final state into the SQL result value. The block receives the accumulated state and must return a value compatible with the declared return_type. Default: { |x| x } (return the state as-is). May be called after set_init to override the injected default.
233 234 235 236 |
# File 'lib/duckdb/aggregate_function.rb', line 233 def set_finalize(&) @finalize_set = true _set_finalize(&) end |
#set_init ⇒ DuckDB::AggregateFunction
The injected default for set_combine is { |s1, _s2| s1 }, which is only correct for single-threaded execution. Always call set_combine explicitly when the aggregate must be parallel-safe.
Sets the block that initialises the per-group state. The block takes no arguments and returns the initial state value. This is the only required callback; defaults for set_update, set_combine, and set_finalize are injected automatically on the first call if those methods have not been called explicitly.
190 191 192 193 194 195 196 197 198 |
# File 'lib/duckdb/aggregate_function.rb', line 190 def set_init(&) unless @init_set _set_update { |state, *| state } unless @update_set _set_combine { |s1, _s2| s1 } unless @combine_set _set_finalize { |x| x } unless @finalize_set end _set_init(&) @init_set = true end |
#set_name(name) ⇒ Object
108 109 110 111 112 113 114 115 116 |
# File 'ext/duckdb/aggregate_function.c', line 108
static VALUE aggregate_function_set_name(VALUE self, VALUE name) {
rubyDuckDBAggregateFunction *p;
TypedData_Get_Struct(self, rubyDuckDBAggregateFunction, &aggregate_function_data_type, p);
const char *str = StringValuePtr(name);
duckdb_aggregate_function_set_name(p->aggregate_function, str);
return self;
}
|
#set_special_handling ⇒ DuckDB::AggregateFunction
Sets special NULL handling for the aggregate function. By default DuckDB skips rows with NULL input values. Calling this method disables that behaviour so the update callback is invoked even when inputs are NULL, receiving nil for each NULL argument. This lets the function implement its own NULL semantics (e.g. counting NULLs).
Wraps duckdb_aggregate_function_set_special_handling.
248 249 250 |
# File 'lib/duckdb/aggregate_function.rb', line 248 def set_special_handling _set_special_handling end |
#set_update ⇒ DuckDB::AggregateFunction
Sets the block that accumulates one row into the state. The block receives the current state followed by the input column value(s) for that row, and must return the updated state. Default: { |state, *| state } (ignore inputs, keep state unchanged). May be called after set_init to override the injected default.
207 208 209 210 |
# File 'lib/duckdb/aggregate_function.rb', line 207 def set_update(&) @update_set = true _set_update(&) end |