Module: Ignis::Collective::Algorithms::ReductionOps
- Defined in:
- lib/nvruby/collective/algorithms/reduction_ops.rb
Overview
Reduction operations for collective primitives. These operations combine tensor elements during reduce/allreduce.
Constant Summary
- OPS =
Valid reduction operations.
%i[sum prod min max avg].freeze
Class Method Summary
-
.avg(a, b, result, count, dtype, stream = nil, _n_participants = nil) ⇒ Object
Average step.
-
.execute(op, a, b, result, count, dtype, stream = nil) ⇒ void
Execute reduction operation by name: result = op(a, b), elementwise.
-
.max(a, b, result, count, dtype, stream = nil) ⇒ Object
Element-wise maximum.
-
.min(a, b, result, count, dtype, stream = nil) ⇒ Object
Element-wise minimum.
-
.prod(a, b, result, count, dtype, stream = nil) ⇒ Object
Element-wise product (result = a * b).
-
.sum(a, b, result, count, dtype, stream = nil) ⇒ Object
Element-wise sum (result = a + b).
Class Method Details
.avg(a, b, result, count, dtype, stream = nil, _n_participants = nil) ⇒ Object
Average step. Averaging means summing across all ranks and then dividing by the participant count once, at the end. The per-pair reduction step is therefore a plain sum; the caller (Communicator) performs the final divide-by-N. The _n_participants argument is accepted but not used by this step. (Previously this silently returned a sum with no divide.)
# File 'lib/nvruby/collective/algorithms/reduction_ops.rb', line 38

def self.avg(a, b, result, count, dtype, stream = nil, _n_participants = nil)
  execute(:sum, a, b, result, count, dtype, stream)
end
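The "sum per pair, divide once at the end" scheme described above can be sketched in plain Ruby. This is a minimal illustration, not the library's code: it uses ordinary Arrays in place of device buffers, and `pairwise_sum` and `average_across_ranks` are hypothetical names introduced here.

```ruby
# Hypothetical stand-in for the per-pair step: element-wise sum of two
# equally sized Arrays (the real step operates on device buffers).
def pairwise_sum(a, b)
  a.zip(b).map { |x, y| x + y }
end

# Hypothetical stand-in for the caller's side: fold all rank buffers with
# the pairwise sum, then divide by the participant count exactly once.
def average_across_ranks(rank_buffers)
  n = rank_buffers.size
  raise ArgumentError, "need at least one rank" if n.zero?

  total = rank_buffers.reduce { |acc, buf| pairwise_sum(acc, buf) }
  total.map { |x| x / n.to_f }
end

AVG_EXAMPLE = average_across_ranks([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
# AVG_EXAMPLE => [3.0, 4.0]
```

Dividing inside every pairwise step would instead weight later ranks more heavily, which is why the per-pair step stays a plain sum.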
.execute(op, a, b, result, count, dtype, stream = nil) ⇒ void
This method returns an undefined value.
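Execute a reduction operation by name: result = op(a, b), element-wise. :avg is reduced as :sum, an unknown op raises ArgumentError, and a count of zero returns immediately. float32 buffers take the GPU path; every other dtype uses the host fallback.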
# File 'lib/nvruby/collective/algorithms/reduction_ops.rb', line 51

def self.execute(op, a, b, result, count, dtype, stream = nil)
  reduce = (op == :avg ? :sum : op)
  raise ArgumentError, "Unknown reduction operation: #{op}" unless %i[sum prod min max].include?(reduce)
  return if count.zero?

  if dtype == :float32
    gpu_elementwise(reduce, a, b, result, count)
  else
    # Non-fp32 dtypes use the (correct, slower) host path: the fused JIT
    # kernels are typed `float`, so reinterpreting fp16/fp64/int buffers
    # through them would be wrong.
    host_elementwise_fallback(host_op(reduce), a, b, result, count, dtype)
  end
end
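The name-based dispatch above can be approximated on the host with a small lookup table. The sketch below is an assumption-laden illustration: `HOST_OPS` and `host_reduce` are hypothetical names, buffers are plain Ruby Arrays, and the real `host_op` and `host_elementwise_fallback` helpers are not shown on this page and may differ.

```ruby
# Hypothetical table mapping an op name to a two-argument lambda.
HOST_OPS = {
  sum:  ->(x, y) { x + y },
  prod: ->(x, y) { x * y },
  min:  ->(x, y) { x < y ? x : y },
  max:  ->(x, y) { x > y ? x : y }
}.freeze

# Hypothetical host-only reduction: writes op(a[i], b[i]) into result[i]
# for the first `count` elements and returns result.
def host_reduce(op, a, b, result, count)
  reduce = (op == :avg ? :sum : op) # :avg is a plain sum per pair
  fn = HOST_OPS[reduce]
  raise ArgumentError, "Unknown reduction operation: #{op}" unless fn
  return result if count.zero?

  count.times { |i| result[i] = fn.call(a[i], b[i]) }
  result
end

host_reduce(:max, [1, 5], [3, 2], [0, 0], 2) # => [3, 5]
```

As in the listing, validation runs before the zero-count early return, so an unknown op is rejected even for empty buffers.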
.max(a, b, result, count, dtype, stream = nil) ⇒ Object
Element-wise maximum.

# File 'lib/nvruby/collective/algorithms/reduction_ops.rb', line 30

def self.max(a, b, result, count, dtype, stream = nil)
  execute(:max, a, b, result, count, dtype, stream)
end
.min(a, b, result, count, dtype, stream = nil) ⇒ Object
Element-wise minimum.

# File 'lib/nvruby/collective/algorithms/reduction_ops.rb', line 25

def self.min(a, b, result, count, dtype, stream = nil)
  execute(:min, a, b, result, count, dtype, stream)
end
.prod(a, b, result, count, dtype, stream = nil) ⇒ Object
Element-wise product (result = a * b).

# File 'lib/nvruby/collective/algorithms/reduction_ops.rb', line 20

def self.prod(a, b, result, count, dtype, stream = nil)
  execute(:prod, a, b, result, count, dtype, stream)
end
.sum(a, b, result, count, dtype, stream = nil) ⇒ Object
Element-wise sum (result = a + b).

# File 'lib/nvruby/collective/algorithms/reduction_ops.rb', line 15

def self.sum(a, b, result, count, dtype, stream = nil)
  execute(:sum, a, b, result, count, dtype, stream)
end