Module: Ignis::Epilogues
- Defined in: lib/nvruby/epilogues.rb
Overview
Advanced fused epilogues for GPU operations. Provides GELU, ReLU, SiLU, and bias addition as fused CUDA kernels.
Defined Under Namespace
Modules: Kernels
Constant Summary
- GELU_COEF_A = 0.7978845608028654
  GELU approximation constant
- GELU_COEF_B = 0.044715
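The two constants match the coefficients of the widely used tanh-based GELU approximation, where 0.7978845608028654 is sqrt(2/π). The kernel source is not shown on this page, so the following is only a CPU sketch under the assumption that the GPU kernel evaluates that standard formula. The module name `GeluReference` and the method `tanh_gelu` are hypothetical and not part of Ignis.

```ruby
# CPU reference for the tanh-based GELU approximation.
# Assumption: the CUDA kernel uses this standard formula; GeluReference and
# tanh_gelu are illustrative names, not part of the library.
module GeluReference
  COEF_A = 0.7978845608028654 # sqrt(2 / pi), same value as GELU_COEF_A
  COEF_B = 0.044715           # same value as GELU_COEF_B

  module_function

  # gelu(x) ~= 0.5 * x * (1 + tanh(A * (x + B * x^3)))
  def tanh_gelu(x)
    inner = COEF_A * (x + COEF_B * x**3)
    0.5 * x * (1.0 + Math.tanh(inner))
  end
end

[-2.0, -1.0, 0.0, 1.0, 2.0].each do |x|
  puts format("gelu(% .1f) = % .6f", x, GeluReference.tanh_gelu(x))
end
```

A reference like this can be handy for spot-checking values copied back from the device.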
Class Method Summary
- .bias_add(input, bias, out: nil) ⇒ NvArray
  Add bias to tensor.
- .gelu(input, out: nil) ⇒ NvArray
  Apply GELU activation (approximation).
- .gelu_bias(input, bias, out: nil) ⇒ NvArray
  Fused GELU + Bias.
- .gelu_exact(input, out: nil) ⇒ NvArray
  Apply exact GELU activation.
- .gemm_epilogue(a, b, epilogue:, bias: nil) ⇒ NvArray
  Fused GEMM + epilogue.
- .leaky_relu(input, negative_slope: 0.01, out: nil) ⇒ NvArray
  Apply Leaky ReLU activation.
- .relu(input, out: nil) ⇒ NvArray
  Apply ReLU activation.
- .residual_add(input, residual, out: nil) ⇒ NvArray
  Residual addition.
- .scale(input, factor, out: nil) ⇒ NvArray
  Scale tensor by factor.
- .silu(input, out: nil) ⇒ NvArray
  Apply SiLU (Swish) activation.
- .silu_bias(input, bias, out: nil) ⇒ NvArray
  Fused SiLU + Bias.
Class Method Details
.bias_add(input, bias, out: nil) ⇒ NvArray
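Add bias to tensor. A 1-D input is treated as a single row; otherwise the first two dimensions of the input shape are read as rows and columns.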
# File 'lib/nvruby/epilogues.rb', line 250
def bias_add(input, bias, out: nil)
  CUDA::RuntimeAPI.ensure_loaded!

  shape = input.shape
  rows = shape.size == 1 ? 1 : shape[0]
  cols = shape.size == 1 ? shape[0] : shape[1]
  n = rows * cols

  device = input.respond_to?(:device_index) ? input.device_index : 0
  out ||= Ignis::NvArray.zeros(input.shape, dtype: input.dtype, device: device)

  kernel = get_kernel(:bias_add, Kernels::BIAS_ADD_KERNEL, "bias_add")

  block_size = 256
  grid_size = (n + block_size - 1) / block_size

  kernel.launch(
    grid: [grid_size, 1, 1],
    block: [block_size, 1, 1],
    args: [input.device_ffi_ptr, bias.device_ffi_ptr, out.device_ffi_ptr, rows, cols]
  )

  CUDA::RuntimeAPI.cudaDeviceSynchronize
  out
end
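The shape handling and launch geometry in `bias_add` can be tried out on the CPU. The sketch below mirrors the `rows`/`cols` derivation and the ceiling division for `grid_size` from the source above. The element-wise rule (`bias[col]` added to each element of a row-major buffer) is an assumption, since the kernel body is not shown here, and all three method names are hypothetical.

```ruby
# Pure-Ruby sketch of bias_add's host-side logic.
# Assumption: the kernel adds bias[col] to every element of a row-major
# rows x cols buffer. Method names are illustrative only.

# Same rule as the source: a 1-D shape is treated as a single row.
def rows_and_cols(shape)
  shape.size == 1 ? [1, shape[0]] : [shape[0], shape[1]]
end

# Assumed per-element behavior: column index is the flat index modulo cols.
def bias_add_reference(flat_input, bias, shape)
  rows, cols = rows_and_cols(shape)
  Array.new(rows * cols) { |i| flat_input[i] + bias[i % cols] }
end

# Ceiling division: enough 256-thread blocks to cover n elements.
def grid_size_for(n, block_size = 256)
  (n + block_size - 1) / block_size
end

p bias_add_reference([1, 2, 3, 4, 5, 6], [10, 20, 30], [2, 3])
p grid_size_for(1000)
```

Because the grid is rounded up, the last block usually contains threads past the end of the data, which is why element-wise kernels of this kind normally guard on the element count.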
.gelu(input, out: nil) ⇒ NvArray
Apply GELU activation (approximation)
# File 'lib/nvruby/epilogues.rb', line 185
def gelu(input, out: nil)
  apply_unary(input, out, :gelu, Kernels::GELU_KERNEL, "gelu_forward")
end
.gelu_bias(input, bias, out: nil) ⇒ NvArray
Fused GELU + Bias
# File 'lib/nvruby/epilogues.rb', line 282
def gelu_bias(input, bias, out: nil)
  apply_fused_bias(input, bias, out, :gelu_bias, Kernels::GELU_BIAS_KERNEL, "gelu_bias_forward")
end
.gelu_exact(input, out: nil) ⇒ NvArray
Apply exact GELU activation
# File 'lib/nvruby/epilogues.rb', line 194
def gelu_exact(input, out: nil)
  apply_unary(input, out, :gelu_exact, Kernels::GELU_EXACT_KERNEL, "gelu_exact_forward")
end
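To get a feel for how far the approximate and exact variants differ, the two standard formulas can be compared on the CPU. This is a sketch that assumes `gelu_exact` evaluates the usual erf-based definition and `gelu` the tanh approximation; neither kernel body appears on this page, and both method names below are hypothetical.

```ruby
# CPU comparison of the two standard GELU formulas.
# Assumption: the kernels implement these textbook definitions.

# Exact GELU: x * Phi(x), with Phi the standard normal CDF.
def gelu_exact_reference(x)
  0.5 * x * (1.0 + Math.erf(x / Math.sqrt(2.0)))
end

# Tanh approximation using the constants from the Constant Summary.
def gelu_tanh_reference(x)
  0.5 * x * (1.0 + Math.tanh(0.7978845608028654 * (x + 0.044715 * x**3)))
end

# Largest absolute gap over a grid of points in [-4, 4].
max_gap = (-40..40).map { |i| i / 10.0 }
                   .map { |x| (gelu_exact_reference(x) - gelu_tanh_reference(x)).abs }
                   .max
puts "largest gap on [-4, 4]: #{max_gap}"
```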
.gemm_epilogue(a, b, epilogue:, bias: nil) ⇒ NvArray
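GEMM followed by an epilogue. The product is computed with Ignis::LinAlg.matmul, and the epilogue is then applied through the methods of this module as a separate step. Any epilogue value other than :gelu, :relu, or :silu applies only the optional bias.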
# File 'lib/nvruby/epilogues.rb', line 359
def gemm_epilogue(a, b, epilogue:, bias: nil)
  # Perform GEMM
  c = Ignis::LinAlg.matmul(a, b)

  # Apply epilogue
  result = case epilogue
           when :gelu
             bias ? gelu_bias(c, bias) : gelu(c)
           when :relu
             temp = bias ? bias_add(c, bias) : c
             relu(temp)
           when :silu
             bias ? silu_bias(c, bias) : silu(c)
           else
             bias ? bias_add(c, bias) : c
           end

  result
end
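The dispatch in `gemm_epilogue` can be mimicked with plain nested arrays, which may help when checking GPU results by hand. The sketch below is not the library's implementation. It assumes row-major nested arrays in place of NvArray, and it assumes the bias is added before the activation in every branch, which the source confirms only for `:relu`. All method names are hypothetical.

```ruby
# CPU sketch of the matmul-then-epilogue dispatch.
# Assumptions: nested arrays stand in for NvArray; bias is applied before
# the activation in all branches (confirmed by the source only for :relu).

def matmul_reference(a, b)
  b_cols = b.transpose
  a.map { |row| b_cols.map { |col| row.zip(col).sum { |x, y| x * y } } }
end

def activation_for(epilogue)
  case epilogue
  when :gelu then ->(x) { 0.5 * x * (1.0 + Math.tanh(0.7978845608028654 * (x + 0.044715 * x**3))) }
  when :relu then ->(x) { x > 0 ? x : 0.0 }
  when :silu then ->(x) { x / (1.0 + Math.exp(-x)) }
  else ->(x) { x } # unrecognized epilogues reduce to bias-only, as in the source
  end
end

def gemm_epilogue_reference(a, b, epilogue:, bias: nil)
  act = activation_for(epilogue)
  matmul_reference(a, b).map do |row|
    row.each_with_index.map { |v, j| act.call(bias ? v + bias[j] : v) }
  end
end

a = [[1.0, -2.0], [3.0, 4.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
p gemm_epilogue_reference(a, identity, epilogue: :relu, bias: [0.5, 0.5])
```

Note that an unrecognized `epilogue:` symbol is not rejected; both the source and this sketch fall through to the bias-only branch.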
.leaky_relu(input, negative_slope: 0.01, out: nil) ⇒ NvArray
Apply Leaky ReLU activation
# File 'lib/nvruby/epilogues.rb', line 222
def leaky_relu(input, negative_slope: 0.01, out: nil)
  CUDA::RuntimeAPI.ensure_loaded!

  n = input.size
  device = input.respond_to?(:device_index) ? input.device_index : 0
  out ||= Ignis::NvArray.zeros(input.shape, dtype: input.dtype, device: device)

  kernel = get_kernel(:leaky_relu, Kernels::LEAKY_RELU_KERNEL, "leaky_relu_forward")

  block_size = 256
  grid_size = (n + block_size - 1) / block_size

  kernel.launch(
    grid: [grid_size, 1, 1],
    block: [block_size, 1, 1],
    args: [input.device_ffi_ptr, out.device_ffi_ptr, n, negative_slope]
  )

  CUDA::RuntimeAPI.cudaDeviceSynchronize
  out
end
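For reference, the usual definition of Leaky ReLU passes non-negative values through and multiplies negative values by `negative_slope`. The sketch below applies that definition to a plain Ruby array. It assumes the kernel follows the standard definition, and `leaky_relu_reference` is a hypothetical helper, not a library method.

```ruby
# CPU reference for Leaky ReLU on a plain array.
# Assumption: the kernel implements the standard definition.
def leaky_relu_reference(values, negative_slope: 0.01)
  values.map { |x| x >= 0 ? x : negative_slope * x }
end

p leaky_relu_reference([-2.0, 0.0, 3.0])
p leaky_relu_reference([-2.0, 0.0, 3.0], negative_slope: 0.2)
```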
.relu(input, out: nil) ⇒ NvArray
Apply ReLU activation
# File 'lib/nvruby/epilogues.rb', line 212
def relu(input, out: nil)
  apply_unary(input, out, :relu, Kernels::RELU_KERNEL, "relu_forward")
end
.residual_add(input, residual, out: nil) ⇒ NvArray
Residual addition
# File 'lib/nvruby/epilogues.rb', line 302
def residual_add(input, residual, out: nil)
  CUDA::RuntimeAPI.ensure_loaded!

  n = input.size
  device = input.respond_to?(:device_index) ? input.device_index : 0
  out ||= Ignis::NvArray.zeros(input.shape, dtype: input.dtype, device: device)

  kernel = get_kernel(:residual_add, Kernels::RESIDUAL_ADD_KERNEL, "residual_add")

  block_size = 256
  grid_size = (n + block_size - 1) / block_size

  kernel.launch(
    grid: [grid_size, 1, 1],
    block: [block_size, 1, 1],
    args: [input.device_ffi_ptr, residual.device_ffi_ptr, out.device_ffi_ptr, n]
  )

  CUDA::RuntimeAPI.cudaDeviceSynchronize
  out
end
.scale(input, factor, out: nil) ⇒ NvArray
Scale tensor by factor
# File 'lib/nvruby/epilogues.rb', line 330
def scale(input, factor, out: nil)
  CUDA::RuntimeAPI.ensure_loaded!

  n = input.size
  device = input.respond_to?(:device_index) ? input.device_index : 0
  out ||= Ignis::NvArray.zeros(input.shape, dtype: input.dtype, device: device)

  kernel = get_kernel(:scale, Kernels::SCALE_KERNEL, "scale")

  block_size = 256
  grid_size = (n + block_size - 1) / block_size

  kernel.launch(
    grid: [grid_size, 1, 1],
    block: [block_size, 1, 1],
    args: [input.device_ffi_ptr, out.device_ffi_ptr, factor, n]
  )

  CUDA::RuntimeAPI.cudaDeviceSynchronize
  out
end
.silu(input, out: nil) ⇒ NvArray
Apply SiLU (Swish) activation
# File 'lib/nvruby/epilogues.rb', line 203
def silu(input, out: nil)
  apply_unary(input, out, :silu, Kernels::SILU_KERNEL, "silu_forward")
end
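SiLU (Swish) is conventionally defined as x · sigmoid(x). The following CPU sketch evaluates that definition and can serve as a reference when checking device output. It assumes the kernel uses the standard formula, and `silu_reference` is a hypothetical name.

```ruby
# CPU reference for SiLU: x * sigmoid(x) = x / (1 + exp(-x)).
# Assumption: the kernel implements this standard definition.
def silu_reference(x)
  x / (1.0 + Math.exp(-x))
end

[-4.0, -1.0, 0.0, 1.0, 4.0].each do |x|
  puts format("silu(% .1f) = % .6f", x, silu_reference(x))
end
```

For large positive inputs the sigmoid factor approaches 1, so the output approaches x; for large negative inputs it approaches 0.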
.silu_bias(input, bias, out: nil) ⇒ NvArray
Fused SiLU + Bias
# File 'lib/nvruby/epilogues.rb', line 292
def silu_bias(input, bias, out: nil)
  apply_fused_bias(input, bias, out, :silu_bias, Kernels::SILU_BIAS_KERNEL, "silu_bias_forward")
end