Class: Ignis::MathDx::FftKernel
- Inherits: Object
- Defined in: lib/nvruby/mathdx/fft_kernel.rb
Overview
Device-side FFT kernel built on cuFFTDx patterns. Generates and compiles CUDA C++ code for thread-block FFT operations.
cuFFTDx allows FFT operations to be embedded inside CUDA kernels, so they can be fused with other operations to reduce memory traffic.
Constant Summary
- SUPPORTED_SIZES = [16, 32, 64, 128, 256, 512, 1024].freeze
Supported FFT sizes (powers of 2)
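The constructor calls a `validate_size!` helper whose body is not shown on this page. As a rough sketch of what such a check might look like, assuming it simply tests membership in `SUPPORTED_SIZES` and raises `ArgumentError` (the module name `FftSizeSketch`, the method name `check!`, and the error class are all assumptions made for illustration):

```ruby
# Hypothetical stand-in for the size check. The real validate_size! is not
# shown in this documentation and may behave differently.
module FftSizeSketch
  SUPPORTED_SIZES = [16, 32, 64, 128, 256, 512, 1024].freeze

  # Returns the size unchanged when it is supported, otherwise raises.
  def self.check!(size)
    return size if SUPPORTED_SIZES.include?(size)

    raise ArgumentError,
          "Unsupported FFT size #{size.inspect}; expected one of #{SUPPORTED_SIZES.join(', ')}"
  end
end

FftSizeSketch.check!(256) # accepted, returns 256

begin
  FftSizeSketch.check!(100) # rejected: not in the supported list
rescue ArgumentError => e
  puts e.message
end
```

A membership test is used here rather than a generic power-of-two check because the list is bounded on both ends (16 to 1024).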
Instance Attribute Summary
-
#compiled ⇒ Boolean
readonly
Whether kernel is compiled.
-
#direction ⇒ Symbol
readonly
FFT direction.
-
#dtype ⇒ Symbol
readonly
Data type.
-
#elements_per_thread ⇒ Integer
readonly
Elements per thread.
-
#size ⇒ Integer
readonly
FFT size.
Instance Method Summary
-
#compile!(device_id: 0) ⇒ self
Compile the FFT kernel.
-
#destroy! ⇒ void
Release kernel resources.
-
#execute(input, output: nil, batch: 1, stream: nil) ⇒ NvArray
Execute the FFT kernel.
-
#initialize(size:, dtype: :complex64, direction: :forward, elements_per_thread: 8) ⇒ FftKernel
constructor
Initialize FFT kernel configuration.
Constructor Details
#initialize(size:, dtype: :complex64, direction: :forward, elements_per_thread: 8) ⇒ FftKernel
Initialize FFT kernel configuration
# File 'lib/nvruby/mathdx/fft_kernel.rb', line 39
def initialize(size:, dtype: :complex64, direction: :forward, elements_per_thread: 8)
  validate_size!(size)
  validate_dtype!(dtype)
  validate_direction!(direction)

  @size = size
  @dtype = dtype
  @direction = direction
  @elements_per_thread = elements_per_thread
  @compiled = false
  @kernel = nil
end
Instance Attribute Details
#compiled ⇒ Boolean (readonly)
Returns whether the kernel is compiled.
# File 'lib/nvruby/mathdx/fft_kernel.rb', line 29
def compiled
  @compiled
end
#direction ⇒ Symbol (readonly)
Returns the FFT direction.
# File 'lib/nvruby/mathdx/fft_kernel.rb', line 23
def direction
  @direction
end
#dtype ⇒ Symbol (readonly)
Returns the data type.
# File 'lib/nvruby/mathdx/fft_kernel.rb', line 20
def dtype
  @dtype
end
#elements_per_thread ⇒ Integer (readonly)
Returns the number of elements processed per thread.
# File 'lib/nvruby/mathdx/fft_kernel.rb', line 26
def elements_per_thread
  @elements_per_thread
end
#size ⇒ Integer (readonly)
Returns the FFT size.
# File 'lib/nvruby/mathdx/fft_kernel.rb', line 17
def size
  @size
end
Instance Method Details
#compile!(device_id: 0) ⇒ self
Compile the FFT kernel
# File 'lib/nvruby/mathdx/fft_kernel.rb', line 55
def compile!(device_id: 0)
  source = generate_source
  @kernel = Ignis::JIT::Compiler.compile(
    source,
    "cufftdx_fft",
    device_id: device_id,
    options: )
  @compiled = true
  self
end
#destroy! ⇒ void
This method returns an undefined value.
Release kernel resources
# File 'lib/nvruby/mathdx/fft_kernel.rb', line 111
def destroy!
  @kernel = nil
  @compiled = false
end
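Taken together, `compile!`, `destroy!`, and the guard at the top of `execute` form a small lifecycle: a kernel starts uncompiled, `compile!` marks it ready and returns `self`, and `destroy!` returns it to the uncompiled state. The sketch below mirrors only that flag logic in a stand-in class. `KernelLifecycleSketch` is hypothetical, `StateError` is defined locally on the assumption that the library provides an equivalent, and no CUDA code is generated or launched.

```ruby
# Stand-in for the library's error class (assumed; its real namespace is not
# shown on this page).
class StateError < StandardError; end

# Hypothetical class that copies only the compiled-flag handling from the
# listings. It is not the real FftKernel.
class KernelLifecycleSketch
  attr_reader :compiled

  def initialize
    @compiled = false
    @kernel = nil
  end

  def compile!
    @kernel = :placeholder_kernel # the real method stores a JIT-compiled kernel
    @compiled = true
    self                          # returning self allows call chaining
  end

  def execute
    raise StateError, "Kernel not compiled. Call compile! first." unless @compiled

    :launched                     # the real method launches the kernel
  end

  def destroy!
    @kernel = nil
    @compiled = false
  end
end

kernel = KernelLifecycleSketch.new.compile!
kernel.execute  # succeeds while the kernel is compiled
kernel.destroy!

begin
  kernel.execute # fails again after destroy!
rescue StateError => e
  puts e.message # prints "Kernel not compiled. Call compile! first."
end
```

Because `destroy!` only resets the flag and drops the kernel reference, calling `compile!` again on the same object would be expected to make it usable again.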
#execute(input, output: nil, batch: 1, stream: nil) ⇒ NvArray
Execute the FFT kernel
# File 'lib/nvruby/mathdx/fft_kernel.rb', line 73
def execute(input, output: nil, batch: 1, stream: nil)
  raise StateError, "Kernel not compiled. Call compile! first." unless @compiled
  validate_execution_input!(input)

  # Ensure input is on device
  input_dev = input.on_device? ? input : input.to_device

  # Create output if needed
  output_dev = if output
    output.on_device? ? output : output.to_device
  else
    NvArray.zeros(input.shape, dtype: @dtype, device: input_dev.device_index).to_device
  end

  # Calculate grid dimensions
  threads_per_fft = @size / @elements_per_thread
  blocks = batch

  # Launch kernel
  @kernel.launch(
    grid: [blocks],
    block: [threads_per_fft],
    shared_memory: shared_memory_size,
    args: [
      input_dev.device_ptr,
      output_dev.device_ptr,
      @size,
      batch
    ],
    stream: stream
  )

  output_dev
end
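The launch arithmetic in `execute` can be reproduced in plain Ruby to see which grid and block dimensions a given configuration produces: one thread block per FFT in the batch, and `size / elements_per_thread` threads per block. The helper below, `launch_geometry`, is a hypothetical name used only for illustration. The divisibility guard is an addition for this sketch and is not something the listing above performs. Integer division would otherwise silently truncate.

```ruby
# Pure-Ruby sketch of the grid/block computation in #execute.
# It launches nothing. The guard below is an illustrative addition.
def launch_geometry(size:, elements_per_thread: 8, batch: 1)
  unless (size % elements_per_thread).zero?
    raise ArgumentError,
          "size (#{size}) must be a multiple of elements_per_thread (#{elements_per_thread})"
  end

  threads_per_fft = size / elements_per_thread # integer division, as in #execute
  { grid: [batch], block: [threads_per_fft] }  # one thread block per FFT in the batch
end

geometry = launch_geometry(size: 256, batch: 4)
puts "grid=#{geometry[:grid].inspect} block=#{geometry[:block].inspect}"
# prints: grid=[4] block=[32]
```

With the default `elements_per_thread` of 8, every supported size divides evenly (16 gives 2 threads, 1024 gives 128), so truncation only becomes a concern for non-default values.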