Class: Ignis::AI::Device
- Inherits: Object
- Defined in: lib/nnw/ai/device.rb
Overview
Device: dynamic GPU capability detection and configuration.
Queries GPU properties at runtime: VRAM, compute capability, SM count, etc. All model configurations adapt based on the actual hardware present. No hardcoded GPU assumptions.
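A minimal usage sketch, assuming the library is already loaded. The helper name `log_gpu_plan` and the parameter count are illustrative and not part of the library; only `count`, `summary`, and `recommend_config` come from this page. The helper checks `count` first because `properties`, which `recommend_config` calls, raises when the requested device does not exist.

```ruby
# Hypothetical helper: works with any object that exposes the class methods
# documented on this page (count, summary, recommend_config).
def log_gpu_plan(device, model_params)
  return "no GPU detected" if device.count.zero?

  config = device.recommend_config(model_params, dtype_bytes: 2)
  "#{device.summary}\nbatch_size=#{config[:batch_size]} seq_len=#{config[:seq_len]}"
end

# Only runs when the real class is available; 125_000_000 is an arbitrary
# example parameter count.
if defined?(Ignis::AI::Device)
  puts log_gpu_plan(Ignis::AI::Device, 125_000_000)
end
```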
Defined Under Namespace
Classes: DeviceProperties
Class Method Summary
- .all_devices ⇒ Array<DeviceProperties>
  Query all GPU devices and cache properties.
- .count ⇒ Integer
  Number of available GPUs.
- .free_memory(device_id = 0) ⇒ Integer
  Estimate free VRAM (queries cudaMemGetInfo).
- .multi_gpu_config ⇒ Hash
  Check if multi-GPU is available and worth using.
- .properties(device_id = 0) ⇒ DeviceProperties
  Get properties for a specific device.
- .recommend_config(model_params, dtype_bytes: 4, device_id: 0, target_utilization: 0.85) ⇒ Hash
  Recommend optimal batch size and sequence length for a model.
- .reset! ⇒ void
  Clear cached device info (call after hardware changes).
- .summary ⇒ String
  Summary string for logging.
- .total_memory(device_id = 0) ⇒ Integer
  Total VRAM on device in bytes.
Class Method Details
.all_devices ⇒ Array<DeviceProperties>
Query all GPU devices and cache properties.
# File 'lib/nnw/ai/device.rb', line 24

def all_devices
  @all_devices ||= enumerate_devices
end
.count ⇒ Integer
Number of available GPUs.
# File 'lib/nnw/ai/device.rb', line 51

def count
  all_devices.length
end
.free_memory(device_id = 0) ⇒ Integer
Estimate free VRAM (queries cudaMemGetInfo).
# File 'lib/nnw/ai/device.rb', line 45

def free_memory(device_id = 0)
  query_free_memory(device_id)
end
.multi_gpu_config ⇒ Hash
Check if multi-GPU is available and worth using.
# File 'lib/nnw/ai/device.rb', line 134

def multi_gpu_config
  devs = all_devices
  if devs.length <= 1
    return { multi_gpu: false, device_ids: [0], strategy: :single }
  end

  # Check if devices are compatible (same compute capability)
  ccs = devs.map(&:compute_capability).uniq
  if ccs.length == 1
    { multi_gpu: true, device_ids: devs.map(&:id), strategy: :data_parallel }
  else
    # Heterogeneous GPUs: only use matching ones
    dominant_cc = devs.group_by(&:compute_capability).max_by { |_, v| v.length }[0]
    matching = devs.select { |d| d.compute_capability == dominant_cc }
    {
      multi_gpu: matching.length > 1,
      device_ids: matching.map(&:id),
      strategy: :data_parallel,
      warning: "Heterogeneous GPUs detected. Using #{matching.length} " \
               "devices with CC #{dominant_cc}."
    }
  end
end
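The heterogeneous branch can be tried without a GPU. The sketch below repeats the grouping step from `multi_gpu_config` against a hypothetical `StubDevice` struct, a stand-in for `DeviceProperties` that models only `id` and `compute_capability`. Representing the compute capability as a string is an assumption made for illustration.

```ruby
# Hypothetical stand-in for DeviceProperties; only the two fields that the
# selection logic reads are modelled.
StubDevice = Struct.new(:id, :compute_capability)

# Mirrors the grouping step of multi_gpu_config: keep only the devices that
# share the most common compute capability.
def select_matching(devs)
  dominant_cc = devs.group_by(&:compute_capability).max_by { |_, v| v.length }[0]
  devs.select { |d| d.compute_capability == dominant_cc }
end

devs = [
  StubDevice.new(0, "8.6"),
  StubDevice.new(1, "7.5"),
  StubDevice.new(2, "8.6")
]

matching = select_matching(devs)
puts matching.map(&:id).inspect
```

With two of the three devices sharing a capability, the odd one out is dropped and the remaining pair is used for data parallelism.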
.properties(device_id = 0) ⇒ DeviceProperties
Get properties for a specific device.
# File 'lib/nnw/ai/device.rb', line 31

def properties(device_id = 0)
  all_devices[device_id] || raise("No GPU device #{device_id} found")
end
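Because the lookup indexes a plain Array, an out-of-range positive ID raises, while a negative ID counts from the end of the list instead of raising. The sketch below shows both cases with a hypothetical device list in which strings stand in for `DeviceProperties`; `lookup` has the same shape as `properties` but is not the library method.

```ruby
# Hypothetical cached device list; strings stand in for DeviceProperties.
ALL_DEVICES = ["gpu-0", "gpu-1"].freeze

# Same lookup shape as Device.properties.
def lookup(device_id = 0)
  ALL_DEVICES[device_id] || raise("No GPU device #{device_id} found")
end

puts lookup(1)   # in range
puts lookup(-1)  # Array#[] counts from the end, so this is the last device

begin
  lookup(5)      # past the end: Array#[] returns nil, so the raise fires
rescue RuntimeError => e
  puts e.message
end
```

Callers that accept device IDs from user input may want to reject negative values before calling `properties`.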
.recommend_config(model_params, dtype_bytes: 4, device_id: 0, target_utilization: 0.85) ⇒ Hash
Recommend optimal batch size and sequence length for a model.
# File 'lib/nnw/ai/device.rb', line 72

def recommend_config(model_params, dtype_bytes: 4, device_id: 0, target_utilization: 0.85)
  dev = properties(device_id)
  available_bytes = (dev.total_memory_bytes * target_utilization).to_i

  # Weight memory
  weight_bytes = model_params * dtype_bytes

  # Optimizer state (Adam: 2x params for m, v)
  optimizer_bytes = model_params * 4 * 2

  # Gradient storage
  gradient_bytes = model_params * dtype_bytes

  # Fixed overhead
  fixed_bytes = weight_bytes + optimizer_bytes + gradient_bytes

  # Remaining for activations
  activation_budget = available_bytes - fixed_bytes

  if activation_budget <= 0
    return {
      batch_size: 1,
      seq_len: 128,
      use_flash_attention: true,
      use_gradient_checkpointing: true,
      warning: "Model too large for this GPU. Consider model parallelism or FP16."
    }
  end

  # Estimate activation memory per token per layer
  # Rough estimate: 4 * hidden_dim * dtype_bytes per token per layer
  # Hidden dim ~ sqrt(model_params / 12) for typical transformers
  estimated_hidden = Math.sqrt(model_params / 12.0).to_i
  estimated_layers = [model_params / (estimated_hidden * estimated_hidden * 12), 1].max
  activation_per_token = 4 * estimated_hidden * dtype_bytes * estimated_layers

  # Target: batch_size * seq_len * activation_per_token <= activation_budget
  total_tokens = activation_budget / [activation_per_token, 1].max

  # Prefer seq_len of 1024, adjust batch_size
  seq_len = [1024, total_tokens].min
  batch_size = [total_tokens / seq_len, 1].max

  # Flash attention saves O(N²) memory, worth it for seq_len > 512
  use_flash = seq_len > 512

  # Gradient checkpointing if we're tight on memory
  use_checkpointing = activation_budget < weight_bytes * 2

  {
    batch_size: batch_size.to_i,
    seq_len: seq_len.to_i,
    use_flash_attention: use_flash,
    use_gradient_checkpointing: use_checkpointing,
    estimated_vram_usage_gb: (fixed_bytes / (1024.0**3)).round(2),
    available_vram_gb: (available_bytes / (1024.0**3)).round(2),
    activation_budget_gb: (activation_budget / (1024.0**3)).round(2)
  }
end
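To see how the estimate behaves, the arithmetic can be replayed without a GPU. The sketch below is a trimmed copy of the calculation, not the library method: the name `estimate` is hypothetical, the device is an assumed 24 GiB card passed in as a byte count, and the early return for a non-positive activation budget is omitted. At the default `dtype_bytes: 4`, the fixed cost is 16 bytes per parameter (4 for weights, 8 for Adam state, 4 for gradients).

```ruby
# Standalone replay of the recommend_config arithmetic for a hypothetical
# device size, so the intermediate numbers can be inspected.
def estimate(model_params, total_memory_bytes, dtype_bytes: 4, target_utilization: 0.85)
  available = (total_memory_bytes * target_utilization).to_i
  weights   = model_params * dtype_bytes
  optimizer = model_params * 4 * 2 # Adam m and v, 4 bytes each
  gradients = model_params * dtype_bytes
  fixed     = weights + optimizer + gradients
  budget    = available - fixed    # assumed positive in this sketch

  hidden    = Math.sqrt(model_params / 12.0).to_i
  layers    = [model_params / (hidden * hidden * 12), 1].max
  per_token = 4 * hidden * dtype_bytes * layers

  total_tokens = budget / [per_token, 1].max
  seq_len      = [1024, total_tokens].min
  batch_size   = [total_tokens / seq_len, 1].max

  { fixed_bytes: fixed, activation_budget: budget, hidden: hidden,
    layers: layers, seq_len: seq_len, batch_size: batch_size }
end

# One billion parameters (passed as an Integer) on an assumed 24 GiB device.
result = estimate(1_000_000_000, 24 * 1024**3)
result.each { |key, value| puts "#{key}: #{value}" }
```

For this input the fixed cost is 16,000,000,000 bytes and the estimated layer count comes out as 1, because the hidden size is itself derived from `model_params / 12`. The activation figure is therefore a rough lower bound, so treat the recommended batch size as a starting point to validate against measured memory use.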
.reset! ⇒ void
This method returns an undefined value.
Clear cached device info (call after hardware changes).
# File 'lib/nnw/ai/device.rb', line 160

def reset!
  @all_devices = nil
end
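The interaction between the `||=` cache in `all_devices` and `reset!` can be illustrated with a small stand-in. `CachedProbe` and its call counter are hypothetical; the fake `enumerate_devices` only counts how often it runs.

```ruby
# Hypothetical module using the same memoize-and-reset pattern as Device.
module CachedProbe
  class << self
    attr_reader :enumerations

    def all_devices
      @all_devices ||= enumerate_devices
    end

    def reset!
      @all_devices = nil
    end

    private

    # Fake enumeration: records each call and returns a one-element list.
    def enumerate_devices
      @enumerations = (@enumerations || 0) + 1
      ["gpu-0"]
    end
  end
end

CachedProbe.all_devices
CachedProbe.all_devices          # served from the cache
puts CachedProbe.enumerations    # 1
CachedProbe.reset!
CachedProbe.all_devices          # enumerates again
puts CachedProbe.enumerations    # 2
```

An empty device list is also cached, since an empty Array is truthy in Ruby. After a GPU is added to a machine that previously had none, `reset!` is still needed before the new device shows up.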
.summary ⇒ String
Summary string for logging.
# File 'lib/nnw/ai/device.rb', line 57

def summary
  lines = ["GPU Devices (#{count}):"]
  all_devices.each do |dev|
    lines << " [#{dev.id}] #{dev.name} | #{dev.total_memory_gb}GB VRAM | " \
             "CC #{dev.compute_capability} | #{dev.sm_count} SMs"
  end
  lines.join("\n")
end
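The shape of the summary string is easiest to see with sample data. The sketch below applies the same formatting to a hypothetical `StubProps` struct whose fields match the accessors used above. The device name and all values are invented for illustration.

```ruby
# Hypothetical stand-in for DeviceProperties with the fields summary reads.
StubProps = Struct.new(:id, :name, :total_memory_gb, :compute_capability, :sm_count)

# Same formatting as Device.summary, applied to an explicit device list.
def format_summary(devices)
  lines = ["GPU Devices (#{devices.length}):"]
  devices.each do |dev|
    lines << " [#{dev.id}] #{dev.name} | #{dev.total_memory_gb}GB VRAM | " \
             "CC #{dev.compute_capability} | #{dev.sm_count} SMs"
  end
  lines.join("\n")
end

text = format_summary([StubProps.new(0, "Example GPU", 24.0, "8.6", 82)])
puts text
```

The result is a header line followed by one line per device, which suits a single multi-line log call.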
.total_memory(device_id = 0) ⇒ Integer
Total VRAM on device in bytes.
# File 'lib/nnw/ai/device.rb', line 38

def total_memory(device_id = 0)
  properties(device_id).total_memory_bytes
end