Class: Ignis::AI::Device

Inherits:
Object
Defined in:
lib/nnw/ai/device.rb

Overview

Dynamic GPU capability detection and configuration.

Queries GPU properties at runtime, including VRAM, compute capability, and SM count. Model configurations adapt to the hardware actually present, with no hardcoded GPU assumptions.

Defined Under Namespace

Classes: DeviceProperties

Class Method Details

.all_devices ⇒ Array<DeviceProperties>

Query all GPU devices and cache properties.

Returns:

  • (Array<DeviceProperties>)

# File 'lib/nnw/ai/device.rb', line 24

def all_devices
  @all_devices ||= enumerate_devices
end
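
The `||=` memoization above means the hardware query runs once per process until the cache is cleared with `reset!`. A minimal sketch of that behavior, assuming a hypothetical `DeviceCacheSketch` module with a counting stub in place of the real `enumerate_devices` (whose implementation is not shown on this page):

```ruby
# Hypothetical stand-in for the Device class; the real enumerate_devices
# is not shown here, so a counting stub takes its place.
module DeviceCacheSketch
  class << self
    attr_reader :enumerations

    def all_devices
      @all_devices ||= enumerate_devices
    end

    def reset!
      @all_devices = nil
    end

    private

    # Pretend hardware query; counts how often it actually runs.
    def enumerate_devices
      @enumerations = (@enumerations || 0) + 1
      [:gpu0, :gpu1]
    end
  end
end

DeviceCacheSketch.all_devices
DeviceCacheSketch.all_devices
puts DeviceCacheSketch.enumerations  # the second call reused the cache
DeviceCacheSketch.reset!
DeviceCacheSketch.all_devices
puts DeviceCacheSketch.enumerations  # reset! forced a fresh query
```

This prints 1 and then 2. Because an empty array is truthy in Ruby, a result with no devices would be cached as well rather than re-queried on every call.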

.count ⇒ Integer

Number of available GPUs.

Returns:

  • (Integer)


# File 'lib/nnw/ai/device.rb', line 51

def count
  all_devices.length
end

.free_memory(device_id = 0) ⇒ Integer

Estimate free VRAM (queries cudaMemGetInfo).

Parameters:

  • device_id (Integer) (defaults to: 0)

Returns:

  • (Integer)

    free bytes



# File 'lib/nnw/ai/device.rb', line 45

def free_memory(device_id = 0)
  query_free_memory(device_id)
end

.multi_gpu_config ⇒ Hash

Check if multi-GPU is available and worth using.

Returns:

  • (Hash)

    :multi_gpu, :device_ids, :strategy, plus :warning when the GPUs are heterogeneous



# File 'lib/nnw/ai/device.rb', line 134

def multi_gpu_config
  devs = all_devices
  if devs.length <= 1
    return { multi_gpu: false, device_ids: [0], strategy: :single }
  end

  # Check if devices are compatible (same compute capability)
  ccs = devs.map(&:compute_capability).uniq
  if ccs.length == 1
    { multi_gpu: true, device_ids: devs.map(&:id), strategy: :data_parallel }
  else
    # Heterogeneous GPUs — only use matching ones
    dominant_cc = devs.group_by(&:compute_capability).max_by { |_, v| v.length }[0]
    matching = devs.select { |d| d.compute_capability == dominant_cc }
    {
      multi_gpu: matching.length > 1,
      device_ids: matching.map(&:id),
      strategy: :data_parallel,
      warning: "Heterogeneous GPUs detected. Using #{matching.length} " \
               "devices with CC #{dominant_cc}."
    }
  end
end
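
The heterogeneous branch above keeps only the largest group of devices that share a compute capability. A minimal sketch of that selection step, assuming a hypothetical `GpuSketch` struct in place of `DeviceProperties` and string-valued compute capabilities (the real type is not shown on this page):

```ruby
# Hypothetical device record; the real DeviceProperties class is not shown here.
GpuSketch = Struct.new(:id, :compute_capability)

# Returns [dominant_cc, ids of the devices that share it].
def dominant_group(devs)
  cc, group = devs.group_by(&:compute_capability).max_by { |_, v| v.length }
  [cc, group.map(&:id)]
end

mixed = [
  GpuSketch.new(0, "8.6"),
  GpuSketch.new(1, "7.5"),
  GpuSketch.new(2, "8.6")
]

cc, ids = dominant_group(mixed)
puts "CC #{cc}: devices #{ids.inspect}"
```

This prints `CC 8.6: devices [0, 2]`; device 1 is left out because it is the only one with compute capability 7.5.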

.properties(device_id = 0) ⇒ DeviceProperties

Get properties for a specific device.

Parameters:

  • device_id (Integer) (defaults to: 0)

Returns:

  • (DeviceProperties)

# File 'lib/nnw/ai/device.rb', line 31

def properties(device_id = 0)
  all_devices[device_id] || raise("No GPU device #{device_id} found")
end
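
One caveat with the array lookup above: `Array#[]` accepts negative indices, so a `device_id` of -1 returns the last device instead of raising. A minimal sketch of a stricter lookup follows; `strict_lookup` is a hypothetical helper for illustration, not part of this class:

```ruby
devices = [:gpu0, :gpu1]

# Array#[] accepts negative indices, so -1 silently returns the last device.
p devices[-1]

# A stricter lookup (an assumption, not the library's code) rejects
# anything outside 0...devices.length.
def strict_lookup(devices, device_id)
  unless device_id.is_a?(Integer) && device_id.between?(0, devices.length - 1)
    raise ArgumentError, "No GPU device #{device_id} found"
  end
  devices[device_id]
end

p strict_lookup(devices, 1)
```

Both calls print `:gpu1`, while `strict_lookup(devices, -1)` would raise `ArgumentError`.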

.recommend_config(model_params, dtype_bytes: 4, device_id: 0, target_utilization: 0.85) ⇒ Hash

Recommend optimal batch size and sequence length for a model.

Parameters:

  • model_params (Integer)

    total parameters

  • dtype_bytes (Integer) (defaults to: 4)

    bytes per parameter (2 for FP16, 4 for FP32)

  • device_id (Integer) (defaults to: 0)
  • target_utilization (Float) (defaults to: 0.85)

    fraction of VRAM to use (0.0-1.0)

Returns:

  • (Hash)

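    :batch_size, :seq_len, :use_flash_attention, :use_gradient_checkpointing, plus either VRAM estimates (:estimated_vram_usage_gb, :available_vram_gb, :activation_budget_gb) or a :warning when the model does not fit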



# File 'lib/nnw/ai/device.rb', line 72

def recommend_config(model_params, dtype_bytes: 4, device_id: 0, target_utilization: 0.85)
  dev = properties(device_id)
  available_bytes = (dev.total_memory_bytes * target_utilization).to_i

  # Weight memory
  weight_bytes = model_params * dtype_bytes

  # Optimizer state (Adam: 2x params for m, v)
  optimizer_bytes = model_params * 4 * 2

  # Gradient storage
  gradient_bytes = model_params * dtype_bytes

  # Fixed overhead
  fixed_bytes = weight_bytes + optimizer_bytes + gradient_bytes

  # Remaining for activations
  activation_budget = available_bytes - fixed_bytes

  if activation_budget <= 0
    return {
      batch_size: 1,
      seq_len: 128,
      use_flash_attention: true,
      use_gradient_checkpointing: true,
      warning: "Model too large for this GPU. Consider model parallelism or FP16."
    }
  end

  # Estimate activation memory per token per layer
  # Rough estimate: 4 * hidden_dim * dtype_bytes per token per layer
  # Hidden dim ~ sqrt(model_params / 12) for typical transformers
  # (clamped to 1 so tiny parameter counts cannot divide by zero below)
  estimated_hidden = [Math.sqrt(model_params / 12.0).to_i, 1].max
  estimated_layers = [model_params / (estimated_hidden * estimated_hidden * 12), 1].max
  activation_per_token = 4 * estimated_hidden * dtype_bytes * estimated_layers

  # Target: batch_size * seq_len * activation_per_token <= activation_budget
  total_tokens = activation_budget / [activation_per_token, 1].max

  # Prefer seq_len of 1024, adjust batch_size
  # (clamped to 1 so a budget below one token's worth cannot divide by zero)
  seq_len = [[1024, total_tokens].min, 1].max
  batch_size = [total_tokens / seq_len, 1].max

  # Flash attention saves O(N²) memory — worth it for seq_len > 512
  use_flash = seq_len > 512

  # Gradient checkpointing if we're tight on memory
  use_checkpointing = activation_budget < weight_bytes * 2

  {
    batch_size: batch_size.to_i,
    seq_len: seq_len.to_i,
    use_flash_attention: use_flash,
    use_gradient_checkpointing: use_checkpointing,
    estimated_vram_usage_gb: (fixed_bytes / (1024.0**3)).round(2),
    available_vram_gb: (available_bytes / (1024.0**3)).round(2),
    activation_budget_gb: (activation_budget / (1024.0**3)).round(2)
  }
end
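
The fixed part of the budget above works out to 16 bytes per parameter at FP32 and 12 at FP16, because the optimizer term always uses 4 bytes per value. A worked sketch of that arithmetic, assuming a hypothetical 125M-parameter model on a 24 GiB device; `fixed_training_bytes` is an illustrative helper, not part of this class:

```ruby
# Fixed (non-activation) training memory, following the arithmetic in
# recommend_config. The 24 GiB card below is an illustrative assumption.
def fixed_training_bytes(model_params, dtype_bytes: 4)
  weights   = model_params * dtype_bytes
  optimizer = model_params * 4 * 2      # Adam m and v at 4 bytes each
  gradients = model_params * dtype_bytes
  weights + optimizer + gradients
end

GIB = 1024.0**3
total_vram = 24 * 1024**3               # hypothetical 24 GiB device
available  = (total_vram * 0.85).to_i   # default target_utilization

[4, 2].each do |dtype_bytes|
  fixed  = fixed_training_bytes(125_000_000, dtype_bytes: dtype_bytes)
  budget = available - fixed
  puts format("dtype_bytes=%d fixed=%.2f GiB activation_budget=%.2f GiB",
              dtype_bytes, fixed / GIB, budget / GIB)
end
```

Switching to FP16 shrinks only the weight and gradient terms, so the fixed cost drops from 2.0 GB to 1.5 GB here rather than halving.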

.reset! ⇒ void

This method returns an undefined value.

Clear cached device info (call after hardware changes).



# File 'lib/nnw/ai/device.rb', line 160

def reset!
  @all_devices = nil
end

.summary ⇒ String

Summary string for logging.

Returns:

  • (String)


# File 'lib/nnw/ai/device.rb', line 57

def summary
  lines = ["GPU Devices (#{count}):"]
  all_devices.each do |dev|
    lines << "  [#{dev.id}] #{dev.name} | #{dev.total_memory_gb}GB VRAM | " \
             "CC #{dev.compute_capability} | #{dev.sm_count} SMs"
  end
  lines.join("\n")
end
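
To show the shape of the resulting log text, here is a minimal sketch that applies the same formatting to a made-up device. `SummaryDevice` and its field values are hypothetical stand-ins for `DeviceProperties`, which is not shown on this page:

```ruby
# Hypothetical stand-in for DeviceProperties with only the fields summary reads.
SummaryDevice = Struct.new(:id, :name, :total_memory_gb, :compute_capability, :sm_count)

# Same formatting as the method above, taking the device list as an argument.
def summarize(devices)
  lines = ["GPU Devices (#{devices.length}):"]
  devices.each do |dev|
    lines << "  [#{dev.id}] #{dev.name} | #{dev.total_memory_gb}GB VRAM | " \
             "CC #{dev.compute_capability} | #{dev.sm_count} SMs"
  end
  lines.join("\n")
end

puts summarize([SummaryDevice.new(0, "Example GPU", 24.0, "8.6", 82)])
```

The output is a header line, `GPU Devices (1):`, followed by one indented line per device: `  [0] Example GPU | 24.0GB VRAM | CC 8.6 | 82 SMs`.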

.total_memory(device_id = 0) ⇒ Integer

Total VRAM on device in bytes.

Parameters:

  • device_id (Integer) (defaults to: 0)

Returns:

  • (Integer)


# File 'lib/nnw/ai/device.rb', line 38

def total_memory(device_id = 0)
  properties(device_id).total_memory_bytes
end