Class: Ignis::AI::Device

Inherits:

Object


Defined in: lib/nnw/ai/device.rb

Overview

Device: dynamic GPU capability detection and configuration.

Queries GPU properties at runtime, such as VRAM, compute capability, and SM count. Model configurations adapt to the hardware actually present rather than relying on hardcoded GPU assumptions.

Defined Under Namespace

Classes: DeviceProperties

Class Method Summary

.all_devices ⇒ Array<DeviceProperties>

Query all GPU devices and cache properties.
.count ⇒ Integer

Number of available GPUs.
.free_memory(device_id = 0) ⇒ Integer

Estimate free VRAM (queries cudaMemGetInfo).
.multi_gpu_config ⇒ Hash

Check if multi-GPU is available and worth using.
.properties(device_id = 0) ⇒ DeviceProperties

Get properties for a specific device.
.recommend_config(model_params, dtype_bytes: 4, device_id: 0, target_utilization: 0.85) ⇒ Hash

Recommend optimal batch size and sequence length for a model.
.reset! ⇒ void

Clear cached device info (call after hardware changes).
.summary ⇒ String

Summary string for logging.
.total_memory(device_id = 0) ⇒ Integer

Total VRAM on device in bytes.

Class Method Details

.all_devices ⇒ `Array<DeviceProperties>`

Query all GPU devices and cache properties.

Returns:

(Array<DeviceProperties>)




# File 'lib/nnw/ai/device.rb', line 24

def all_devices
  @all_devices ||= enumerate_devices
end

.count ⇒ `Integer`

Number of available GPUs.

Returns:

(Integer)




# File 'lib/nnw/ai/device.rb', line 51

def count
  all_devices.length
end

.free_memory(device_id = 0) ⇒ `Integer`

Estimate free VRAM (queries cudaMemGetInfo).

Parameters:

device_id (Integer) (defaults to: 0)

Returns:

(Integer) —

free bytes




# File 'lib/nnw/ai/device.rb', line 45

def free_memory(device_id = 0)
  query_free_memory(device_id)
end

.multi_gpu_config ⇒ `Hash`

Check if multi-GPU is available and worth using.

Returns:

(Hash) —

:multi_gpu, :device_ids, :strategy, plus :warning when the detected GPUs have differing compute capabilities

# File 'lib/nnw/ai/device.rb', line 134

def multi_gpu_config
  devs = all_devices
  if devs.length <= 1
    return { multi_gpu: false, device_ids: [0], strategy: :single }
  end

  # Check if devices are compatible (same compute capability)
  ccs = devs.map(&:compute_capability).uniq
  if ccs.length == 1
    { multi_gpu: true, device_ids: devs.map(&:id), strategy: :data_parallel }
  else
    # Heterogeneous GPUs — only use matching ones
    dominant_cc = devs.group_by(&:compute_capability).max_by { |_, v| v.length }[0]
    matching = devs.select { |d| d.compute_capability == dominant_cc }
    {
      multi_gpu: matching.length > 1,
      device_ids: matching.map(&:id),
      strategy: :data_parallel,
      warning: "Heterogeneous GPUs detected. Using #{matching.length} " \
               "devices with CC #{dominant_cc}."
    }
  end
end
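
The selection logic above can be exercised without a GPU. The following is a minimal sketch that restates the same branching as a standalone method and feeds it stand-in device records. `FakeDevice` and `pick_devices` are hypothetical names used only for illustration and are not part of Ignis. Representing compute capability as a string is also an assumption.

```ruby
# Stand-in for DeviceProperties; only the two fields the logic reads.
FakeDevice = Struct.new(:id, :compute_capability)

# Mirrors the branching in multi_gpu_config, but takes the device list
# as an argument instead of calling all_devices.
def pick_devices(devs)
  return { multi_gpu: false, device_ids: [0], strategy: :single } if devs.length <= 1

  # Group by compute capability and keep the largest group.
  groups = devs.group_by(&:compute_capability)
  _dominant_cc, matching = groups.max_by { |_, members| members.length }

  result = {
    multi_gpu: matching.length > 1,
    device_ids: matching.map(&:id),
    strategy: :data_parallel
  }
  # More than one group means the GPUs are heterogeneous.
  result[:warning] = "Heterogeneous GPUs detected." if groups.length > 1
  result
end

mixed = [
  FakeDevice.new(0, "8.6"),
  FakeDevice.new(1, "7.5"),
  FakeDevice.new(2, "8.6")
]
puts pick_devices(mixed).inspect
```

With two devices at one compute capability and one at another, only the two matching devices are selected and the result carries a warning.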

.properties(device_id = 0) ⇒ `DeviceProperties`

Get properties for a specific device.

Parameters:

device_id (Integer) (defaults to: 0)

Returns:

(DeviceProperties)




# File 'lib/nnw/ai/device.rb', line 31

def properties(device_id = 0)
  all_devices[device_id] || raise("No GPU device #{device_id} found")
end
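
One property of the lookup above is worth noting: `Array#[]` accepts negative indices, so `properties(-1)` returns the last device rather than raising. The sketch below demonstrates this with a plain array and shows one possible stricter lookup. `lookup` and `strict_lookup` are hypothetical helpers for illustration, not part of Ignis.

```ruby
# Plain strings stand in for DeviceProperties objects.
devices = ["gpu0", "gpu1"]

# Same pattern as properties: index into the array, raise on nil.
def lookup(devices, device_id)
  devices[device_id] || raise("No GPU device #{device_id} found")
end

# Hypothetical stricter variant: reject anything outside 0...length.
def strict_lookup(devices, device_id)
  unless device_id.is_a?(Integer) && (0...devices.length).cover?(device_id)
    raise ArgumentError, "No GPU device #{device_id} found"
  end
  devices[device_id]
end

puts lookup(devices, -1) # negative index wraps to the last entry

begin
  strict_lookup(devices, -1)
rescue ArgumentError => e
  puts e.message
end
```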

.recommend_config(model_params, dtype_bytes: 4, device_id: 0, target_utilization: 0.85) ⇒ `Hash`

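Recommend optimal batch size and sequence length for a model. The recommendation is a heuristic: it budgets VRAM for weights, gradients, and Adam optimizer state, then sizes the batch from whatever remains for activations.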

Parameters:

model_params (Integer) —

total parameters
dtype_bytes (Integer) (defaults to: 4) —

bytes per parameter (2 for FP16, 4 for FP32)
device_id (Integer) (defaults to: 0)
target_utilization (Float) (defaults to: 0.85) —

fraction of VRAM to use (0.0-1.0)

Returns:

(Hash) —

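:batch_size, :seq_len, :use_flash_attention, :use_gradient_checkpointing, and either :estimated_vram_usage_gb, :available_vram_gb, :activation_budget_gb (when the model fits) or :warning (when it does not)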

# File 'lib/nnw/ai/device.rb', line 72

def recommend_config(model_params, dtype_bytes: 4, device_id: 0, target_utilization: 0.85)
  dev = properties(device_id)
  available_bytes = (dev.total_memory_bytes * target_utilization).to_i

  # Weight memory
  weight_bytes = model_params * dtype_bytes

  # Optimizer state (Adam: 2x params for m, v)
  optimizer_bytes = model_params * 4 * 2

  # Gradient storage
  gradient_bytes = model_params * dtype_bytes

  # Fixed overhead
  fixed_bytes = weight_bytes + optimizer_bytes + gradient_bytes

  # Remaining for activations
  activation_budget = available_bytes - fixed_bytes

  if activation_budget <= 0
    return {
      batch_size: 1,
      seq_len: 128,
      use_flash_attention: true,
      use_gradient_checkpointing: true,
      warning: "Model too large for this GPU. Consider model parallelism or FP16."
    }
  end

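  # Estimate activation memory per token per layer
  # Rough estimate: 4 * hidden_dim * dtype_bytes per token per layer
  # Hidden dim ~ sqrt(model_params / 12) for typical transformers
  # (clamped to 1 so very small parameter counts cannot divide by zero below)
  estimated_hidden = [Math.sqrt(model_params / 12.0).to_i, 1].max
  # Because hidden is derived from model_params / 12, this quotient
  # comes out as 1 for any realistic parameter count
  estimated_layers = [model_params / (estimated_hidden * estimated_hidden * 12), 1].max
  activation_per_token = 4 * estimated_hidden * dtype_bytes * estimated_layers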

  # Target: batch_size * seq_len * activation_per_token <= activation_budget
  total_tokens = activation_budget / [activation_per_token, 1].max

  # Prefer seq_len of 1024, adjust batch_size
  # (seq_len is clamped to 1 so a very small budget cannot divide by zero)
  seq_len = [[1024, total_tokens].min, 1].max
  batch_size = [total_tokens / seq_len, 1].max

  # Flash attention saves O(N²) memory — worth it for seq_len > 512
  use_flash = seq_len > 512

  # Gradient checkpointing if we're tight on memory
  use_checkpointing = activation_budget < weight_bytes * 2

  {
    batch_size: batch_size.to_i,
    seq_len: seq_len.to_i,
    use_flash_attention: use_flash,
    use_gradient_checkpointing: use_checkpointing,
    estimated_vram_usage_gb: (fixed_bytes / (1024.0**3)).round(2),
    available_vram_gb: (available_bytes / (1024.0**3)).round(2),
    activation_budget_gb: (activation_budget / (1024.0**3)).round(2)
  }
end
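
To make the budget arithmetic concrete, here is a minimal sketch that restates the same steps as a standalone method taking total VRAM as an argument, so it runs without a GPU. `budget_breakdown` is a hypothetical name, the early return for oversized models is omitted, and small clamps are added to avoid division by zero. The inputs (a 125M-parameter FP32 model on a 24 GiB device) are assumed for illustration only.

```ruby
# Standalone restatement of the memory budget used by recommend_config.
def budget_breakdown(total_memory_bytes, model_params, dtype_bytes: 4, target_utilization: 0.85)
  available = (total_memory_bytes * target_utilization).to_i
  weights   = model_params * dtype_bytes
  optimizer = model_params * 4 * 2 # two FP32 Adam moments per parameter
  gradients = model_params * dtype_bytes
  fixed     = weights + optimizer + gradients
  budget    = available - fixed # what is left for activations

  # Same shape heuristics as the original, with clamps against zero.
  hidden    = [Math.sqrt(model_params / 12.0).to_i, 1].max
  layers    = [model_params / (hidden * hidden * 12), 1].max
  per_token = 4 * hidden * dtype_bytes * layers

  tokens  = budget / [per_token, 1].max
  seq_len = [[1024, tokens].min, 1].max
  {
    fixed_bytes: fixed,
    activation_budget: budget,
    estimated_hidden: hidden,
    estimated_layers: layers,
    seq_len: seq_len,
    batch_size: [tokens / seq_len, 1].max
  }
end

# Assumed inputs: 125M parameters, FP32, 24 GiB of VRAM.
breakdown = budget_breakdown(24 * 1024**3, 125_000_000)
breakdown.each { |key, value| puts "#{key}: #{value}" }
```

For these inputs the fixed cost is 2,000,000,000 bytes: 500 MB of weights, 1 GB of optimizer state, and 500 MB of gradients. The layer estimate evaluates to 1, so the per-token activation cost depends only on the estimated hidden size.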

.reset! ⇒ `void`

This method returns an undefined value.

Clear cached device info (call after hardware changes).




# File 'lib/nnw/ai/device.rb', line 160

def reset!
  @all_devices = nil
end
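
The interaction between the `||=` cache in `all_devices` and `reset!` can be sketched in isolation. `CachedDevices` below is a hypothetical stand-in, and its `enumerate_devices` is a placeholder that counts calls instead of querying hardware.

```ruby
# Hypothetical stand-in for the Device cache; the counter records how often
# the (normally expensive) enumeration actually runs.
module CachedDevices
  class << self
    attr_reader :enumerations

    def all_devices
      @all_devices ||= enumerate_devices
    end

    def reset!
      @all_devices = nil
    end

    private

    # Placeholder for the real hardware query.
    def enumerate_devices
      @enumerations = (@enumerations || 0) + 1
      ["gpu0"]
    end
  end
end

CachedDevices.all_devices
CachedDevices.all_devices        # served from the cache
puts CachedDevices.enumerations  # 1
CachedDevices.reset!
CachedDevices.all_devices        # cache was cleared, so this enumerates again
puts CachedDevices.enumerations  # 2
```

Until `reset!` is called, repeated calls reuse the first result, which is why the documentation advises calling it after hardware changes.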

.summary ⇒ `String`

Summary string for logging.

Returns:

(String)

# File 'lib/nnw/ai/device.rb', line 57

def summary
  lines = ["GPU Devices (#{count}):"]
  all_devices.each do |dev|
    lines << "  [#{dev.id}] #{dev.name} | #{dev.total_memory_gb}GB VRAM | " \
             "CC #{dev.compute_capability} | #{dev.sm_count} SMs"
  end
  lines.join("\n")
end
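
As an illustration of the string this produces, the sketch below applies the same formatting to a stand-in record. `FakeProps` and `format_summary` are hypothetical names, and the device values are invented for the example rather than taken from real hardware.

```ruby
# Stand-in carrying the five fields that summary reads from each device.
FakeProps = Struct.new(:id, :name, :total_memory_gb, :compute_capability, :sm_count)

# Same formatting as summary, but with the device list passed in.
def format_summary(devices)
  lines = ["GPU Devices (#{devices.length}):"]
  devices.each do |dev|
    lines << "  [#{dev.id}] #{dev.name} | #{dev.total_memory_gb}GB VRAM | " \
             "CC #{dev.compute_capability} | #{dev.sm_count} SMs"
  end
  lines.join("\n")
end

puts format_summary([FakeProps.new(0, "Example GPU", 24.0, "8.6", 82)])
```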

.total_memory(device_id = 0) ⇒ `Integer`

Total VRAM on device in bytes.

Parameters:

device_id (Integer) (defaults to: 0)

Returns:

(Integer)




# File 'lib/nnw/ai/device.rb', line 38

def total_memory(device_id = 0)
  properties(device_id).total_memory_bytes
end
