Class: Ignis::Collective::Topology::Detector

Inherits:
Object
Defined in:
lib/nvruby/collective/topology.rb

Overview

GPU topology detector - main entry point

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(device_ids: nil) ⇒ Detector

Detect topology for specified GPUs

Parameters:

  • device_ids (Array<Integer>, nil) (defaults to: nil)

    GPU IDs, or nil to use all visible GPUs



# File 'lib/nvruby/collective/topology.rb', line 263

def initialize(device_ids: nil)
  @device_ids = device_ids || all_device_ids
  @matrix = Matrix.new(@device_ids)
end

Instance Attribute Details

#matrix ⇒ Matrix (readonly)

Returns the current topology matrix.

Returns:

  • (Matrix)

    Current topology matrix



# File 'lib/nvruby/collective/topology.rb', line 259

def matrix
  @matrix
end

Instance Method Details

#all_device_ids ⇒ Array<Integer>

Returns all visible GPU device IDs.

Returns:

  • (Array<Integer>)

    All visible GPU device IDs



# File 'lib/nvruby/collective/topology.rb', line 269

def all_device_ids
  CUDA::Device.list.map(&:index)
end

#enable_all_p2p! ⇒ Hash<Array<Integer>, Boolean>

Enable P2P access between all GPUs in the topology

Returns:

  • (Hash<Array<Integer>, Boolean>)

    Map of [src, dst] to success



# File 'lib/nvruby/collective/topology.rb', line 313

def enable_all_p2p!
  results = {}

  @matrix.p2p_paths.each do |path|
    src = path.src_device
    dst = path.dst_device

    # Set source device context
    status = CUDA::RuntimeAPI.cudaSetDevice(src)
    CUDA::RuntimeAPI.check_status!(status, "Set device #{src}")

    # Enable peer access
    status = P2PBindings.cudaDeviceEnablePeerAccess(dst, 0)

    # Status 0 = cudaSuccess; 704 = cudaErrorPeerAccessAlreadyEnabled, treated as success
    results[[src, dst]] = status.zero? || status == 704
  end

  results
end
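
The returned map can be split into the pairs that ended up enabled and the pairs that did not. The following is a minimal sketch that uses a hand-written sample hash in place of a real `enable_all_p2p!` call; `split_p2p_results` and `SAMPLE_P2P_RESULTS` are hypothetical names, not part of the library.

```ruby
# Hypothetical helper (not part of the library): splits a result map shaped
# like the return value of enable_all_p2p! into enabled and failed pairs.
def split_p2p_results(results)
  enabled, failed = results.partition { |_pair, ok| ok }
  { enabled: enabled.map(&:first), failed: failed.map(&:first) }
end

# Sample data standing in for a real call; keys are [src, dst] pairs.
SAMPLE_P2P_RESULTS = {
  [0, 1] => true,
  [1, 0] => true,
  [0, 2] => false
}.freeze

summary = split_p2p_results(SAMPLE_P2P_RESULTS)
puts "enabled: #{summary[:enabled].inspect}"
puts "failed:  #{summary[:failed].inspect}"
```

Because the keys are `[src, dst]` arrays, a failure in one direction does not imply a failure in the other, so each direction is worth checking separately.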

#gpu_count ⇒ Integer

Returns the number of GPUs in this topology.

Returns:

  • (Integer)

    Number of GPUs in this topology



# File 'lib/nvruby/collective/topology.rb', line 274

def gpu_count
  @device_ids.size
end

#interconnect_type(device_a, device_b) ⇒ Symbol

Get interconnect type between two GPUs

Parameters:

  • device_a (Integer)

    First GPU

  • device_b (Integer)

    Second GPU

Returns:

  • (Symbol)

    Interconnect type



285
# File 'lib/nvruby/collective/topology.rb', line 282

def interconnect_type(device_a, device_b)
  path = @matrix.path(device_a, device_b)
  path&.interconnect_type || :none
end
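
The `&.` and `|| :none` combination means a pair with no recorded path reports `:none` instead of raising. The sketch below reproduces that fallback in isolation, assuming the matrix lookup returns nil for an unknown pair; `StubPath`, `STUB_PATHS`, and `interconnect_type_for` are hypothetical stand-ins, not library names.

```ruby
# Hypothetical stand-ins for the library's path objects and matrix lookup.
StubPath = Struct.new(:interconnect_type)
STUB_PATHS = { [0, 1] => StubPath.new(:nvlink) }.freeze

# Mirrors the fallback above: a missing path (nil) resolves to :none.
def interconnect_type_for(paths, device_a, device_b)
  path = paths[[device_a, device_b]]
  path&.interconnect_type || :none
end

p interconnect_type_for(STUB_PATHS, 0, 1) # known pair
p interconnect_type_for(STUB_PATHS, 0, 7) # unknown pair falls back to :none
```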

#nvlink_connected?(device_a, device_b) ⇒ Boolean

Check if a specific GPU pair has NVLink

Parameters:

  • device_a (Integer)

    First GPU

  • device_b (Integer)

    Second GPU

Returns:

  • (Boolean)

    True if NVLink connected



300
# File 'lib/nvruby/collective/topology.rb', line 297

def nvlink_connected?(device_a, device_b)
  path = @matrix.path(device_a, device_b)
  path&.interconnect_type == :nvlink
end

#optimal_ring_order ⇒ Array<Integer>

Get optimal ring order for collective operations

Returns:

  • (Array<Integer>)

    Ordered GPU IDs



# File 'lib/nvruby/collective/topology.rb', line 289

def optimal_ring_order
  @matrix.optimal_ring_order
end
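
This page does not describe how the order is computed or consumed. In a typical ring collective, each GPU sends to the next one in the order and the last wraps around to the first. The sketch below derives those send/receive pairs from a sample order; `ring_links` is a hypothetical helper and the sample array stands in for a real return value.

```ruby
# Hypothetical helper: pairs each GPU in a ring order with its successor,
# wrapping the last GPU back to the first.
def ring_links(order)
  order.each_with_index.map do |gpu, i|
    [gpu, order[(i + 1) % order.size]]
  end
end

# Sample order standing in for the value of optimal_ring_order.
p ring_links([0, 2, 3, 1])
```

Each resulting `[src, dst]` pair could then be checked with `p2p_available?` or `nvlink_connected?` before a transfer is scheduled.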

#p2p_available?(device_a, device_b) ⇒ Boolean

Check if P2P is available between GPUs

Parameters:

  • device_a (Integer)

    First GPU

  • device_b (Integer)

    Second GPU

Returns:

  • (Boolean)

    True if P2P available



309
# File 'lib/nvruby/collective/topology.rb', line 306

def p2p_available?(device_a, device_b)
  path = @matrix.path(device_a, device_b)
  path&.p2p_supported || false
end

#to_s ⇒ String

Returns a summary of the detected topology.

Returns:

  • (String)

    Summary of detected topology



# File 'lib/nvruby/collective/topology.rb', line 335

def to_s
  nvlink_count = @matrix.nvlink_paths.size
  p2p_count = @matrix.p2p_paths.size
  total_pairs = @device_ids.size * (@device_ids.size - 1)

  "Topology: #{@device_ids.size} GPUs, " \
    "#{nvlink_count}/#{total_pairs} NVLink, " \
    "#{p2p_count}/#{total_pairs} P2P"
end
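
The denominator `n * (n - 1)` is the number of ordered pairs of distinct devices. That suggests `[0, 1]` and `[1, 0]` are counted separately, though this is an inference from the formula and not something the page states. A small sketch of the arithmetic, using a hypothetical `ordered_pairs` helper:

```ruby
# Hypothetical helper: enumerates ordered (src, dst) pairs of distinct GPUs.
def ordered_pairs(device_ids)
  device_ids.permutation(2).to_a
end

ids = [0, 1, 2, 3]
pairs = ordered_pairs(ids)

puts pairs.size                 # 12
puts ids.size * (ids.size - 1)  # 12
```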