Class: Ignis::Collective::Topology::Detector
- Inherits: Object
  - Object
  - Ignis::Collective::Topology::Detector
- Defined in: lib/nvruby/collective/topology.rb
Overview
GPU topology detector - main entry point
Instance Attribute Summary
- #matrix ⇒ Matrix (readonly)
  Current topology matrix.
Instance Method Summary
- #all_device_ids ⇒ Array<Integer>
  All visible GPU device IDs.
- #enable_all_p2p! ⇒ Hash<Array<Integer>, Boolean>
  Enable P2P access between all GPUs in the topology.
- #gpu_count ⇒ Integer
  Number of GPUs in this topology.
- #initialize(device_ids: nil) ⇒ Detector (constructor)
  Detect topology for specified GPUs.
- #interconnect_type(device_a, device_b) ⇒ Symbol
  Get interconnect type between two GPUs.
- #nvlink_connected?(device_a, device_b) ⇒ Boolean
  Check if specific GPU pair has NVLink.
- #optimal_ring_order ⇒ Array<Integer>
  Get optimal ring order for collective operations.
- #p2p_available?(device_a, device_b) ⇒ Boolean
  Check if P2P is available between GPUs.
- #to_s ⇒ String
  Summary of detected topology.
Constructor Details
#initialize(device_ids: nil) ⇒ Detector
Detect topology for specified GPUs.
Instance Attribute Details
#matrix ⇒ Matrix (readonly)
Returns the current topology matrix.
    # File 'lib/nvruby/collective/topology.rb', line 259
    def matrix
      @matrix
    end
Instance Method Details
#all_device_ids ⇒ Array<Integer>
Returns all visible GPU device IDs.
    # File 'lib/nvruby/collective/topology.rb', line 269
    def all_device_ids
      CUDA::Device.list.map(&:index)
    end
#enable_all_p2p! ⇒ Hash<Array<Integer>, Boolean>
Enable P2P access between all GPUs in the topology. Returns a hash that maps each [src, dst] device pair to true when peer access is enabled (or was already enabled) and false otherwise.
    # File 'lib/nvruby/collective/topology.rb', line 313
    def enable_all_p2p!
      results = {}

      @matrix.p2p_paths.each do |path|
        src = path.src_device
        dst = path.dst_device

        # Set source device context
        status = CUDA::RuntimeAPI.cudaSetDevice(src)
        CUDA::RuntimeAPI.check_status!(status, "Set device #{src}")

        # Enable peer access
        status = P2PBindings.cudaDeviceEnablePeerAccess(dst, 0)

        # Status 0 = success, 704 = already enabled (cudaErrorPeerAccessAlreadyEnabled)
        results[[src, dst]] = status.zero? || status == 704
      end

      results
    end
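The success rule used by enable_all_p2p! (status 0, or 704 when peer access is already enabled) and the shape of the returned hash can be sketched without a GPU. The helper names peer_access_ok? and failed_p2p_pairs and the sample status codes below are illustrative assumptions, not part of the library.

```ruby
# Status codes referenced by enable_all_p2p!.
CUDA_SUCCESS = 0
PEER_ACCESS_ALREADY_ENABLED = 704

# Hypothetical helper: mirrors the rule `status.zero? || status == 704`.
def peer_access_ok?(status)
  status == CUDA_SUCCESS || status == PEER_ACCESS_ALREADY_ENABLED
end

# Hypothetical helper: directed [src, dst] pairs whose value is false.
def failed_p2p_pairs(results)
  results.reject { |_pair, ok| ok }.keys
end

# Made-up raw statuses per directed pair, standing in for real CUDA calls.
# Any non-zero code other than 704 counts as a failure.
raw_statuses = { [0, 1] => 0, [1, 0] => 704, [0, 2] => 217 }

results = raw_statuses.transform_values { |status| peer_access_ok?(status) }
failed = failed_p2p_pairs(results)

puts "P2P enabled on #{results.count { |_pair, ok| ok }} of #{results.size} directed pairs"
puts "Failed pairs: #{failed.inspect}" unless failed.empty?
```

Because the keys are directed pairs, [0, 1] and [1, 0] are reported separately, so a caller that needs bidirectional access would check both entries.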
#gpu_count ⇒ Integer
Returns the number of GPUs in this topology.
    # File 'lib/nvruby/collective/topology.rb', line 274
    def gpu_count
      @device_ids.size
    end
#interconnect_type(device_a, device_b) ⇒ Symbol
Get the interconnect type between two GPUs. Returns :none when the matrix has no path for the pair.
    # File 'lib/nvruby/collective/topology.rb', line 282
    def interconnect_type(device_a, device_b)
      path = @matrix.path(device_a, device_b)
      path&.interconnect_type || :none
    end
#nvlink_connected?(device_a, device_b) ⇒ Boolean
Check whether a specific GPU pair is connected by NVLink. Returns false when the matrix has no path for the pair.
    # File 'lib/nvruby/collective/topology.rb', line 297
    def nvlink_connected?(device_a, device_b)
      path = @matrix.path(device_a, device_b)
      path&.interconnect_type == :nvlink
    end
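The nil-safe lookup shared by interconnect_type and nvlink_connected? can be tried in isolation. This is a minimal sketch: StubPath and StubMatrix are hypothetical stand-ins for the library's path and matrix objects, and :pcie is an assumed symbol (only :nvlink and :none appear in the source).

```ruby
# Hypothetical stand-in for a path record; only the fields read by the
# query methods are modelled.
StubPath = Struct.new(:interconnect_type, :p2p_supported)

# Hypothetical stand-in for the topology matrix: a Hash keyed by [a, b].
class StubMatrix
  def initialize(paths)
    @paths = paths
  end

  # Returns the path record, or nil when the pair is unknown.
  def path(device_a, device_b)
    @paths[[device_a, device_b]]
  end
end

# Same logic as Detector#interconnect_type, with the matrix passed in.
def interconnect_type(matrix, device_a, device_b)
  matrix.path(device_a, device_b)&.interconnect_type || :none
end

# Same logic as Detector#nvlink_connected?; nil == :nvlink is false.
def nvlink_connected?(matrix, device_a, device_b)
  matrix.path(device_a, device_b)&.interconnect_type == :nvlink
end

matrix = StubMatrix.new({
  [0, 1] => StubPath.new(:nvlink, true),
  [0, 2] => StubPath.new(:pcie, true)
})

puts interconnect_type(matrix, 0, 1)   # known NVLink pair
puts interconnect_type(matrix, 3, 4)   # unknown pair falls back to :none
puts nvlink_connected?(matrix, 0, 2)   # known pair, but not NVLink
```

The safe-navigation operator means an unknown pair never raises NoMethodError; callers get :none or false instead.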
#optimal_ring_order ⇒ Array<Integer>
Get the optimal ring order for collective operations. Delegates to the topology matrix.
    # File 'lib/nvruby/collective/topology.rb', line 289
    def optimal_ring_order
      @matrix.optimal_ring_order
    end
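One way a caller might consume a ring order is to derive each device's send target. The sketch below assumes the returned array lists device IDs in ring sequence, with the last device wrapping around to the first; the source does not state this. ring_neighbors and the sample order are hypothetical.

```ruby
# Hypothetical helper: pair each device with the next device in the ring,
# wrapping from the last element back to the first.
def ring_neighbors(order)
  order.each_with_index.map do |device, i|
    [device, order[(i + 1) % order.size]]
  end
end

# Made-up ring order standing in for Detector#optimal_ring_order.
sample_order = [0, 1, 3, 2]

ring_neighbors(sample_order).each do |src, dst|
  puts "GPU #{src} sends to GPU #{dst}"
end
```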
#p2p_available?(device_a, device_b) ⇒ Boolean
Check whether P2P is available between two GPUs. Returns false when the matrix has no path for the pair.
    # File 'lib/nvruby/collective/topology.rb', line 306
    def p2p_available?(device_a, device_b)
      path = @matrix.path(device_a, device_b)
      path&.p2p_supported || false
    end
#to_s ⇒ String
Returns a summary of the detected topology: the GPU count, plus NVLink and P2P path counts out of all ordered device pairs.
    # File 'lib/nvruby/collective/topology.rb', line 335
    def to_s
      nvlink_count = @matrix.nvlink_paths.size
      p2p_count = @matrix.p2p_paths.size
      # Ordered (src, dst) pairs: n * (n - 1)
      total_pairs = @device_ids.size * (@device_ids.size - 1)

      "Topology: #{@device_ids.size} GPUs, " \
        "#{nvlink_count}/#{total_pairs} NVLink, " \
        "#{p2p_count}/#{total_pairs} P2P"
    end
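The denominator in the summary is the number of ordered device pairs, n * (n - 1), so 4 GPUs give 12 pairs. The sketch below reproduces the string format with plain integers; topology_summary and the sample counts are hypothetical and only illustrate the arithmetic.

```ruby
# Number of ordered (src, dst) pairs among n devices, excluding self-pairs.
def directed_pair_count(gpu_count)
  gpu_count * (gpu_count - 1)
end

# Hypothetical helper: builds the same string layout as Detector#to_s
# from plain counts instead of a topology matrix.
def topology_summary(gpu_count, nvlink_count, p2p_count)
  total_pairs = directed_pair_count(gpu_count)
  "Topology: #{gpu_count} GPUs, " \
    "#{nvlink_count}/#{total_pairs} NVLink, " \
    "#{p2p_count}/#{total_pairs} P2P"
end

# Made-up counts: 4 GPUs, 8 NVLink paths, 12 P2P paths.
puts topology_summary(4, 8, 12)
# prints "Topology: 4 GPUs, 8/12 NVLink, 12/12 P2P"
```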