Class: Ignis::Collective::Topology::Matrix

Inherits:
Object
Defined in:
lib/nvruby/collective/topology.rb

Overview

Topology matrix for a set of GPUs. Maps each [src, dst] pair of device IDs to the Path between them.

Constructor Details

#initialize(device_ids) ⇒ Matrix

Returns a new instance of Matrix.

Parameters:

  • device_ids (Array<Integer>)

    GPU device IDs to analyze



# File 'lib/nvruby/collective/topology.rb', line 103

def initialize(device_ids)
  @device_ids = device_ids.dup.freeze
  @paths = {}
  build_matrix!
end
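
The constructor keeps a frozen copy of its argument (`device_ids.dup.freeze`), so later changes to the caller's array should not reach the matrix. A minimal sketch of that plain-Ruby behavior, using local variables only; no library code is loaded and the variable names are illustrative:

```ruby
caller_ids = [0, 1, 2]
snapshot = caller_ids.dup.freeze   # what #initialize stores in @device_ids

caller_ids << 3                    # the caller's own array stays mutable

# Frozen arrays reject in-place modification with FrozenError.
mutation_blocked =
  begin
    snapshot << 4
    false
  rescue FrozenError
    true
  end
```

Because the copy is frozen, the array returned by `#device_ids` is safe to share between callers without defensive duplication.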

Instance Attribute Details

#device_ids ⇒ Array<Integer> (readonly)

Returns the list of GPU device IDs.

Returns:

  • (Array<Integer>)

    List of GPU device IDs



# File 'lib/nvruby/collective/topology.rb', line 97

def device_ids
  @device_ids
end

#paths ⇒ Hash<Array<Integer>, Path> (readonly)

Returns the map of [src, dst] to Path.

Returns:

  • (Hash<Array<Integer>, Path>)

    Map of [src, dst] to Path



# File 'lib/nvruby/collective/topology.rb', line 100

def paths
  @paths
end

Instance Method Details

#full_p2p_mesh? ⇒ Boolean

Checks whether every GPU pair in the matrix supports P2P (a full P2P mesh)

Returns:

  • (Boolean)

    True if all pairs have P2P



# File 'lib/nvruby/collective/topology.rb', line 157

def full_p2p_mesh?
  @paths.values.all?(&:p2p_supported)
end

#nvlink_paths ⇒ Array<Path>

Get all paths with NVLink connectivity

Returns:

  • (Array<Path>)

    Paths with NVLink



# File 'lib/nvruby/collective/topology.rb', line 145

def nvlink_paths
  @paths.values.select { |p| p.interconnect_type == :nvlink }
end

#optimal_ring_order ⇒ Array<Integer>

Get a ring order based on topology. Uses a greedy nearest-neighbor heuristic that reduces total latency by placing NVLink-connected GPUs adjacent; the result is not guaranteed to be the global optimum.

Returns:

  • (Array<Integer>)

    Ordered device IDs for ring algorithm



# File 'lib/nvruby/collective/topology.rb', line 122

def optimal_ring_order
  return @device_ids.dup if @device_ids.size <= 2

  # Greedy nearest-neighbor heuristic
  remaining = @device_ids.dup
  order = [remaining.shift]

  until remaining.empty?
    current = order.last
    # Find GPU with best connection to current
    best_next = remaining.min_by do |gpu|
      path_obj = path(current, gpu)
      path_obj ? path_obj.performance_rank : Float::INFINITY
    end
    order << best_next
    remaining.delete(best_next)
  end

  order
end
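
The greedy loop above can be tried without GPU hardware. The following is a minimal sketch, not the library's code path: `FakePath`, `RANKS`, `build_paths`, and `greedy_ring_order` are hypothetical stand-ins, and the rank values are made up for illustration (lower is treated as better, matching the `min_by` call in the source).

```ruby
# Hypothetical stand-in for Path; only performance_rank is needed here.
FakePath = Struct.new(:performance_rank)

# Assumed ranks for illustration: 1 = fastest link, 3 = slowest.
RANKS = {
  [0, 1] => 3, [0, 2] => 1, [0, 3] => 3,
  [1, 2] => 2, [1, 3] => 1,
  [2, 3] => 3
}.freeze

# Build a symmetric [src, dst] => FakePath table from RANKS.
def build_paths(ranks)
  ranks.each_with_object({}) do |((a, b), rank), paths|
    paths[[a, b]] = FakePath.new(rank)
    paths[[b, a]] = FakePath.new(rank)
  end
end

# Same greedy nearest-neighbor loop as optimal_ring_order, taking the
# device list and path table as arguments instead of instance state.
def greedy_ring_order(device_ids, paths)
  return device_ids.dup if device_ids.size <= 2

  remaining = device_ids.dup
  order = [remaining.shift]
  until remaining.empty?
    current = order.last
    best_next = remaining.min_by do |gpu|
      path_obj = paths[[current, gpu]]
      path_obj ? path_obj.performance_rank : Float::INFINITY
    end
    order << best_next
    remaining.delete(best_next)
  end
  order
end

ring = greedy_ring_order([0, 1, 2, 3], build_paths(RANKS))
# ring is [0, 2, 1, 3]: both rank-1 pairs (0-2 and 1-3) end up adjacent
```

The heuristic always starts from the first device ID and never scores the closing hop from the last device back to the first, so different input orders can produce rings of different quality.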

#p2p_paths ⇒ Array<Path>

Get all paths with P2P support

Returns:

  • (Array<Path>)

    Paths with P2P



# File 'lib/nvruby/collective/topology.rb', line 151

def p2p_paths
  @paths.values.select(&:p2p_supported)
end
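
The query helpers (`#full_p2p_mesh?`, `#nvlink_paths`, `#p2p_paths`) are plain filters over the path values. A minimal sketch of the same predicates on stand-in data; `DemoLink` is a hypothetical struct, and `:pcie` is an assumed interconnect symbol that the source does not confirm:

```ruby
# Hypothetical stand-in for Path with only the fields the filters read.
DemoLink = Struct.new(:src, :dst, :interconnect_type, :p2p_supported)

links = [
  DemoLink.new(0, 1, :nvlink, true),
  DemoLink.new(1, 0, :nvlink, true),
  DemoLink.new(0, 2, :pcie, false),   # :pcie is an assumed type name
  DemoLink.new(2, 0, :pcie, false)
]

nvlink = links.select { |l| l.interconnect_type == :nvlink }  # as in nvlink_paths
p2p = links.select(&:p2p_supported)                           # as in p2p_paths
full_mesh = links.all?(&:p2p_supported)                       # as in full_p2p_mesh?

# Enumerable#all? is true for an empty collection.
empty_mesh = [].all?(&:p2p_supported)
```

If the path map is empty (for example, possibly for a single-device matrix, depending on what `build_matrix!` records), `#full_p2p_mesh?` would return true, so callers may want to check the device count as well.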

#path(src, dst) ⇒ Path?

Get path between two GPUs

Parameters:

  • src (Integer)

    Source GPU

  • dst (Integer)

    Destination GPU

Returns:

  • (Path, nil)

    Path object, or nil if src and dst are the same device or no path is recorded for the pair



# File 'lib/nvruby/collective/topology.rb', line 113

def path(src, dst)
  return nil if src == dst

  @paths[[src, dst]]
end
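
Since `#path` is a Hash lookup keyed by the `[src, dst]` array, it returns nil both for a device paired with itself and for any pair that has no entry. A minimal sketch of that lookup on stand-in data; `DemoPath` and `lookup` are hypothetical names, and whether the real matrix stores both directions of a pair is not shown in the source:

```ruby
# Hypothetical stand-in for Path.
DemoPath = Struct.new(:src, :dst, :interconnect_type)

paths = {
  [0, 1] => DemoPath.new(0, 1, :nvlink),
  [1, 0] => DemoPath.new(1, 0, :nvlink)
}

# Mirrors #path: nil for identical devices, otherwise a plain Hash lookup.
lookup = lambda do |src, dst|
  src == dst ? nil : paths[[src, dst]]
end

same_device = lookup.call(0, 0)   # nil: same device
found = lookup.call(0, 1)         # the DemoPath stored under [0, 1]
unknown = lookup.call(0, 7)       # nil: no entry for this pair
```

Callers that chain methods on the result should nil-check it first, as `#optimal_ring_order` does.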

#to_s ⇒ String

Returns a human-readable matrix representation.

Returns:

  • (String)

    Human-readable matrix representation



# File 'lib/nvruby/collective/topology.rb', line 162

def to_s
  header = "Topology Matrix (#{@device_ids.size} GPUs)\n"
  rows = @device_ids.map do |src|
    cols = @device_ids.map do |dst|
      if src == dst
        "  -  "
      else
        path_obj = path(src, dst)
        # First four letters of the interconnect type; "?" if no path is recorded
        type_abbr = path_obj ? path_obj.interconnect_type.to_s[0..3].upcase : "?"
        type_abbr.ljust(5)
      end
    end
    "GPU#{src}: #{cols.join(' | ')}"
  end
  header + rows.join("\n")
end
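
To see the layout `#to_s` produces, the same formatting can be run on stand-in data. This is a sketch under stated assumptions: `DemoHop` and `render_matrix` are hypothetical names, and the path table is hand-built rather than discovered from hardware.

```ruby
# Hypothetical stand-in for Path; only interconnect_type is read.
DemoHop = Struct.new(:interconnect_type)

# Renders the same layout as #to_s from a plain [src, dst] => path Hash.
def render_matrix(device_ids, paths)
  header = "Topology Matrix (#{device_ids.size} GPUs)\n"
  rows = device_ids.map do |src|
    cols = device_ids.map do |dst|
      next "  -  " if src == dst

      # Keep the first four letters, upcase, and pad to the 5-char column.
      paths[[src, dst]].interconnect_type.to_s[0..3].upcase.ljust(5)
    end
    "GPU#{src}: #{cols.join(' | ')}"
  end
  header + rows.join("\n")
end

demo_paths = {
  [0, 1] => DemoHop.new(:nvlink),
  [1, 0] => DemoHop.new(:nvlink)
}
text = render_matrix([0, 1], demo_paths)
puts text
```

Each cell is a four-letter abbreviation, so `:nvlink` appears as `NVLI`, and the diagonal is marked with `-`.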