Class: Ignis::Collective::Transport::InfiniBandTransport

Inherits:
Base
  • Object
Defined in:
lib/nvruby/collective/transport/rdma_transports.rb

Overview

Note:

Requires InfiniBand HCA (Host Channel Adapter) hardware

Note:

A production implementation requires the libibverbs library.

InfiniBand Transport Interface: high-speed network transport for HPC clusters.

This is an interface definition for InfiniBand transport. The actual implementation requires specialized hardware and the libibverbs library.

When the hardware is available, this transport is intended to provide:

  • 100-400 Gbps bandwidth

  • RDMA (Remote Direct Memory Access)

  • Kernel bypass

  • GPUDirect RDMA (Linux only)
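The figures above are link rates in gigabits per second, while #estimated_bandwidth reports gigabytes per second. A minimal sketch of the conversion, assuming nominal 4x link rates and ignoring encoding and protocol overhead (effective throughput is lower in practice); the helper name `gbps_to_gbytes_per_s` is hypothetical:

```ruby
# Hypothetical helper: convert a raw link rate in Gbit/s to GB/s.
# Divides by 8 bits per byte; encoding and protocol overhead are ignored.
def gbps_to_gbytes_per_s(gbps)
  gbps / 8.0
end

# Nominal InfiniBand 4x link generations and their rates in Gbit/s.
LINK_RATES = { edr: 100, hdr: 200, ndr: 400 }.freeze

LINK_RATES.each do |name, gbps|
  puts format("%-4s %3d Gbps -> %5.1f GB/s", name.to_s.upcase, gbps, gbps_to_gbytes_per_s(gbps))
end
```

The 400 Gbps case gives the 50.0 GB/s figure used by #estimated_bandwidth.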

Instance Attribute Summary

Attributes inherited from Base

#dst_device, #src_device

Class Method Summary

Instance Method Summary

Methods inherited from Base

#estimated_latency, #recv_async, #recv_sync, #send_async, #send_sync, #synchronize!

Constructor Details

#initialize(local_lid:, remote_lid:, local_qpn:, remote_qpn:) ⇒ InfiniBandTransport

Returns a new instance of InfiniBandTransport.

Parameters:

  • local_lid (Integer)

    Local LID (Local Identifier)

  • remote_lid (Integer)

    Remote LID

  • local_qpn (Integer)

    Local Queue Pair Number

  • remote_qpn (Integer)

    Remote Queue Pair Number

Raises:

  • (TransportError)

    If InfiniBand hardware is not available (.available? returns false)

# File 'lib/nvruby/collective/transport/rdma_transports.rb', line 41

def initialize(local_lid:, remote_lid:, local_qpn:, remote_qpn:)
  super(src_device: 0, dst_device: 0)
  @local_lid = local_lid
  @remote_lid = remote_lid
  @local_qpn = local_qpn
  @remote_qpn = remote_qpn
  @initialized = false
  raise TransportError, "InfiniBand hardware not available" unless self.class.available?
end
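Because the constructor raises TransportError whenever .available? is false, a caller can consult .available? first and fall back to another transport instead of rescuing. A minimal sketch of that selection pattern; the Ignis classes are not loaded here, so `StubInfiniBandTransport`, `StubFallbackTransport`, and `select_transport` are hypothetical stand-ins that only mirror the documented behavior:

```ruby
class TransportError < StandardError; end

# Stand-in mirroring the documented constructor: it raises when no HCA is present.
class StubInfiniBandTransport
  def self.available?
    false
  end

  def initialize(local_lid:, remote_lid:, local_qpn:, remote_qpn:)
    raise TransportError, "InfiniBand hardware not available" unless self.class.available?
  end
end

# Stand-in for some always-available fallback transport (hypothetical).
class StubFallbackTransport
  def self.available?
    true
  end

  def initialize(**_opts); end
end

# Pick the first transport class whose .available? is true, so the
# constructor's TransportError is avoided rather than rescued.
def select_transport(candidates, **opts)
  klass = candidates.find(&:available?)
  raise TransportError, "no transport available" unless klass

  klass.new(**opts)
end

ib_opts = { local_lid: 1, remote_lid: 2, local_qpn: 0x11, remote_qpn: 0x22 }
transport = select_transport([StubInfiniBandTransport, StubFallbackTransport], **ib_opts)
puts transport.class # prints "StubFallbackTransport"
```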

Class Method Details

.available? ⇒ Boolean

Checks whether InfiniBand hardware is available.

Returns:

  • (Boolean)


# File 'lib/nvruby/collective/transport/rdma_transports.rb', line 32

def self.available?
  # Check for InfiniBand hardware
  check_ib_hardware
end

.check_ib_hardware ⇒ Object

Probes for InfiniBand devices. This placeholder always returns false.

# File 'lib/nvruby/collective/transport/rdma_transports.rb', line 119

def self.check_ib_hardware
  # Check for InfiniBand devices
  # On Linux: ls /sys/class/infiniband/
  # On Windows: Check for Mellanox WinOF driver
  false  # InfiniBand not available by default
end
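On Linux, the commented-out check amounts to looking for device entries under /sys/class/infiniband/. A minimal sketch of that probe, assuming the sysfs layout named in the comment; the method names and the `root:` parameter (added so the probe can be pointed at a test directory) are hypothetical, and the Windows WinOF check is not covered:

```ruby
# Hypothetical Linux-only probe: list RDMA device names exposed by the
# kernel under /sys/class/infiniband (for example "mlx5_0").
def linux_ib_devices(root: "/sys/class/infiniband")
  return [] unless File.directory?(root)

  Dir.children(root).sort
end

# Hardware counts as present if at least one device entry exists.
def ib_hardware_present?(root: "/sys/class/infiniband")
  !linux_ib_devices(root: root).empty?
end

devices = linux_ib_devices
puts devices.empty? ? "no InfiniBand devices found" : "InfiniBand devices: #{devices.join(', ')}"
```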

.transport_type ⇒ Symbol

Returns the transport type.

Returns:

  • (Symbol)

    Transport type



# File 'lib/nvruby/collective/transport/rdma_transports.rb', line 26

def self.transport_type
  :infiniband
end

Instance Method Details

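#destroy! ⇒ void

This method returns an undefined value.

Releases the transport's resources (queue pair, completion queue, protection domain, device handle) and marks it as not ready. This placeholder only resets the ready flag.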



# File 'lib/nvruby/collective/transport/rdma_transports.rb', line 103

def destroy!
  # Would destroy QP, CQ, PD, close device
  @initialized = false
end
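Since #destroy! just tears down whatever #initialize! set up, the two pair naturally with Ruby's `ensure`. A minimal sketch, assuming a transport object that responds to #initialize!, #ready?, and #destroy!; `with_transport` and `FakeTransport` are hypothetical names, and the fake stands in for real hardware:

```ruby
# Hypothetical helper: run a block with an initialized transport and
# always call destroy!, even if initialize! or the block raises.
def with_transport(transport)
  transport.initialize!
  yield transport
ensure
  transport.destroy!
end

# Hypothetical stand-in that records whether cleanup happened.
class FakeTransport
  attr_reader :destroyed

  def initialize
    @initialized = false
    @destroyed = false
  end

  def initialize!
    @initialized = true
  end

  def ready?
    @initialized
  end

  def destroy!
    @initialized = false
    @destroyed = true
  end
end

t = FakeTransport.new
with_transport(t) { |tr| puts tr.ready? } # prints "true"
puts t.ready?                            # prints "false"
```

Calling #destroy! unconditionally, rather than only when #ready? is true, lets a real implementation release partially created resources after a failed setup.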

#estimated_bandwidth ⇒ Float

Returns the estimated bandwidth in GB/s.

Returns:

  • (Float)


# File 'lib/nvruby/collective/transport/rdma_transports.rb', line 97

def estimated_bandwidth
  50.0  # 400 Gbps NDR InfiniBand (400 Gbps / 8 = 50 GB/s)
end
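A bandwidth figure is usually combined with a per-message latency using the latency-bandwidth (alpha-beta) cost model, time = latency + bytes / bandwidth. A sketch, assuming latency in microseconds (the unit returned by the inherited #estimated_latency is not documented on this page) and bandwidth in GB/s with 1 GB = 10^9 bytes; `transfer_time_us` is a hypothetical name:

```ruby
# Hypothetical alpha-beta estimate: fixed per-message latency plus
# serialization time at the given bandwidth.
# latency_us: microseconds (assumed unit); bandwidth_gbs: GB/s, 1 GB = 1e9 bytes.
def transfer_time_us(bytes, latency_us:, bandwidth_gbs:)
  latency_us + bytes / (bandwidth_gbs * 1e9) * 1e6
end

# A 64 MiB message at 50 GB/s with an assumed 2 us latency.
puts transfer_time_us(64 * 1024 * 1024, latency_us: 2.0, bandwidth_gbs: 50.0).round(1)
```

For large messages the bandwidth term dominates; for small ones the latency term does.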

#initialize! ⇒ void

This method returns an undefined value.

Initializes the InfiniBand transport.

Raises:

  • (NotImplementedError)


# File 'lib/nvruby/collective/transport/rdma_transports.rb', line 53

def initialize!
  return if @initialized

  # Would initialize:
  # 1. Open IB device (ibv_open_device)
  # 2. Allocate protection domain (ibv_alloc_pd)
  # 3. Create completion queue (ibv_create_cq)
  # 4. Create queue pair (ibv_create_qp)
  # 5. Exchange QP info (LID, QPN) with remote
  # 6. Transition QP through INIT -> RTR -> RTS (RTR needs the remote LID/QPN)

  raise NotImplementedError, "InfiniBand transport requires specialized hardware"
end
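The verbs setup moves a queue pair through RESET, INIT, RTR (ready to receive), and RTS (ready to send), and the RTR transition is where the peer's LID and QPN are required, which is why the constructor takes them. A pure-Ruby model of that ordering with no ibverbs calls; `QueuePairModel` and its methods are hypothetical:

```ruby
# Hypothetical model of the queue pair state progression; no verbs calls.
class QueuePairModel
  TRANSITIONS = { reset: :init, init: :rtr, rtr: :rts }.freeze

  attr_reader :state

  def initialize
    @state = :reset
    @remote = nil
  end

  # Result of the out-of-band exchange: the peer's LID and queue pair number.
  def set_remote(lid:, qpn:)
    @remote = { lid: lid, qpn: qpn }
  end

  # Advance exactly one state; RTR requires the remote endpoint to be known.
  def advance!
    nxt = TRANSITIONS.fetch(@state) { raise "already in #{@state}" }
    raise "remote LID/QPN required before RTR" if nxt == :rtr && @remote.nil?

    @state = nxt
  end
end

qp = QueuePairModel.new
qp.advance!                      # RESET -> INIT
qp.set_remote(lid: 2, qpn: 0x22)
qp.advance!                      # INIT -> RTR
qp.advance!                      # RTR -> RTS
puts qp.state                    # prints "rts"
```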

#ready? ⇒ Boolean

Checks whether the transport has been initialized.

Returns:

  • (Boolean)


# File 'lib/nvruby/collective/transport/rdma_transports.rb', line 69

def ready?
  @initialized
end

#recv(dst_ptr, size, stream: nil) ⇒ Integer

Receive data via RDMA

Parameters:

  • dst_ptr (FFI::Pointer)

    Destination buffer

  • size (Integer)

    Size in bytes

  • stream (FFI::Pointer, nil) (defaults to: nil)

    CUDA stream

Returns:

  • (Integer)

Raises:

  • (NotImplementedError)


# File 'lib/nvruby/collective/transport/rdma_transports.rb', line 89

def recv(dst_ptr, size, stream: nil)
  ensure_initialized!
  # Would use ibv_post_recv
  raise NotImplementedError, "InfiniBand RDMA recv not implemented"
end

#send(src_ptr, size, stream: nil) ⇒ Boolean

Send data via RDMA

Parameters:

  • src_ptr (FFI::Pointer)

    Source buffer

  • size (Integer)

    Size in bytes

  • stream (FFI::Pointer, nil) (defaults to: nil)

    CUDA stream

Returns:

  • (Boolean)

Raises:

  • (NotImplementedError)


# File 'lib/nvruby/collective/transport/rdma_transports.rb', line 78

def send(src_ptr, size, stream: nil)
  ensure_initialized!
  # Would use ibv_post_send with IBV_WR_RDMA_WRITE_WITH_IMM (a plain RDMA_WRITE consumes no receive posted by #recv)
  raise NotImplementedError, "InfiniBand RDMA send not implemented"
end
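Until a production implementation exists, #initialize!, #send, and #recv raise NotImplementedError. In Ruby that class descends from ScriptError rather than StandardError, so a bare `rescue => e` does not catch it; callers probing the transport have to name it. A sketch with a hypothetical stand-in object (`StubRdmaTransport` and `try_send` are illustrative names):

```ruby
# Hypothetical stand-in whose #send mirrors the documented stub behavior.
class StubRdmaTransport
  def send(_src_ptr, _size, stream: nil)
    raise NotImplementedError, "InfiniBand RDMA send not implemented"
  end
end

# Returns true if the transfer ran, false if the transport is only a stub.
def try_send(transport, ptr, size)
  transport.send(ptr, size)
  true
rescue NotImplementedError # not a StandardError, so it must be named explicitly
  false
end

puts NotImplementedError.ancestors.include?(StandardError) # prints "false"
puts try_send(StubRdmaTransport.new, nil, 1024)            # prints "false"
```

Because the class defines #send, it also shadows Object#send on its instances; code that dispatches dynamically on these objects should use #public_send or #__send__ instead.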

#to_s ⇒ String

Returns a short description showing the local and remote LIDs.

Returns:

  • (String)


# File 'lib/nvruby/collective/transport/rdma_transports.rb', line 109

def to_s
  "InfiniBandTransport[LID #{@local_lid} <-> #{@remote_lid}]"
end