Class: Ignis::Collective::Transport::Base

Inherits:
Object
Defined in:
lib/nvruby/collective/transport/base.rb

Overview

Abstract base class for all transport implementations. Each transport handles GPU-to-GPU or GPU-to-network data movement.
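
As a rough illustration of what a concrete transport has to supply, the sketch below subclasses a reduced stand-in for the base class. `SketchBase` and `LoopbackTransport` are hypothetical names invented for this example; the stand-in copies only the method bodies shown on this page so that it runs without the library or a GPU.

```ruby
# Stand-in for Ignis::Collective::Transport::Base, reduced to a few of the
# methods documented on this page (assumption: no CUDA calls are needed here).
class SketchBase
  attr_reader :src_device, :dst_device

  def self.transport_type
    raise NotImplementedError, "Subclass must define transport_type"
  end

  def self.available?(src, dst)
    raise NotImplementedError
  end

  def initialize(src_device:, dst_device:)
    @src_device = src_device
    @dst_device = dst_device
    @initialized = false
  end

  def initialize!
    raise NotImplementedError
  end

  def ready?
    @initialized
  end
end

# Hypothetical transport that only "moves" data within a single device.
class LoopbackTransport < SketchBase
  def self.transport_type
    :loopback
  end

  def self.available?(src, dst)
    src == dst
  end

  def initialize!
    @initialized = true
  end
end

transport = LoopbackTransport.new(src_device: 0, dst_device: 0)
transport.initialize!
puts transport.ready?
```

A real subclass would also override `send_async`, `recv_async`, `estimated_bandwidth`, and `estimated_latency`, all of which raise `NotImplementedError` in the base class.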

Constructor Details

#initialize(src_device:, dst_device:) ⇒ Base

Returns a new instance of Base.

Parameters:

  • src_device (Integer)

    Source GPU device ID

  • dst_device (Integer)

    Destination GPU device ID



# File 'lib/nvruby/collective/transport/base.rb', line 23

def initialize(src_device:, dst_device:)
  @src_device = src_device
  @dst_device = dst_device
  @initialized = false
end

Instance Attribute Details

#dst_device ⇒ Integer (readonly)

Returns the destination device ID.

Returns:

  • (Integer)

    Destination device ID



# File 'lib/nvruby/collective/transport/base.rb', line 19

def dst_device
  @dst_device
end

#src_device ⇒ Integer (readonly)

Returns the source device ID.

Returns:

  • (Integer)

    Source device ID



# File 'lib/nvruby/collective/transport/base.rb', line 16

def src_device
  @src_device
end

Class Method Details

.available?(src, dst) ⇒ Boolean

Check if this transport is available for the given GPU pair

Parameters:

  • src (Integer)

    Source GPU

  • dst (Integer)

    Destination GPU

Returns:

  • (Boolean)

    True if available

Raises:

  • (NotImplementedError)


# File 'lib/nvruby/collective/transport/base.rb', line 103

def self.available?(src, dst)
  raise NotImplementedError
end
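
One way a caller might use `.available?` is to walk a list of transport classes in preference order and take the first one that reports support for the device pair. The sketch below assumes exactly that; `PeerSketch`, `HostStagedSketch`, `CANDIDATES`, and `pick_transport` are hypothetical names, and the availability rules are made up for illustration.

```ruby
# Hypothetical transports; only .transport_type and .available? matter here.
class PeerSketch
  def self.transport_type
    :peer
  end

  # Assumption: direct peer access exists only between devices 0 and 1.
  def self.available?(src, dst)
    src != dst && [src, dst].all? { |id| id.between?(0, 1) }
  end
end

class HostStagedSketch
  def self.transport_type
    :host_staged
  end

  # Assumption: staging through host memory works for any pair.
  def self.available?(_src, _dst)
    true
  end
end

# Candidates in preference order; the first available one wins.
CANDIDATES = [PeerSketch, HostStagedSketch].freeze

def pick_transport(src, dst)
  CANDIDATES.find { |klass| klass.available?(src, dst) }
end

puts pick_transport(0, 1).transport_type
puts pick_transport(0, 3).transport_type
```

The source does not say how the library itself chooses among transports, so treat the ordering rule as an assumption.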

.transport_type ⇒ Symbol

Transport type identifier

Returns:

  • (Symbol)

    Transport type

Raises:

  • (NotImplementedError)


# File 'lib/nvruby/collective/transport/base.rb', line 11

def self.transport_type
  raise NotImplementedError, "Subclass must define transport_type"
end

Instance Method Details

#destroy! ⇒ void

This method returns an undefined value.

Clean up resources



# File 'lib/nvruby/collective/transport/base.rb', line 109

def destroy!
  @initialized = false
end

#estimated_bandwidth ⇒ Float

Estimated bandwidth in GB/s

Returns:

  • (Float)

    Bandwidth estimate

Raises:

  • (NotImplementedError)


# File 'lib/nvruby/collective/transport/base.rb', line 89

def estimated_bandwidth
  raise NotImplementedError
end

#estimated_latency ⇒ Float

Estimated latency in microseconds

Returns:

  • (Float)

    Latency estimate

Raises:

  • (NotImplementedError)


# File 'lib/nvruby/collective/transport/base.rb', line 95

def estimated_latency
  raise NotImplementedError
end
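
The two estimates can be combined into a simple cost model: transfer time is roughly latency plus message size divided by bandwidth. The sketch below is one plausible use of the numbers, not something the source describes. `LinkEstimate`, `estimated_transfer_us`, and all figures are hypothetical, and it assumes GB means 10^9 bytes.

```ruby
# Stand-in exposing the two estimate methods; the numbers are made up.
LinkEstimate = Struct.new(:estimated_bandwidth, :estimated_latency)

BYTES_PER_GB = 1_000_000_000.0 # assumption: decimal gigabytes

# Estimated time in microseconds to move `bytes` over `transport`.
def estimated_transfer_us(transport, bytes)
  seconds_on_wire = bytes / (transport.estimated_bandwidth * BYTES_PER_GB)
  transport.estimated_latency + seconds_on_wire * 1_000_000
end

fast_link = LinkEstimate.new(25.0, 2.0) # 25 GB/s, 2 microseconds
slow_link = LinkEstimate.new(10.0, 1.0) # 10 GB/s, 1 microsecond

[1_024, 1_048_576].each do |bytes|
  best = [fast_link, slow_link].min_by { |t| estimated_transfer_us(t, bytes) }
  name = best.equal?(fast_link) ? "fast_link" : "slow_link"
  puts "#{bytes} bytes: #{name}"
end
```

With these made-up figures the lower-latency link wins for the small message and the higher-bandwidth link wins for the large one, which is why both estimates are worth exposing.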

#initialize! ⇒ void

This method returns an undefined value.

Initialize the transport (called once per communicator)

Raises:

  • (NotImplementedError)


# File 'lib/nvruby/collective/transport/base.rb', line 31

def initialize!
  raise NotImplementedError
end

#ready? ⇒ Boolean

Check if transport is initialized and ready

Returns:

  • (Boolean)

    True if ready for use



# File 'lib/nvruby/collective/transport/base.rb', line 37

def ready?
  @initialized
end
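
The lifecycle methods (`initialize!`, `ready?`, `destroy!`) pair naturally with an `ensure` block so that cleanup runs even when a transfer raises. The sketch below uses a stand-in class and a hypothetical helper, `with_transport`; neither is part of the documented API.

```ruby
# Stand-in with the lifecycle methods from this page; no GPU work happens.
class LifecycleSketch
  def initialize
    @initialized = false
  end

  def initialize!
    @initialized = true
  end

  def ready?
    @initialized
  end

  def destroy!
    @initialized = false
  end
end

# Hypothetical helper: initialize, yield, and always clean up.
def with_transport(transport)
  transport.initialize!
  yield transport
ensure
  transport.destroy!
end

transport = LifecycleSketch.new
ready_inside = with_transport(transport) { |t| t.ready? }
puts "ready inside block: #{ready_inside}"
puts "ready afterwards: #{transport.ready?}"
```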

#recv_async(buffer, size, stream) ⇒ void

This method returns an undefined value.

Receive data asynchronously

Parameters:

  • buffer (FFI::Pointer)

    Device pointer to receive into

  • size (Integer)

    Bytes to receive

  • stream (CUDA::Stream, FFI::Pointer)

    CUDA stream for async execution

Raises:

  • (NotImplementedError)


# File 'lib/nvruby/collective/transport/base.rb', line 55

def recv_async(buffer, size, stream)
  raise NotImplementedError
end

#recv_sync(buffer, size) ⇒ void

This method returns an undefined value.

Synchronous receive (waits for completion)

Parameters:

  • buffer (FFI::Pointer)

    Device pointer to receive into

  • size (Integer)

    Bytes to receive



# File 'lib/nvruby/collective/transport/base.rb', line 73

def recv_sync(buffer, size)
  null_stream = FFI::Pointer::NULL
  recv_async(buffer, size, null_stream)
  synchronize!
end

#send_async(buffer, size, stream) ⇒ void

This method returns an undefined value.

Send data asynchronously

Parameters:

  • buffer (FFI::Pointer)

    Device pointer to send

  • size (Integer)

    Bytes to send

  • stream (CUDA::Stream, FFI::Pointer)

    CUDA stream for async execution

Raises:

  • (NotImplementedError)


# File 'lib/nvruby/collective/transport/base.rb', line 46

def send_async(buffer, size, stream)
  raise NotImplementedError
end

#send_sync(buffer, size) ⇒ void

This method returns an undefined value.

Synchronous send (waits for completion)

Parameters:

  • buffer (FFI::Pointer)

    Device pointer to send

  • size (Integer)

    Bytes to send



# File 'lib/nvruby/collective/transport/base.rb', line 63

def send_sync(buffer, size)
  null_stream = FFI::Pointer::NULL
  send_async(buffer, size, null_stream)
  synchronize!
end
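
The synchronous variants are thin wrappers: they enqueue the asynchronous operation on the null stream and then wait. To make that call order visible without CUDA, the sketch below uses a hypothetical `RecordingSketch` class that logs calls instead of moving data; `nil` stands in for `FFI::Pointer::NULL` so the example needs no gems.

```ruby
# Stand-in that records calls instead of touching CUDA.
class RecordingSketch
  attr_reader :calls

  def initialize
    @calls = []
  end

  # A real subclass would enqueue the copy on `stream` here.
  def send_async(buffer, size, stream)
    @calls << [:send_async, buffer, size, stream]
  end

  # Stand-in for the device-wide wait in Base#synchronize!.
  def synchronize!
    @calls << [:synchronize!]
  end

  # Same shape as Base#send_sync; nil stands in for FFI::Pointer::NULL.
  def send_sync(buffer, size)
    send_async(buffer, size, nil)
    synchronize!
  end
end

transport = RecordingSketch.new
transport.send_sync(:device_buffer, 4096)
transport.calls.each { |call| p call }
```

Because the base class implements `send_sync` in terms of `send_async`, a subclass that only overrides the asynchronous method gets the synchronous one for free. `recv_sync` follows the same shape with `recv_async`.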

#synchronize! ⇒ void

This method returns an undefined value.

Wait for all pending operations on the current device to complete (calls cudaDeviceSynchronize)



# File 'lib/nvruby/collective/transport/base.rb', line 81

def synchronize!
  CUDA::RuntimeAPI.ensure_loaded!
  status = CUDA::RuntimeAPI.cudaDeviceSynchronize
  CUDA::RuntimeAPI.check_status!(status, "Transport synchronize")
end

#to_s ⇒ String

Returns a human-readable description.

Returns:

  • (String)

    Human-readable description



# File 'lib/nvruby/collective/transport/base.rb', line 114

def to_s
  # Separator added so the device pair reads unambiguously (1 and 12 vs. 11 and 2).
  "#{self.class.transport_type}[#{@src_device}->#{@dst_device}]"
end