Module: GPT2FFICuda

Defined in: lib/toy/llm/engine/gpt2_fwd_engine_cuda.rb

Class Method Summary

.forward(fwd_cache, token_ids) ⇒ Object

Run the forward pass.
.make_pos_slice(model, t_seq) ⇒ Object

Build the (t_seq, d_model) pos_slice that pairs with token_ids padded to t_seq.
.pad_ids(ids, t_seq) ⇒ Object

Pad an Array<Int> of token IDs to length t_seq with zeros (the “<unk>” / EOS-style fallback).
.upload_from(fwd_cache, model, pos_slice_mat) ⇒ Object

Upload all weights from a populated GPT2LM into a freshly-realized GPT2FullForwardFFICacheCuda.

Class Method Details

.forward(fwd_cache, token_ids) ⇒ `Object`

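Run the forward pass. token_ids must be an Array<Int> already padded to length t_seq. Returns the (t_seq, vocab) logits Mat. ggml’s mul_mat result has ne=[vocab, t_seq]; read row-major with rows=t_seq and cols=vocab, that is exactly the layout Mat#flat[t*vocab + v] expects. If tnn_compute returns a nonzero code, the method prints the code and still downloads the logits buffer, so the returned Mat should not be trusted after a failure message.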

# File 'lib/toy/llm/engine/gpt2_fwd_engine_cuda.rb', line 353

def self.forward(fwd_cache, token_ids)
  TinyNNCuda.upload_int_array(fwd_cache.sess, fwd_cache.t_token_ids, token_ids)
  rc = TinyNNCuda.tnn_compute(fwd_cache.sess)
  if rc != 0
    puts "tnn_compute failed: rc=" + rc.to_s
  end
  TinyNNCuda.download_row_major(fwd_cache.sess, fwd_cache.t_logits, fwd_cache.t_seq, fwd_cache.vocab_size)
end
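The layout claim above can be checked with a small sketch. It uses plain Ruby arrays as stand-ins for the ggml buffer and for Mat#flat; `logit_at` is a hypothetical helper, not part of the module. It relies only on the fact that ggml treats ne[0] as the fastest-varying dimension.

```ruby
# Illustrative only: a plain Array stands in for the downloaded logits buffer.
vocab = 4
t_seq = 3

# With ne = [vocab, t_seq], ne[0] (vocab) varies fastest, so element (v, t)
# sits at flat offset t * vocab + v. Fill the buffer with its own offsets.
ggml_buf = Array.new(vocab * t_seq) { |i| i.to_f }

# Hypothetical helper: read the logit for sequence position t, vocab entry v.
def logit_at(flat, vocab, t, v)
  flat[t * vocab + v]
end

# Row t of the (t_seq, vocab) view is a contiguous run of `vocab` floats.
last_row = ggml_buf[(t_seq - 1) * vocab, vocab]
```

Because each row is contiguous, no transpose is needed on download; the same flat buffer serves both the ggml view and the row-major Mat view.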

.make_pos_slice(model, t_seq) ⇒ `Object`

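Build the (t_seq, d_model) pos_slice that pairs with token_ids padded to t_seq. Copies rows 0..t_seq-1 of model.pos_embed. Because both matrices are row-major with d_model columns, this amounts to copying the first t_seq * d_model flat elements.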

# File 'lib/toy/llm/engine/gpt2_fwd_engine_cuda.rb', line 322

def self.make_pos_slice(model, t_seq)
  out = Mat.new(t_seq, model.d_model)
  n = t_seq * model.d_model
  i = 0
  while i < n
    out.flat[i] = model.pos_embed.flat[i]
    i = i + 1
  end
  out
end
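The following sketch illustrates why a single flat loop is enough to slice the leading rows. `SketchMat` is a hypothetical stand-in for the project's Mat (assumed to be row-major with a `flat` Array), and the sizes are made up for the example.

```ruby
# Hypothetical stand-in for Mat: row-major storage exposed as `flat`.
class SketchMat
  attr_reader :rows, :cols, :flat

  def initialize(rows, cols)
    @rows = rows
    @cols = cols
    @flat = Array.new(rows * cols, 0.0)
  end
end

n_ctx   = 4  # assumed number of rows in pos_embed
d_model = 3
t_seq   = 2

pos_embed = SketchMat.new(n_ctx, d_model)
(n_ctx * d_model).times { |i| pos_embed.flat[i] = i.to_f }

# Explicit row-by-row copy of rows 0..t_seq-1.
by_rows = SketchMat.new(t_seq, d_model)
t_seq.times do |r|
  d_model.times do |c|
    by_rows.flat[r * d_model + c] = pos_embed.flat[r * d_model + c]
  end
end

# Flat-prefix copy: the first t_seq * d_model elements.
by_prefix = pos_embed.flat[0, t_seq * d_model]
```

Both copies hold the same values, which is what lets make_pos_slice use one counter instead of nested row and column loops. This only holds when the source and destination have the same column count.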

.pad_ids(ids, t_seq) ⇒ `Object`

Pad an Array<Int> of token IDs to length t_seq with zeros (the “<unk>” / EOS-style fallback). Inputs longer than t_seq are truncated to their first t_seq IDs. Returns a new Array; the input is not modified.

# File 'lib/toy/llm/engine/gpt2_fwd_engine_cuda.rb', line 335

def self.pad_ids(ids, t_seq)
  out = Array.new(t_seq, 0)
  n   = ids.length
  if n > t_seq
    n = t_seq
  end
  i = 0
  while i < n
    out[i] = ids[i]
    i = i + 1
  end
  out
end
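A short usage sketch of the padding and truncation behavior. The function body is a standalone copy of the listing above (defined at top level rather than on the module) so the example runs without the engine loaded; the token IDs are made up.

```ruby
# Standalone copy of pad_ids from the listing, for illustration.
def pad_ids(ids, t_seq)
  out = Array.new(t_seq, 0)
  n   = ids.length
  if n > t_seq
    n = t_seq
  end
  i = 0
  while i < n
    out[i] = ids[i]
    i = i + 1
  end
  out
end

short_ids = [5, 7, 9]
padded    = pad_ids(short_ids, 5)      # shorter input: zero-padded on the right
truncated = pad_ids([1, 2, 3, 4], 2)   # longer input: cut to the first t_seq IDs
```

Since the pad value 0 may also be a valid token ID, a caller that needs the logits of the last real token would likely have to keep the original `ids.length` alongside the padded array.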

.upload_from(fwd_cache, model, pos_slice_mat) ⇒ `Object`

Upload all weights from a populated GPT2LM into a freshly-realized GPT2FullForwardFFICacheCuda. The per-head Q/K/V weights and w_o/w_ff1/w_ff2 are staged transposed before upload; token_embed and pos_slice are uploaded row-major in bulk; biases and LayerNorm parameters are uploaded directly as 1-D arrays.

# File 'lib/toy/llm/engine/gpt2_fwd_engine_cuda.rb', line 276

def self.upload_from(fwd_cache, model, pos_slice_mat)
  sess = fwd_cache.sess
  n    = fwd_cache.n_layers
  n_heads = fwd_cache.n_heads
  d_model = fwd_cache.d_model

  TinyNNCuda.upload_row_major(sess, fwd_cache.t_token_embed, model.token_embed)
  TinyNNCuda.upload_row_major(sess, fwd_cache.t_pos_slice,   pos_slice_mat)
  TinyNNCuda.tnn_upload_from_float_array(sess, fwd_cache.t_ln_f_gamma, model.ln_f_gamma, d_model)
  TinyNNCuda.tnn_upload_from_float_array(sess, fwd_cache.t_ln_f_beta,  model.ln_f_beta,  d_model)

  li = 0
  while li < n
    blk_n = model.gpt2_blocks[li]
    blk_f = fwd_cache.gpt2_blocks_ffi[li]

    TinyNNCuda.tnn_upload_from_float_array(sess, blk_f.t_ln1_gamma, blk_n.ln1_gamma, d_model)
    TinyNNCuda.tnn_upload_from_float_array(sess, blk_f.t_ln1_beta,  blk_n.ln1_beta,  d_model)
    TinyNNCuda.tnn_upload_from_float_array(sess, blk_f.t_ln2_gamma, blk_n.ln2_gamma, d_model)
    TinyNNCuda.tnn_upload_from_float_array(sess, blk_f.t_ln2_beta,  blk_n.ln2_beta,  d_model)

    d_head = fwd_cache.d_head
    h = 0
    while h < n_heads
      TinyNNCuda.stage_transposed_and_upload(sess, blk_f.t_w_q[h], blk_n.w_q[h])
      TinyNNCuda.stage_transposed_and_upload(sess, blk_f.t_w_k[h], blk_n.w_k[h])
      TinyNNCuda.stage_transposed_and_upload(sess, blk_f.t_w_v[h], blk_n.w_v[h])
      TinyNNCuda.tnn_upload_from_float_array(sess, blk_f.t_b_q[h], blk_n.b_q[h], d_head)
      TinyNNCuda.tnn_upload_from_float_array(sess, blk_f.t_b_k[h], blk_n.b_k[h], d_head)
      TinyNNCuda.tnn_upload_from_float_array(sess, blk_f.t_b_v[h], blk_n.b_v[h], d_head)
      h = h + 1
    end

    TinyNNCuda.stage_transposed_and_upload(sess, blk_f.t_w_o,   blk_n.w_o)
    TinyNNCuda.stage_transposed_and_upload(sess, blk_f.t_w_ff1, blk_n.w_ff1)
    TinyNNCuda.stage_transposed_and_upload(sess, blk_f.t_w_ff2, blk_n.w_ff2)
    TinyNNCuda.tnn_upload_from_float_array(sess, blk_f.t_b_o,   blk_n.b_o,   d_model)
    TinyNNCuda.tnn_upload_from_float_array(sess, blk_f.t_b_ff1, blk_n.b_ff1, fwd_cache.d_ff)
    TinyNNCuda.tnn_upload_from_float_array(sess, blk_f.t_b_ff2, blk_n.b_ff2, d_model)

    li = li + 1
  end
end
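The source does not show what stage_transposed_and_upload does internally. As a rough sketch of the staging step only, assuming it transposes a row-major matrix into a scratch buffer before copying it to the device, the transpose could look like this. `transpose_flat` is a hypothetical helper and the 2x3 matrix is made up.

```ruby
# Hypothetical helper: transpose a row-major (rows x cols) matrix stored as a
# flat Array into a row-major (cols x rows) flat Array.
def transpose_flat(flat, rows, cols)
  out = Array.new(rows * cols, 0.0)
  r = 0
  while r < rows
    c = 0
    while c < cols
      # Element (r, c) of the input becomes element (c, r) of the output.
      out[c * rows + r] = flat[r * cols + c]
      c = c + 1
    end
    r = r + 1
  end
  out
end

# 2x3 matrix: [[1, 2, 3],
#              [4, 5, 6]]
w  = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
wt = transpose_flat(w, 2, 3)   # 3x2: [[1, 4], [2, 5], [3, 6]]
```

A likely reason for the transpose is that ggml's mul_mat contracts over ne[0] of both operands, so a weight applied as x * W needs its input dimension stored contiguously. The embeddings are only looked up or added, not multiplied, which would explain why they are uploaded row-major as-is.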
