Class: Toy::LLM::Recipes::FromScratchCuda

Inherits:
Object
Defined in:
lib/toy/llm/recipes/from_scratch_cuda.rb

Overview

The from-scratch, random-init training recipe. It encapsulates the existing loop: realize! builds the random-init forward + CE + backward + AdamW graph on a Toy::LLM::Engine::LlamaSeqEngineCuda, then step! drives one training step. The random-init realize self-enables full_finetune and train_embeddings, so no extra enable_* call is needed. The caller (fixture) owns the experiment config and the per-step input Mats.
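The division of labour described above (the recipe owns the graph, the caller owns the config, the inputs and the loop) can be sketched without a GPU. This is only an illustration: SketchRecipe is a hypothetical stand-in that mirrors the two method names and their arities, and the config hash, options hash, plain arrays and loss values are placeholders, not real Toy::LLM objects or results.

```ruby
# Hypothetical stand-in for the recipe: same method names and arities,
# but it only records calls, so no CUDA engine is needed.
class SketchRecipe
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Mirrors realize!(cfg, opts): called once, returns nil.
  def realize!(cfg, opts)
    @calls << [:realize, cfg, opts]
    nil
  end

  # Mirrors step!(...): called once per training step, returns a Float.
  def step!(seq_ids, positions, m_labels, m_hp, is_first)
    @calls << [:step, is_first]
    1.0 / @calls.size # placeholder number, not a real loss
  end
end

recipe = SketchRecipe.new
recipe.realize!({ n_layer: 2 }, { t_seq: 3 }) # placeholder config and options

losses = (0...3).map do |step|
  # The caller builds the per-step inputs; plain arrays stand in for Mats.
  seq_ids   = [1, 2, 3]
  positions = [0, 1, 2]
  labels    = [2, 3, 4]
  hp        = [1.0e-3]
  # The caller, not the recipe, decides which step counts as the first.
  recipe.step!(seq_ids, positions, labels, hp, step.zero?)
end

puts losses.length
```

The point of the sketch is the shape of the loop: one realize! call, then one step! call per iteration, with the first-step flag computed by the caller.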

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize ⇒ FromScratchCuda

Returns a new instance of FromScratchCuda.



# File 'lib/toy/llm/recipes/from_scratch_cuda.rb', line 55

def initialize
  @fs_cache      = Toy::LLM::Engine::LlamaSeqEngineCuda.new
  @fs_t_loss     = nil
  @fs_t_labels   = nil
  @fs_t_hp       = nil
  @fs_step_index = 0
end

Instance Attribute Details

#fs_cache ⇒ Object

Returns the value of attribute fs_cache.



# File 'lib/toy/llm/recipes/from_scratch_cuda.rb', line 53

def fs_cache
  @fs_cache
end

#fs_step_index ⇒ Object

Returns the value of attribute fs_step_index.



# File 'lib/toy/llm/recipes/from_scratch_cuda.rb', line 53

def fs_step_index
  @fs_step_index
end

#fs_t_hp ⇒ Object

Returns the value of attribute fs_t_hp.



# File 'lib/toy/llm/recipes/from_scratch_cuda.rb', line 53

def fs_t_hp
  @fs_t_hp
end

#fs_t_labels ⇒ Object

Returns the value of attribute fs_t_labels.



# File 'lib/toy/llm/recipes/from_scratch_cuda.rb', line 53

def fs_t_labels
  @fs_t_labels
end

#fs_t_loss ⇒ Object

Returns the value of attribute fs_t_loss.



# File 'lib/toy/llm/recipes/from_scratch_cuda.rb', line 53

def fs_t_loss
  @fs_t_loss
end

Instance Method Details

#realize!(cfg, opts) ⇒ Object

Realizes the random-init graph. Delegates VERBATIM to the cache: realize_for_random_init (which self-enables @ft_train_embeddings_enabled and @seq_full_finetune_enabled), then build_training_step (forward + CE + backward + opt_step_adamw baked into the ggml graph). Stashes the returned [t_loss, t_labels, t_hp] triple. `opts` is a Toy::LLM::RecipeOptions (toy#64 item 1) carrying the former 7 trailing positional args (t_seq, t_batch, weight_dtype, untied, qkv_bias, seed, init_scale). They are unpacked here in the engine's exact positional order, so the realize is byte-identical. Returns nil.



# File 'lib/toy/llm/recipes/from_scratch_cuda.rb', line 72

def realize!(cfg, opts)
  @fs_cache.realize_for_random_init(cfg, opts.t_seq, opts.t_batch,
                                    opts.weight_dtype, opts.untied,
                                    opts.qkv_bias, opts.seed,
                                    opts.init_scale)
  result       = @fs_cache.build_training_step
  @fs_t_loss   = result[0]
  @fs_t_labels = result[1]
  @fs_t_hp     = result[2]
  nil
end
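The unpacking of an options object into the engine's fixed positional order can be illustrated with a small sketch. Everything here is an assumption made for illustration: SketchOptions is a hypothetical Struct standing in for Toy::LLM::RecipeOptions (only the seven field names come from the description above), RecordingEngine is a hypothetical double that records what it receives, and the field values are arbitrary.

```ruby
# Hypothetical stand-in for Toy::LLM::RecipeOptions. Only the seven field
# names are taken from the documentation; the Struct itself is an assumption.
SketchOptions = Struct.new(:t_seq, :t_batch, :weight_dtype, :untied,
                           :qkv_bias, :seed, :init_scale,
                           keyword_init: true)

# Hypothetical engine double: records the trailing positional arguments
# instead of building a graph.
class RecordingEngine
  attr_reader :received

  def realize_for_random_init(_cfg, *trailing)
    @received = trailing
    nil
  end
end

# Same unpacking order as realize! in the listing above.
def realize_with(engine, cfg, opts)
  engine.realize_for_random_init(cfg, opts.t_seq, opts.t_batch,
                                 opts.weight_dtype, opts.untied,
                                 opts.qkv_bias, opts.seed,
                                 opts.init_scale)
end

# Keyword construction lets the caller write fields in any order;
# the unpacking still hands them to the engine in the fixed order.
opts = SketchOptions.new(seed: 42, init_scale: 0.02, t_seq: 16, t_batch: 2,
                         weight_dtype: :f32, untied: false, qkv_bias: false)

engine = RecordingEngine.new
realize_with(engine, { n_layer: 2 }, opts)

p engine.received
```

Because the order is fixed at a single unpacking site, the call site no longer depends on the caller remembering seven positions.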

#step!(seq_ids, positions, m_labels, m_hp, is_first) ⇒ Object

ONE training step. Op order is VERBATIM from smoke_projection_lens.rb:97-112: graph_reset on the first step, else reset_grads_only; the four uploads in order (token_ids/positions/labels/hp); compute_backward; download_row_major(t_loss, 1, 1). is_first alone selects the reset; no other state is used for the reset decision, so the caller stays in full control of the step==0 branch (matches the gate's step==0 branch exactly). Returns the loss Float. Per-step input Mats are built by the caller.



# File 'lib/toy/llm/recipes/from_scratch_cuda.rb', line 93

def step!(seq_ids, positions, m_labels, m_hp, is_first)
  s = @fs_cache.sess
  if is_first
    TinyNNCuda.tnn_graph_reset(s)
  else
    TinyNNCuda.tnn_graph_reset_grads_only(s)
  end
  TinyNNCuda.upload_int_array(s, @fs_cache.t_seq_token_ids, seq_ids)
  TinyNNCuda.upload_int_array(s, @fs_cache.t_seq_positions, positions)
  TinyNNCuda.upload_row_major(s, @fs_t_labels, m_labels)
  TinyNNCuda.upload_row_major(s, @fs_t_hp,     m_hp)
  TinyNNCuda.tnn_compute_backward(s)
  loss_mat = TinyNNCuda.download_row_major(s, @fs_t_loss, 1, 1)
  loss_mat.flat[0]
end
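The op order of a single step can be traced without a CUDA session. The sketch below is an illustration only: sketch_step is a hypothetical helper that appends symbolic op names to an array in the same order as the step! listing above, and it calls none of the TinyNNCuda functions.

```ruby
# Hypothetical helper: records the symbolic op order of one training step.
# It mirrors the sequence in step! but performs no GPU work.
def sketch_step(trace, is_first)
  # Full graph reset on the first step, gradient-only reset afterwards.
  trace << (is_first ? :graph_reset : :reset_grads_only)

  # The four uploads, always in this order.
  %i[token_ids positions labels hp].each do |name|
    trace << :"upload_#{name}"
  end

  trace << :compute_backward
  trace << :download_loss
  trace
end

first_step = sketch_step([], true)
later_step = sketch_step([], false)

p first_step
p later_step
```

In this sketch the two traces differ only in their first element, which is the one decision that is_first controls.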