Class: Toy::LLM::Recipes::WarmStartCuda
- Inherits: Object
- Defined in: lib/toy/llm/recipes/warm_start_cuda.rb
Overview
The warm-start training recipe. A run goes through four calls, in this order:
- realize_scratch! realizes the random-init model on a Toy::LLM::Engine::LlamaSeqEngineCuda and OPENS the warm window without baking the graph. The random-init realize self-enables full_finetune + train_embeddings, so no extra enable_* call is needed.
- realize_warm! (optional) reads the donor embedding from a GGUF and uploads it into the realize’d embed table BEFORE the graph is baked. upload_donor! is the lower-level entry point for an already-read buffer.
- build! CLOSES the window by baking forward + CE + backward + opt_step_adamw into the ggml graph.
- step! then drives one training step per call.
The caller (fixture) owns the experiment config (including setting cfg.donor_d_in before realize_scratch!), the PCA-lens W_proj upload, the corpus stream, the LR schedule, and the per-step input Mats.
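The ordering contract above can be pictured as a small phase machine. The sketch below is hypothetical: WarmWindowSketch and its OrderError are not part of Toy, and the methods of WarmStartCuda shown on this page document this order rather than enforce it. The sketch only tracks phases; it never touches CUDA, ggml, or a GGUF file.

```ruby
# Hypothetical model of the WarmStartCuda call order (not part of Toy).
class WarmWindowSketch
  class OrderError < StandardError; end

  attr_reader :phase, :warmed

  def initialize
    @phase = :new      # before realize_scratch!
    @warmed = false
  end

  # realize_scratch!: random-init realize, opens the warm window.
  def realize_scratch!
    expect_phase(:new, "realize_scratch!")
    @phase = :window_open
    nil
  end

  # realize_warm! / upload_donor!: only legal while the window is open.
  def realize_warm!
    expect_phase(:window_open, "realize_warm!")
    @warmed = true
    nil
  end

  # build!: bakes the training graph and closes the window.
  def build!
    expect_phase(:window_open, "build!")
    @phase = :built
    nil
  end

  # step!: only legal once the graph is baked.
  def step!
    expect_phase(:built, "step!")
    0.0 # placeholder loss
  end

  private

  def expect_phase(wanted, name)
    return if @phase == wanted
    raise OrderError, "#{name} called in phase #{@phase}, expected #{wanted}"
  end
end
```

An INIT=scratch flow simply skips realize_warm!: realize_scratch!, build!, then step! in a loop. Calling realize_warm! after build! is the case the window exists to prevent, since the graph would then train through the random init.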
Instance Attribute Summary
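- #ws_cache ⇒ Object
  Returns the value of attribute ws_cache (the LlamaSeqEngineCuda created by the constructor).
- #ws_step_index ⇒ Object
  Returns the value of attribute ws_step_index (starts at 0; step! neither reads nor advances it).
- #ws_t_hp ⇒ Object
  Returns the value of attribute ws_t_hp (the hp input tensor stashed by build!; nil until then).
- #ws_t_labels ⇒ Object
  Returns the value of attribute ws_t_labels (the labels tensor stashed by build!; nil until then).
- #ws_t_loss ⇒ Object
  Returns the value of attribute ws_t_loss (the loss tensor stashed by build!; nil until then).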
Class Method Summary
- .donor_embed_width(donor_gguf_path) ⇒ Object
  Read the donor’s embedding width (llama.embedding_length) from a GGUF path. This is the value the caller must put in cfg.donor_d_in BEFORE realize_scratch!: the projection lens is sized donor_d_in x d_model at realize time, so the recipe cannot learn it later.
Instance Method Summary
- #build! ⇒ Object
  CLOSE the warm window: bake forward + CE + backward + opt_step_adamw into the ggml graph (no Ruby Trainer/optimizer; same rationale as FromScratch).
- #initialize ⇒ WarmStartCuda (constructor)
  A new instance of WarmStartCuda.
- #realize_scratch!(cfg, opts) ⇒ Object
  Realize the random-init graph and OPEN the warm window.
- #realize_warm!(donor_gguf_path, cfg) ⇒ Object
  OPTIONAL: warm the realize’d embed table from a donor GGUF.
- #step!(seq_ids, positions, m_labels, m_hp, is_first) ⇒ Object
  Run ONE training step and return its loss.
- #upload_donor!(donor_buf_flat, n_floats) ⇒ Object
  The raw upload MECHANISM realize_warm! rides, and the seam for already-read buffers (e.g. the legacy PCA-lens flow): one tnn_upload_from_float_array into the realize’d token_embed table (mirrors 09 L180).
Constructor Details
#initialize ⇒ WarmStartCuda
Returns a new instance of WarmStartCuda.
```ruby
# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 75
def initialize
  @ws_cache = Toy::LLM::Engine::LlamaSeqEngineCuda.new
  @ws_t_loss = nil
  @ws_t_labels = nil
  @ws_t_hp = nil
  @ws_step_index = 0
end
```
Instance Attribute Details
#ws_cache ⇒ Object
Returns the value of attribute ws_cache.
```ruby
# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 73
def ws_cache
  @ws_cache
end
```
#ws_step_index ⇒ Object
Returns the value of attribute ws_step_index.
```ruby
# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 73
def ws_step_index
  @ws_step_index
end
```
#ws_t_hp ⇒ Object
Returns the value of attribute ws_t_hp.
```ruby
# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 73
def ws_t_hp
  @ws_t_hp
end
```
#ws_t_labels ⇒ Object
Returns the value of attribute ws_t_labels.
```ruby
# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 73
def ws_t_labels
  @ws_t_labels
end
```
#ws_t_loss ⇒ Object
Returns the value of attribute ws_t_loss.
```ruby
# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 73
def ws_t_loss
  @ws_t_loss
end
```
Class Method Details
.donor_embed_width(donor_gguf_path) ⇒ Object
Read the donor’s embedding width (llama.embedding_length) from a GGUF path — the value the caller must put in cfg.donor_d_in BEFORE realize_scratch! (the projection lens is sized donor_d_in x d_model at realize time, so the recipe cannot learn it later). FAILS LOUD on a missing/corrupt donor or a non-llama-family GGUF. (toy#73 item 4 — the read half of the donor plumbing realize_warm! owns.)
```ruby
# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 107
def self.donor_embed_width(donor_gguf_path)
  if !File.exist?(donor_gguf_path)
    raise "WarmStartCuda.donor_embed_width: donor GGUF not found: " +
          donor_gguf_path
  end
  ggh = TinyNNCuda.tnn_gguf_load(donor_gguf_path)
  if ggh == nil || ggh == TinyNNCuda.tnn_null_ptr
    raise "WarmStartCuda.donor_embed_width: failed to open " +
          donor_gguf_path + " (not a GGUF?)"
  end
  donor_d = TinyNNCuda.tnn_gguf_get_u32(ggh, "llama.embedding_length")
  TinyNNCuda.tnn_gguf_free(ggh)
  if donor_d <= 0
    raise "WarmStartCuda.donor_embed_width: donor has no " +
          "llama.embedding_length key — not llama-family? (" +
          donor_gguf_path + ")"
  end
  donor_d
end
```
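What tnn_gguf_get_u32 has to do for llama.embedding_length can be pictured in pure Ruby. The sketch below is hypothetical (GgufMetaSketch is not part of Toy or TinyNNCuda) and assumes the GGUF v2/v3 little-endian header layout: magic "GGUF", a u32 version, u64 tensor and metadata counts, then key/value pairs whose keys are u64-length-prefixed strings. GGUF v1, which used u32 counts, is not handled.

```ruby
require "stringio"

# Hypothetical reader for one u32 metadata key in a GGUF header.
# Only metadata is parsed; tensor info and tensor data are never read.
module GgufMetaSketch
  # Metadata value type id => byte width, for fixed-size scalar types.
  SCALAR_SIZES = { 0 => 1, 1 => 1, 2 => 2, 3 => 2, 4 => 4, 5 => 4,
                   6 => 4, 7 => 1, 10 => 8, 11 => 8, 12 => 8 }.freeze
  TYPE_U32 = 4
  TYPE_STRING = 8
  TYPE_ARRAY = 9

  # Returns the key's u32 value, or nil if the key is absent.
  def self.read_u32_key(io, key)
    raise "not a GGUF (bad magic)" unless io.read(4) == "GGUF"
    version = io.read(4).unpack1("L<")
    raise "unsupported GGUF version #{version}" if version < 2
    _n_tensors, n_kv = io.read(16).unpack("Q<Q<")
    n_kv.times do
      name = read_string(io)
      type = io.read(4).unpack1("L<")
      if name == key
        raise "#{key} has type #{type}, expected u32" unless type == TYPE_U32
        return io.read(4).unpack1("L<")
      end
      skip_value(io, type)
    end
    nil
  end

  # gguf string: u64 byte length followed by the bytes.
  def self.read_string(io)
    len = io.read(8).unpack1("Q<")
    io.read(len)
  end

  # Advances past one value of the given type without decoding it.
  def self.skip_value(io, type)
    case type
    when TYPE_STRING then read_string(io)
    when TYPE_ARRAY
      elem_type = io.read(4).unpack1("L<")
      count = io.read(8).unpack1("Q<")
      count.times { skip_value(io, elem_type) }
    else
      size = SCALAR_SIZES.fetch(type) { raise "unknown GGUF type #{type}" }
      io.read(size)
    end
  end
end
```

donor_embed_width treats a value <= 0 as a missing key and raises; the sketch returns nil instead and leaves the loud failure to its caller.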
Instance Method Details
#build! ⇒ Object
CLOSE the warm window: bake forward + CE + backward + opt_step_adamw into the ggml graph (no Ruby Trainer/optimizer; same rationale as FromScratch). Delegates VERBATIM to build_training_step and stashes the returned [t_loss, t_labels, t_hp] triple in the ws_t_loss, ws_t_labels, and ws_t_hp attributes. Returns nil.
```ruby
# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 203
def build!
  result = @ws_cache.build_training_step
  @ws_t_loss = result[0]
  @ws_t_labels = result[1]
  @ws_t_hp = result[2]
  nil
end
```
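The three indexed reads in build! are equivalent to a single Ruby multiple assignment. The sketch below is hypothetical: FakeEngineSketch and BuildSketch are not part of Toy, and symbols stand in for the real ggml tensor handles that build_training_step returns.

```ruby
# Hypothetical engine stub: returns a [t_loss, t_labels, t_hp] triple,
# as the build! docs describe. Symbols stand in for tensor handles.
class FakeEngineSketch
  def build_training_step
    [:t_loss, :t_labels, :t_hp]
  end
end

class BuildSketch
  attr_reader :ws_t_loss, :ws_t_labels, :ws_t_hp

  def initialize(engine)
    @ws_cache = engine
  end

  # Same effect as the result[0..2] reads in build!, via multiple assignment.
  def build!
    @ws_t_loss, @ws_t_labels, @ws_t_hp = @ws_cache.build_training_step
    nil
  end
end
```

Both forms behave the same if the engine ever returns fewer than three elements: the missing slots become nil silently, so a caller wanting a loud failure would need an explicit length check.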
#realize_scratch!(cfg, opts) ⇒ Object
Realize the random-init graph and OPEN the warm window. Delegates VERBATIM to the cache’s realize_for_random_init, which self-enables full_finetune + train_embeddings, so no extra enable_* call is needed. opts is a Toy::LLM::RecipeOptions (toy#64 item 1) carrying the former 7 trailing positional args (t_seq, t_batch, weight_dtype, untied, qkv_bias, seed, init_scale), unpacked here in the engine’s exact positional order (identical to FromScratch#realize! / 09 L138), so the realize is byte-identical. Does NOT bake the graph: that is build!’s job, which leaves the window open for an optional realize_warm! upload in between. Returns nil.
```ruby
# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 92
def realize_scratch!(cfg, opts)
  @ws_cache.realize_for_random_init(cfg, opts.t_seq, opts.t_batch,
                                    opts.weight_dtype, opts.untied,
                                    opts.qkv_bias, opts.seed,
                                    opts.init_scale)
  nil
end
```
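The options-object-to-positional unpacking can be checked in isolation. This is a minimal sketch under stated assumptions: RecipeOptionsSketch is a hypothetical Struct with the seven fields realize_scratch! reads (the real Toy::LLM::RecipeOptions may be defined differently), RecordingEngineSketch is a hypothetical stub that records the positional call, and the field values in the test are illustrative, not defaults.

```ruby
# Hypothetical stand-in for Toy::LLM::RecipeOptions: the seven fields
# realize_scratch! reads, in the order it passes them on.
RecipeOptionsSketch = Struct.new(
  :t_seq, :t_batch, :weight_dtype, :untied, :qkv_bias, :seed, :init_scale,
  keyword_init: true
)

# Hypothetical engine stub that records each positional call.
class RecordingEngineSketch
  attr_reader :calls

  def initialize
    @calls = []
  end

  def realize_for_random_init(cfg, t_seq, t_batch, weight_dtype, untied,
                              qkv_bias, seed, init_scale)
    @calls << [cfg, t_seq, t_batch, weight_dtype, untied, qkv_bias, seed,
               init_scale]
    nil
  end
end

# Mirrors realize_scratch!: unpack in the engine's exact positional order.
def realize_scratch_sketch(engine, cfg, opts)
  engine.realize_for_random_init(cfg, opts.t_seq, opts.t_batch,
                                 opts.weight_dtype, opts.untied,
                                 opts.qkv_bias, opts.seed, opts.init_scale)
  nil
end
```

Because the Struct's member order matches the engine signature, `engine.realize_for_random_init(cfg, *opts.to_a)` would produce the same call here; the explicit field reads keep the order visible and stay correct even if the options members are later reordered.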
#realize_warm!(donor_gguf_path, cfg) ⇒ Object
OPTIONAL: warm the realize’d embed table from a donor GGUF. Owns the WHOLE donor read (toy#73 item 4 — was ~25 lines of bare GGUF plumbing in every consumer): open, re-read llama.embedding_length and DIM-CHECK it against cfg.donor_d_in (the width the lens was realized at — a mismatch would silently upload garbage through the wrong stride), find token_embd.weight, read the first cfg.vocab rows, upload through upload_donor!, free. Every failure raises NAMED + LOUD (which tensor, expected vs got, which path). Must be called AFTER realize_scratch! (the tensor exists) and BEFORE build! (else we train through the random init). INIT=scratch flows skip this method entirely. Returns nil.
```ruby
# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 138
def realize_warm!(donor_gguf_path, cfg)
  if !File.exist?(donor_gguf_path)
    raise "WarmStartCuda#realize_warm!: donor GGUF not found: " +
          donor_gguf_path
  end
  ggh = TinyNNCuda.tnn_gguf_load(donor_gguf_path)
  if ggh == nil || ggh == TinyNNCuda.tnn_null_ptr
    raise "WarmStartCuda#realize_warm!: failed to open " +
          donor_gguf_path + " (not a GGUF?)"
  end
  donor_d = TinyNNCuda.tnn_gguf_get_u32(ggh, "llama.embedding_length")
  if donor_d <= 0
    TinyNNCuda.tnn_gguf_free(ggh)
    raise "WarmStartCuda#realize_warm!: donor has no " +
          "llama.embedding_length key — not llama-family? (" +
          donor_gguf_path + ")"
  end
  if donor_d != cfg.donor_d_in
    TinyNNCuda.tnn_gguf_free(ggh)
    raise "WarmStartCuda#realize_warm!: token_embd.weight width " +
          "mismatch: expected donor_d_in=" + cfg.donor_d_in.to_s +
          " (the width realize_scratch! sized the lens at) but " +
          "donor llama.embedding_length=" + donor_d.to_s +
          " (" + donor_gguf_path + ")"
  end
  te_idx = TinyNNCuda.tnn_gguf_find_index(ggh, "token_embd.weight")
  if te_idx < 0
    TinyNNCuda.tnn_gguf_free(ggh)
    raise "WarmStartCuda#realize_warm!: donor has no " +
          "token_embd.weight tensor (" + donor_gguf_path + ")"
  end
  n_floats = cfg.vocab * donor_d
  te_buf = Mat.new(1, n_floats)
  rc = TinyNNCuda.tnn_gguf_read_f32_to_doubles(ggh, te_idx, te_buf.flat,
                                               n_floats)
  if rc != 0
    TinyNNCuda.tnn_gguf_free(ggh)
    raise "WarmStartCuda#realize_warm!: token_embd.weight read failed " +
          "rc=" + rc.to_s + " — wanted " + n_floats.to_s +
          " floats (vocab " + cfg.vocab.to_s + " x donor_d " +
          donor_d.to_s + ") from " + donor_gguf_path
  end
  upload_donor!(te_buf.flat, n_floats)
  TinyNNCuda.tnn_gguf_free(ggh)
  nil
end
```
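Reading "the first cfg.vocab rows" amounts to taking the first vocab * donor_d floats of a row-major table, which is only meaningful if donor_d is the stride the lens was sized at. The pure-Ruby sketch below is hypothetical (first_vocab_rows_sketch is not part of Toy) and uses a plain Array in place of the GGUF tensor and the Mat buffer.

```ruby
# Hypothetical picture of the realize_warm! read: the first `vocab` rows of
# a row-major [donor_vocab x donor_d] table, after the width check.
def first_vocab_rows_sketch(flat_table, donor_d, donor_d_in, vocab)
  if donor_d != donor_d_in
    raise ArgumentError, "width mismatch: expected donor_d_in=#{donor_d_in} " \
                         "but donor_d=#{donor_d}"
  end
  n_floats = vocab * donor_d
  if flat_table.length < n_floats
    raise ArgumentError, "wanted #{n_floats} floats, table has #{flat_table.length}"
  end
  flat_table.first(n_floats) # rows 0..vocab-1, each donor_d wide
end
```

With a 3 x 2 table [0.0, 1.0, 10.0, 11.0, 20.0, 21.0] and vocab 2, the sketch returns [0.0, 1.0, 10.0, 11.0]. Reading the same six floats with a stride of 3 would make row 1 [11.0, 20.0, 21.0], a row that straddles two real tokens; that is the silent garbage the DIM-CHECK guards against.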
#step!(seq_ids, positions, m_labels, m_hp, is_first) ⇒ Object
ONE training step. The op order is COPIED VERBATIM from FromScratch#step! (from_scratch.rb:83-97) and is LITERALLY IDENTICAL to LoRA#step!: graph_reset on the first step, else reset_grads_only; the four uploads in order (token_ids/positions/labels/hp); compute_backward; download_row_major(t_loss, 1, 1). is_first selects the reset. The @ws_step_index accessor is carried for callers that want it but is NOT used for the reset decision, so the caller stays in full control of the step==0 branch. The per-step LR enters ONLY via the caller mutating m_hp.flat before this call; there is deliberately NO lr param here (this matches the siblings and keeps schedule logic in the fixture). Returns the loss as a Float. The caller builds the per-step inputs: the seq_ids and positions int arrays and the m_labels and m_hp Mats.
```ruby
# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 223
def step!(seq_ids, positions, m_labels, m_hp, is_first)
  s = @ws_cache.sess
  if is_first
    TinyNNCuda.tnn_graph_reset(s)
  else
    TinyNNCuda.tnn_graph_reset_grads_only(s)
  end
  TinyNNCuda.upload_int_array(s, @ws_cache.t_seq_token_ids, seq_ids)
  TinyNNCuda.upload_int_array(s, @ws_cache.t_seq_positions, positions)
  TinyNNCuda.upload_row_major(s, @ws_t_labels, m_labels)
  TinyNNCuda.upload_row_major(s, @ws_t_hp, m_hp)
  TinyNNCuda.tnn_compute_backward(s)
  loss_mat = TinyNNCuda.download_row_major(s, @ws_t_loss, 1, 1)
  loss_mat.flat[0]
end
```
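On the fixture side, the two caller-owned decisions (is_first and the LR written into hp) might look like the loop below. This is a sketch under loudly stated assumptions: StubRecipeSketch stands in for WarmStartCuda and takes a plain Array where the real step! takes the m_hp Mat; HP_LR_SLOT = 0 is a hypothetical position for the LR, since the hp layout is not documented here; and the linear-warmup schedule is illustrative, not the fixture's.

```ruby
# Hypothetical LR position inside hp; the real layout is not documented here.
HP_LR_SLOT = 0

# Stub with step!'s parameter list; m_hp is a plain Array standing in for
# m_hp.flat. Records the reset choice and the LR each call saw.
class StubRecipeSketch
  attr_reader :log

  def initialize
    @log = []
  end

  def step!(seq_ids, positions, m_labels, m_hp, is_first)
    @log << { first: is_first, lr: m_hp[HP_LR_SLOT] }
    0.0 # placeholder loss
  end
end

# Illustrative schedule: linear warmup to base_lr, then constant.
def lr_at_sketch(step, base_lr, warmup)
  return base_lr if step >= warmup
  base_lr * (step + 1) / warmup.to_f
end

def train_sketch(recipe, n_steps, base_lr:, warmup:)
  m_hp = [0.0]
  losses = []
  n_steps.times do |step|
    m_hp[HP_LR_SLOT] = lr_at_sketch(step, base_lr, warmup) # LR enters only via hp
    losses << recipe.step!([], [], nil, m_hp, step.zero?)  # caller owns step == 0
  end
  losses
end
```

Keeping both decisions in the loop means the recipe never needs to know about schedules or step counters, which is the split the step! docs describe.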
#upload_donor!(donor_buf_flat, n_floats) ⇒ Object
The raw upload MECHANISM realize_warm! rides (and the seam for already-read buffers — e.g. the legacy PCA-lens flow): one tnn_upload_from_float_array into the realize’d token_embed table (mirrors 09 L180). Same window rules as realize_warm!. The PCA lens W_proj upload (09 L188-229) stays caller-side through
```ruby
# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 191
def upload_donor!(donor_buf_flat, n_floats)
  TinyNNCuda.tnn_upload_from_float_array(@ws_cache.sess, @ws_cache.,
                                         donor_buf_flat, n_floats)
  nil
end
```
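upload_donor! passes n_floats straight through without checking it against the buffer, so a caller with an already-read buffer may want to validate first. The guard below is a hypothetical sketch (checked_upload_sketch is not part of Toy); `uploader` stands in for the upload call, and `width` is assumed to be the width the embed table was realized at, presumably cfg.donor_d_in as in realize_warm!.

```ruby
# Hypothetical caller-side guard before an upload_donor!-style call.
# `uploader` is anything responding to call(flat, n_floats).
def checked_upload_sketch(uploader, flat, vocab, width)
  n_floats = vocab * width
  unless flat.length == n_floats
    raise ArgumentError, "buffer has #{flat.length} floats, expected #{n_floats} " \
                         "(vocab #{vocab} x width #{width})"
  end
  uploader.call(flat, n_floats)
end
```

With the real recipe, `uploader` could be `recipe.method(:upload_donor!)`, called inside the warm window (after realize_scratch!, before build!).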