Class: Toy::LLM::Recipes::WarmStart
- Inherits: Object
  - Object
    - Toy::LLM::Recipes::WarmStart
- Defined in: lib/toy/llm/recipes/warm_start.rb
Overview
The warm-start training recipe. realize_scratch! builds the random-init forward + CE + backward + AdamW graph on a Toy::LLM::Engine::LlamaSeqEngine and OPENS the warm window. The random-init realize self-enables full_finetune + train_embeddings, so no extra enable_* call is needed. realize_warm! (optional) uploads an already-read donor embedding into the realize’d embed table BEFORE the graph is baked. build! CLOSES the window by baking forward + CE + backward + opt_step_adamw into the ggml graph. step! then drives one training step. The caller (fixture) owns the experiment config, the donor/PCA GGUF read, the corpus stream, the LR schedule, and the per-step input Mats.
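The call order described above can be sketched as a small state machine. This is an illustrative stand-in only: `WarmWindowSketch`, its state names, and its error messages are hypothetical, it never touches TinyNN or the engine, and nothing in this page says the real recipe enforces the order itself.

```ruby
# Hypothetical stand-in that models only the call ORDER of the recipe.
# It builds no graph; the class name and states are assumptions.
class WarmWindowSketch
  attr_reader :state, :log

  def initialize
    @state = :new
    @log = []
  end

  # Opens the warm window (the real method realizes the random-init graph).
  def realize_scratch!
    raise "realize_scratch! called twice" unless @state == :new
    @state = :window_open
    @log << :realize_scratch
    nil
  end

  # Optional: only legal while the window is open.
  def realize_warm!
    raise "realize_warm! needs an open warm window (state=#{@state})" unless @state == :window_open
    @log << :realize_warm
    nil
  end

  # Closes the window (the real method bakes the training graph).
  def build!
    raise "build! needs realize_scratch! first (state=#{@state})" unless @state == :window_open
    @state = :built
    @log << :build
    nil
  end

  # Legal only after build!.
  def step!
    raise "step! needs build! first (state=#{@state})" unless @state == :built
    @log << :step
    nil
  end
end

recipe = WarmWindowSketch.new
recipe.realize_scratch!
recipe.realize_warm!   # skip this line for an INIT=scratch flow
recipe.build!
recipe.step!

# An upload after build! is rejected in this sketch, because the window is closed.
late_upload_rejected =
  begin
    recipe.realize_warm!
    false
  rescue RuntimeError
    true
  end
```

In the sketch, the window is simply the `:window_open` state between realize_scratch! and build!.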
Instance Attribute Summary
- #ws_cache ⇒ Object
  Returns the value of attribute ws_cache.
- #ws_step_index ⇒ Object
  Returns the value of attribute ws_step_index.
- #ws_t_hp ⇒ Object
  Returns the value of attribute ws_t_hp.
- #ws_t_labels ⇒ Object
  Returns the value of attribute ws_t_labels.
- #ws_t_loss ⇒ Object
  Returns the value of attribute ws_t_loss.
Class Method Summary
- .donor_embed_width(donor_gguf_path) ⇒ Object
  Read the donor’s embedding width (llama.embedding_length) from a GGUF path. This is the value the caller must put in cfg.donor_d_in BEFORE realize_scratch! (the projection lens is sized donor_d_in x d_model at realize time, so the recipe cannot learn it later).
Instance Method Summary
- #build! ⇒ Object
  CLOSE the warm window: bake forward + CE + backward + opt_step_adamw into the ggml graph (no Ruby Trainer/optimizer; same rationale as FromScratch).
- #initialize ⇒ WarmStart (constructor)
  A new instance of WarmStart.
- #realize_scratch!(cfg, opts) ⇒ Object
  Realize the random-init graph and OPEN the warm window.
- #realize_warm!(donor_gguf_path, cfg) ⇒ Object
  OPTIONAL: warm the realize’d embed table from a donor GGUF.
- #step!(seq_ids, positions, m_labels, m_hp, is_first) ⇒ Object
  ONE training step.
- #upload_donor!(donor_buf_flat, n_floats) ⇒ Object
  The raw upload MECHANISM realize_warm! rides (and the seam for already-read buffers, e.g. the legacy PCA-lens flow): one tnn_upload_from_float_array into the realize’d token_embed table (mirrors 09 L180).
Constructor Details
Instance Attribute Details
#ws_cache ⇒ Object
Returns the value of attribute ws_cache.
    # File 'lib/toy/llm/recipes/warm_start.rb', line 69
    def ws_cache
      @ws_cache
    end
#ws_step_index ⇒ Object
Returns the value of attribute ws_step_index.
    # File 'lib/toy/llm/recipes/warm_start.rb', line 69
    def ws_step_index
      @ws_step_index
    end
#ws_t_hp ⇒ Object
Returns the value of attribute ws_t_hp.
    # File 'lib/toy/llm/recipes/warm_start.rb', line 69
    def ws_t_hp
      @ws_t_hp
    end
#ws_t_labels ⇒ Object
Returns the value of attribute ws_t_labels.
    # File 'lib/toy/llm/recipes/warm_start.rb', line 69
    def ws_t_labels
      @ws_t_labels
    end
#ws_t_loss ⇒ Object
Returns the value of attribute ws_t_loss.
    # File 'lib/toy/llm/recipes/warm_start.rb', line 69
    def ws_t_loss
      @ws_t_loss
    end
Class Method Details
.donor_embed_width(donor_gguf_path) ⇒ Object
Read the donor’s embedding width (llama.embedding_length) from a GGUF path — the value the caller must put in cfg.donor_d_in BEFORE realize_scratch! (the projection lens is sized donor_d_in x d_model at realize time, so the recipe cannot learn it later). FAILS LOUD on a missing/corrupt donor or a non-llama-family GGUF. (toy#73 item 4 — the read half of the donor plumbing realize_warm! owns.)
    # File 'lib/toy/llm/recipes/warm_start.rb', line 103
    def self.donor_embed_width(donor_gguf_path)
      # Fail before touching the GGUF loader if the path is wrong.
      if !File.exist?(donor_gguf_path)
        raise "WarmStart.donor_embed_width: donor GGUF not found: " +
              donor_gguf_path
      end
      ggh = TinyNN.tnn_gguf_load(donor_gguf_path)
      if ggh == nil || ggh == TinyNN.tnn_null_ptr
        raise "WarmStart.donor_embed_width: failed to open " +
              donor_gguf_path + " (not a GGUF?)"
      end
      # Read the width, then free the handle before validating the value.
      donor_d = TinyNN.tnn_gguf_get_u32(ggh, "llama.embedding_length")
      TinyNN.tnn_gguf_free(ggh)
      if donor_d <= 0
        raise "WarmStart.donor_embed_width: donor has no " +
              "llama.embedding_length key — not llama-family? (" +
              donor_gguf_path + ")"
      end
      donor_d
    end
Instance Method Details
#build! ⇒ Object
CLOSE the warm window: bake forward + CE + backward + opt_step_adamw into the ggml graph (no Ruby Trainer/optimizer; same rationale as FromScratch). Delegates VERBATIM to build_training_step and stashes the returned [t_loss, t_labels, t_hp] triple. Returns nil.
    # File 'lib/toy/llm/recipes/warm_start.rb', line 199
    def build!
      result = @ws_cache.build_training_step
      @ws_t_loss = result[0]
      @ws_t_labels = result[1]
      @ws_t_hp = result[2]
      nil
    end
#realize_scratch!(cfg, opts) ⇒ Object
Realize the random-init graph and OPEN the warm window. Delegates VERBATIM to the cache: realize_for_random_init (which self-enables full_finetune + train_embeddings). `opts` is a Toy::LLM::RecipeOptions (toy#64 item 1) carrying the former 7 trailing positional args, unpacked here in the engine’s exact positional order (identical to FromScratch#realize! / 09 L138), so the realize is byte-identical. Does NOT bake the graph; that is build!’s job, leaving the window open for an optional realize_warm! upload in between. Returns nil.
    # File 'lib/toy/llm/recipes/warm_start.rb', line 88
    def realize_scratch!(cfg, opts)
      @ws_cache.realize_for_random_init(cfg, opts.t_seq, opts.t_batch,
                                        opts.weight_dtype, opts.untied,
                                        opts.qkv_bias, opts.seed,
                                        opts.init_scale)
      nil
    end
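The options-object pattern used above can be sketched in isolation. `DemoOptions` is a hypothetical stand-in for Toy::LLM::RecipeOptions (whose real constructor is not shown on this page); only the seven field names and their order come from the listing, and every value is made up.

```ruby
# Hypothetical stand-in for Toy::LLM::RecipeOptions. Only the seven field
# names are taken from the listing above; the Struct itself is an assumption.
DemoOptions = Struct.new(:t_seq, :t_batch, :weight_dtype, :untied,
                         :qkv_bias, :seed, :init_scale, keyword_init: true)

# Unpack in the same positional order realize_scratch! uses.
def positional_realize_args(cfg, opts)
  [cfg, opts.t_seq, opts.t_batch, opts.weight_dtype,
   opts.untied, opts.qkv_bias, opts.seed, opts.init_scale]
end

# All values below are invented for illustration. Keyword construction means
# the caller's argument order no longer matters; the unpack order does.
opts = DemoOptions.new(init_scale: 0.02, seed: 42, t_seq: 128, t_batch: 4,
                       weight_dtype: :f32, untied: false, qkv_bias: false)
args = positional_realize_args(:cfg_placeholder, opts)
```

Keeping the unpack in one place is what lets the positional call stay identical to the sibling recipe while callers pass a single named object.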
#realize_warm!(donor_gguf_path, cfg) ⇒ Object
OPTIONAL: warm the realize’d embed table from a donor GGUF. Owns the WHOLE donor read (toy#73 item 4 — was ~25 lines of bare GGUF plumbing in every consumer): open, re-read llama.embedding_length and DIM-CHECK it against cfg.donor_d_in (the width the lens was realized at — a mismatch would silently upload garbage through the wrong stride), find token_embd.weight, read the first cfg.vocab rows, upload through upload_donor!, free. Every failure raises NAMED + LOUD (which tensor, expected vs got, which path). Must be called AFTER realize_scratch! (the tensor exists) and BEFORE build! (else we train through the random init). INIT=scratch flows skip this method entirely. Returns nil.
    # File 'lib/toy/llm/recipes/warm_start.rb', line 134
    def realize_warm!(donor_gguf_path, cfg)
      if !File.exist?(donor_gguf_path)
        raise "WarmStart#realize_warm!: donor GGUF not found: " +
              donor_gguf_path
      end
      ggh = TinyNN.tnn_gguf_load(donor_gguf_path)
      if ggh == nil || ggh == TinyNN.tnn_null_ptr
        raise "WarmStart#realize_warm!: failed to open " +
              donor_gguf_path + " (not a GGUF?)"
      end
      # Re-read the donor width; every failure path frees the handle first.
      donor_d = TinyNN.tnn_gguf_get_u32(ggh, "llama.embedding_length")
      if donor_d <= 0
        TinyNN.tnn_gguf_free(ggh)
        raise "WarmStart#realize_warm!: donor has no " +
              "llama.embedding_length key — not llama-family? (" +
              donor_gguf_path + ")"
      end
      # DIM-CHECK against the width the lens was realized at.
      if donor_d != cfg.donor_d_in
        TinyNN.tnn_gguf_free(ggh)
        raise "WarmStart#realize_warm!: token_embd.weight width " +
              "mismatch: expected donor_d_in=" + cfg.donor_d_in.to_s +
              " (the width realize_scratch! sized the lens at) but " +
              "donor llama.embedding_length=" + donor_d.to_s +
              " (" + donor_gguf_path + ")"
      end
      te_idx = TinyNN.tnn_gguf_find_index(ggh, "token_embd.weight")
      if te_idx < 0
        TinyNN.tnn_gguf_free(ggh)
        raise "WarmStart#realize_warm!: donor has no " +
              "token_embd.weight tensor (" + donor_gguf_path + ")"
      end
      # Read the first cfg.vocab rows into a flat buffer.
      n_floats = cfg.vocab * donor_d
      te_buf = Mat.new(1, n_floats)
      rc = TinyNN.tnn_gguf_read_f32_to_doubles(ggh, te_idx, te_buf.flat,
                                               n_floats)
      if rc != 0
        TinyNN.tnn_gguf_free(ggh)
        raise "WarmStart#realize_warm!: token_embd.weight read failed " +
              "rc=" + rc.to_s + " — wanted " + n_floats.to_s +
              " floats (vocab " + cfg.vocab.to_s + " x donor_d " +
              donor_d.to_s + ") from " + donor_gguf_path
      end
      upload_donor!(te_buf.flat, n_floats)
      TinyNN.tnn_gguf_free(ggh)
      nil
    end
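A toy example shows why the width check matters for a flat row-major buffer. The table size, the `row` helper, and `check_width!` are all hypothetical; only the `vocab * width` sizing rule and the expected-vs-got style of error mirror the listing above.

```ruby
# A tiny row-major "embedding table": 3 rows (vocab) x 4 columns (true width).
vocab = 3
true_width = 4
flat = (0...(vocab * true_width)).map { |i| i.to_f }

# Row r of a row-major table starts at offset r * width.
def row(flat, r, width)
  flat[r * width, width]
end

# Hypothetical guard in the spirit of the expected-vs-got check above.
def check_width!(expected, got)
  if got != expected
    raise "width mismatch: expected " + expected.to_s + " but got " + got.to_s
  end
  nil
end

right = row(flat, 1, true_width) # the real second row
wrong = row(flat, 1, 3)          # same buffer read with a wrong width of 3

n_floats = vocab * true_width    # same sizing rule as cfg.vocab * donor_d

mismatch_caught =
  begin
    check_width!(true_width, 3)
    false
  rescue RuntimeError
    true
  end
```

With the wrong width, the second "row" starts one element early and straddles two real rows. No error is raised by the read itself, which is why an explicit check is the only thing that catches it.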
#step!(seq_ids, positions, m_labels, m_hp, is_first) ⇒ Object
ONE training step. Op order is COPIED VERBATIM from FromScratch#step! (from_scratch.rb:83-97) and LITERALLY IDENTICAL to LoRA#step!: graph_reset on the first step else reset_grads_only; the four uploads in order (token_ids/positions/labels/hp); compute_backward; download_row_major(t_loss, 1, 1). is_first selects the reset; the @ws_step_index accessor is carried for callers that want it but is NOT used for the reset decision, so the caller stays in full control of the step==0 branch. The per-step LR enters ONLY via the caller mutating m_hp.flat before this call — there is deliberately NO lr param here (matches the siblings; keeps schedule logic in the fixture). Returns the loss Float. Per-step input Mats are built by the caller.
    # File 'lib/toy/llm/recipes/warm_start.rb', line 219
    def step!(seq_ids, positions, m_labels, m_hp, is_first)
      s = @ws_cache.sess
      # Full graph reset on the first step, grads-only reset afterwards.
      if is_first
        TinyNN.tnn_graph_reset(s)
      else
        TinyNN.tnn_graph_reset_grads_only(s)
      end
      # The four uploads, in order: token_ids, positions, labels, hp.
      TinyNN.upload_int_array(s, @ws_cache.t_seq_token_ids, seq_ids)
      TinyNN.upload_int_array(s, @ws_cache.t_seq_positions, positions)
      TinyNN.upload_row_major(s, @ws_t_labels, m_labels)
      TinyNN.upload_row_major(s, @ws_t_hp, m_hp)
      TinyNN.tnn_compute_backward(s)
      loss_mat = TinyNN.download_row_major(s, @ws_t_loss, 1, 1)
      loss_mat.flat[0]
    end
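A caller-side loop that keeps the LR schedule in the fixture might look like the sketch below. Everything here is an assumption for illustration: `HpBuf` stands in for the hyperparameter Mat, `LR_SLOT = 0` is a guess (this page does not say which slot of m_hp.flat holds the learning rate), the warmup-plus-cosine schedule is just one possible choice, and `fake_step` replaces the real step! so the example runs without TinyNN.

```ruby
# Hypothetical stand-in for the hyperparameter Mat: only `.flat` is modeled.
HpBuf = Struct.new(:flat)

# ASSUMPTION: which slot of m_hp.flat holds the learning rate is not stated
# in this document; index 0 is used purely for illustration.
LR_SLOT = 0

# Illustrative schedule: linear warmup, then cosine decay toward zero.
def lr_at(step, base_lr, warmup_steps, total_steps)
  if step < warmup_steps
    return base_lr * (step + 1) / warmup_steps.to_f
  end
  progress = (step - warmup_steps) / (total_steps - warmup_steps).to_f
  0.5 * base_lr * (1.0 + Math.cos(Math::PI * progress))
end

# Stand-in for recipe.step!: records what it was handed, returns a fake loss.
seen = []
fake_step = lambda do |m_hp, is_first|
  seen << [m_hp.flat[LR_SLOT], is_first]
  0.0
end

m_hp = HpBuf.new([0.0])
total_steps = 6
total_steps.times do |step|
  # The caller mutates m_hp before the call; step! itself takes no lr param.
  m_hp.flat[LR_SLOT] = lr_at(step, 1.0e-3, 2, total_steps)
  # The caller owns the step == 0 branch via is_first.
  fake_step.call(m_hp, step == 0)
end
```

Because the schedule lives entirely in the loop, swapping it for another one changes no recipe code.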
#upload_donor!(donor_buf_flat, n_floats) ⇒ Object
The raw upload MECHANISM realize_warm! rides (and the seam for already-read buffers — e.g. the legacy PCA-lens flow): one tnn_upload_from_float_array into the realize’d token_embed table (mirrors 09 L180). Same window rules as realize_warm!. The PCA lens W_proj upload (09 L188-229) stays caller-side through
    # File 'lib/toy/llm/recipes/warm_start.rb', line 187
    def upload_donor!(donor_buf_flat, n_floats)
      TinyNN.tnn_upload_from_float_array(@ws_cache.sess, @ws_cache.,
                                         donor_buf_flat, n_floats)
      nil
    end