Class: Toy::LLM::Recipes::FromScratch
- Inherits: Object
- Defined in: lib/toy/llm/recipes/from_scratch.rb
Overview
The from-scratch, random-init training recipe. It encapsulates the existing loop: realize! builds the random-init forward + CE + backward + AdamW graph on a Toy::LLM::Engine::LlamaSeqEngine, then step! drives one training step. The random-init realize self-enables full_finetune and train_embeddings, so no extra enable_* call is needed. The caller (fixture) owns the experiment config and the per-step input Mats.
Instance Attribute Summary
- #fs_cache ⇒ Object
  Returns the value of attribute fs_cache.
- #fs_step_index ⇒ Object
  Returns the value of attribute fs_step_index.
- #fs_t_hp ⇒ Object
  Returns the value of attribute fs_t_hp.
- #fs_t_labels ⇒ Object
  Returns the value of attribute fs_t_labels.
- #fs_t_loss ⇒ Object
  Returns the value of attribute fs_t_loss.
Instance Method Summary
- #initialize ⇒ FromScratch (constructor)
  A new instance of FromScratch.
- #realize!(cfg, opts) ⇒ Object
  Realize the random-init graph.
- #step!(seq_ids, positions, m_labels, m_hp, is_first) ⇒ Object
  ONE training step.
Constructor Details
#initialize ⇒ FromScratch
Returns a new instance of FromScratch.
# File 'lib/toy/llm/recipes/from_scratch.rb', line 51

def initialize
  @fs_cache = Toy::LLM::Engine::LlamaSeqEngine.new
  @fs_t_loss = nil
  @fs_t_labels = nil
  @fs_t_hp = nil
  @fs_step_index = 0
end
Instance Attribute Details
#fs_cache ⇒ Object
Returns the value of attribute fs_cache.
# File 'lib/toy/llm/recipes/from_scratch.rb', line 49

def fs_cache
  @fs_cache
end
#fs_step_index ⇒ Object
Returns the value of attribute fs_step_index.
# File 'lib/toy/llm/recipes/from_scratch.rb', line 49

def fs_step_index
  @fs_step_index
end
#fs_t_hp ⇒ Object
Returns the value of attribute fs_t_hp.
# File 'lib/toy/llm/recipes/from_scratch.rb', line 49

def fs_t_hp
  @fs_t_hp
end
#fs_t_labels ⇒ Object
Returns the value of attribute fs_t_labels.
# File 'lib/toy/llm/recipes/from_scratch.rb', line 49

def fs_t_labels
  @fs_t_labels
end
#fs_t_loss ⇒ Object
Returns the value of attribute fs_t_loss.
# File 'lib/toy/llm/recipes/from_scratch.rb', line 49

def fs_t_loss
  @fs_t_loss
end
Instance Method Details
#realize!(cfg, opts) ⇒ Object
Realize the random-init graph. Delegates VERBATIM to the cache: realize_for_random_init (which self-enables @ft_train_embeddings_enabled and @seq_full_finetune_enabled), then build_training_step (forward + CE + backward + opt_step_adamw baked into the ggml graph). Stashes the returned [t_loss, t_labels, t_hp] triple. opts is a Toy::LLM::RecipeOptions (toy#64 item 1) carrying the former 7 trailing positional args (t_seq, t_batch, weight_dtype, untied, qkv_bias, seed, init_scale). They are unpacked here in the engine's exact positional order, so the realize is byte-identical. Returns nil.
# File 'lib/toy/llm/recipes/from_scratch.rb', line 68

def realize!(cfg, opts)
  @fs_cache.realize_for_random_init(cfg, opts.t_seq, opts.t_batch,
                                    opts.weight_dtype, opts.untied,
                                    opts.qkv_bias, opts.seed, opts.init_scale)
  result = @fs_cache.build_training_step
  @fs_t_loss = result[0]
  @fs_t_labels = result[1]
  @fs_t_hp = result[2]
  nil
end
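The options-object unpacking described for realize! can be illustrated with a minimal, self-contained sketch. DemoRecipeOptions and DemoEngine are hypothetical stand-ins, not the real Toy::LLM::RecipeOptions or LlamaSeqEngine, and the field values are made up for illustration. Only the seven field names and their order come from the documentation above.

```ruby
# Hypothetical stand-in for Toy::LLM::RecipeOptions; the real class may differ.
DemoRecipeOptions = Struct.new(
  :t_seq, :t_batch, :weight_dtype, :untied, :qkv_bias, :seed, :init_scale,
  keyword_init: true
) do
  # The seven fields in the positional order realize! passes after cfg.
  def to_positional
    [t_seq, t_batch, weight_dtype, untied, qkv_bias, seed, init_scale]
  end
end

# Hypothetical engine stub: it only records the arguments it receives.
class DemoEngine
  attr_reader :received

  def realize_for_random_init(cfg, *rest)
    @received = [cfg, *rest]
    nil
  end
end

# Illustrative values only; real experiment configs are owned by the caller.
opts = DemoRecipeOptions.new(
  t_seq: 128, t_batch: 4, weight_dtype: :f32,
  untied: false, qkv_bias: false, seed: 42, init_scale: 0.02
)

engine = DemoEngine.new
engine.realize_for_random_init(:cfg, *opts.to_positional)
p engine.received
```

Named fields let the call site read clearly while the engine still sees the same positional argument list it did before the options object was introduced.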
#step!(seq_ids, positions, m_labels, m_hp, is_first) ⇒ Object
ONE training step. Op order is VERBATIM from smoke_projection_lens.rb:97-112: graph_reset on the first step, otherwise reset_grads_only; the four uploads in order (token_ids, positions, labels, hp); compute_backward; download_row_major(t_loss, 1, 1). is_first selects the reset; fs_step_index is NOT used for the reset decision, so the caller stays in full control of the step==0 branch (this matches the gate's step==0 branch exactly). Returns the loss Float. Per-step input Mats are built by the caller.
# File 'lib/toy/llm/recipes/from_scratch.rb', line 89

def step!(seq_ids, positions, m_labels, m_hp, is_first)
  s = @fs_cache.sess
  if is_first
    TinyNN.tnn_graph_reset(s)
  else
    TinyNN.tnn_graph_reset_grads_only(s)
  end
  TinyNN.upload_int_array(s, @fs_cache.t_seq_token_ids, seq_ids)
  TinyNN.upload_int_array(s, @fs_cache.t_seq_positions, positions)
  TinyNN.upload_row_major(s, @fs_t_labels, m_labels)
  TinyNN.upload_row_major(s, @fs_t_hp, m_hp)
  TinyNN.tnn_compute_backward(s)
  loss_mat = TinyNN.download_row_major(s, @fs_t_loss, 1, 1)
  loss_mat.flat[0]
end
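Because is_first is supplied by the caller, the step==0 decision lives in the driving loop rather than in the recipe. The sketch below shows one way a caller might do that. DemoRecipe and run_steps are hypothetical names; the stub records which reset each call would select and returns a made-up loss instead of touching TinyNN, and the nil labels/hp values are placeholders for the Mats a real caller would build.

```ruby
# Hypothetical recipe stub with the same step! signature as FromScratch.
# It records the reset that is_first would select and fakes a loss value.
class DemoRecipe
  attr_reader :resets

  def initialize
    @resets = []
    @calls = 0
  end

  def step!(seq_ids, positions, m_labels, m_hp, is_first)
    @resets << (is_first ? :graph_reset : :reset_grads_only)
    @calls += 1
    1.0 / @calls # made-up loss, not a real training signal
  end
end

# The caller owns the step counter and decides when is_first is true.
def run_steps(recipe, batches)
  batches.each_with_index.map do |batch, step|
    recipe.step!(batch[:seq_ids], batch[:positions],
                 batch[:labels], batch[:hp], step.zero?)
  end
end

# Placeholder inputs; a real caller would build Mats for labels and hp.
batches = Array.new(3) do
  { seq_ids: [1, 2, 3], positions: [0, 1, 2], labels: nil, hp: nil }
end

recipe = DemoRecipe.new
losses = run_steps(recipe, batches)
p recipe.resets
p losses
```

Only the first call takes the full graph reset; every later call takes the gradients-only reset, which mirrors the branch in step! above.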