Class: Toy::SmolLM2
- Inherits:
- Object
  - Object
  - Toy::SmolLM2
- Defined in:
- lib/toy/models/toy_smollm2.rb
Overview
SmolLM2 / generic llama-family decoder LM.
Supports both tied and untied output embeddings:
- SmolLM2 / Qwen2.5 / Gemma: tied (logits = x · token_embed.T)
- TinyLlama / Llama-2/3 / Mistral: untied (logits = x · lm_head.T)
Untied is opt-in via enable_untied_output! after construction. The output_proj weight is stored as [V, D] (matches token_embed layout) so the same matmul_t code path works for both.
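The shared code path can be sketched in plain Ruby. This is a minimal illustration, not the library's code: `matmul_t` here is a hypothetical stand-in for `Mat#matmul_t` operating on nested Arrays, and the `logits` helper and the tiny weight values are made up for the example.

```ruby
# Hypothetical stand-in for the library's Mat#matmul_t: computes x · wᵀ,
# where x is [T, D] and w is [V, D], both as plain nested Arrays.
def matmul_t(x, w)
  x.map do |row|
    w.map do |w_row|
      row.zip(w_row).sum { |a, b| a * b }
    end
  end
end

# Picks the projection weight as the overview describes:
# the separate lm_head when untied, the token embedding table when tied.
def logits(x, token_embed, output_proj, untied)
  matmul_t(x, untied ? output_proj : token_embed)
end

x           = [[1.0, 2.0]]                          # T = 1, D = 2
token_embed = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # V = 3, D = 2
output_proj = [[2.0, 0.0], [0.0, 2.0], [0.5, 0.5]]  # V = 3, D = 2

tied_logits   = logits(x, token_embed, output_proj, false)
untied_logits = logits(x, token_embed, output_proj, true)
```

Because both weights share the [V, D] layout, switching between tied and untied only changes which matrix is passed in, not the multiplication itself.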
Instance Attribute Summary
- #cfg ⇒ Object
  Returns the value of attribute cfg.
- #final_norm ⇒ Object
  Returns the value of attribute final_norm.
- #has_untied_output ⇒ Object
  Returns the value of attribute has_untied_output.
- #output_proj ⇒ Object
  Returns the value of attribute output_proj.
- #rope ⇒ Object
  Returns the value of attribute rope.
- #stack ⇒ Object
  Returns the value of attribute stack.
- #token_embed ⇒ Object
  Returns the value of attribute token_embed.
Instance Method Summary
- #algorithm ⇒ Object
  Phuong–Hutter style algorithm card.
- #algorithm_card ⇒ Object
- #algorithm_card_full ⇒ Object
  Recursive card: top-level forward + block + every sub-op (RMSNorm, GQAttention, RoPE, SwiGLU) inlined.
- #enable_untied_output! ⇒ Object
  Called by the GGUF loader when `output.weight` is present.
- #forward(ids, pos_start) ⇒ Object
  ids: Array<Int> (length T), pos_start: Int → logits [T, V].
- #initialize(cfg) ⇒ SmolLM2 (constructor)
  A new instance of SmolLM2.
- #param_count ⇒ Object
  Total trainable parameter count (tied embeddings counted once).
Constructor Details
#initialize(cfg) ⇒ SmolLM2
Returns a new instance of SmolLM2.
# File 'lib/toy/models/toy_smollm2.rb', line 267
def initialize(cfg)
  @cfg = cfg
  @token_embed = Toy::Embedding.new(cfg.vocab, cfg.d_model)
  @final_norm = Toy::RMSNorm.new(cfg.d_model)
  @final_norm.eps = cfg.rms_eps
  @rope = Toy::RoPE.new(cfg.d_model / cfg.n_heads, cfg.ctx, cfg.rope_base)
  @stack = [Toy::SmolLM2Block.new(cfg, @rope)]
  li = 1
  while li < cfg.n_layers
    @stack.push(Toy::SmolLM2Block.new(cfg, @rope))
    li += 1
  end
  # Always allocate the output projection at full [V, D] shape so
  # Spinel sees a stable Mat with known dimensions from the very
  # first reference. Costs vocab*d_model floats of memory even on
  # tied models (a few MB on SmolLM2, 256MB on TinyLlama), which is
  # small next to the actual weights and avoids reassign-after-construct
  # surprises in the AOT type model.
  @output_proj = Mat.new(cfg.vocab, cfg.d_model)
  @has_untied_output = false
end
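The memory cost mentioned in the constructor comment can be checked with a little arithmetic. The sketch below is only an estimate under stated assumptions: `output_proj_bytes` is a hypothetical helper, 4 bytes per element assumes float32 storage (the source does not say which float width `Mat` uses), and the TinyLlama dimensions (vocab 32,000, d_model 2,048) are taken from that model's published configuration rather than from this file.

```ruby
# Bytes needed for a dense [vocab, d_model] matrix.
# bytes_per_float = 4 is an assumption (float32); the source does not state it.
def output_proj_bytes(vocab, d_model, bytes_per_float = 4)
  vocab * d_model * bytes_per_float
end

# Assumed TinyLlama-1.1B dimensions: vocab = 32_000, d_model = 2_048.
bytes = output_proj_bytes(32_000, 2_048)
mib   = bytes / (1024.0 * 1024.0)

puts "#{bytes} bytes (#{mib} MiB)"
```

Under these assumptions the always-allocated matrix comes to 262,144,000 bytes, or 250 MiB, which is in the same range as the figure quoted in the comment.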
Instance Attribute Details
#cfg ⇒ Object
Returns the value of attribute cfg.
# File 'lib/toy/models/toy_smollm2.rb', line 264
def cfg
  @cfg
end
#final_norm ⇒ Object
Returns the value of attribute final_norm.
# File 'lib/toy/models/toy_smollm2.rb', line 264
def final_norm
  @final_norm
end
#has_untied_output ⇒ Object
Returns the value of attribute has_untied_output.
# File 'lib/toy/models/toy_smollm2.rb', line 264
def has_untied_output
  @has_untied_output
end
#output_proj ⇒ Object
Returns the value of attribute output_proj.
# File 'lib/toy/models/toy_smollm2.rb', line 264
def output_proj
  @output_proj
end
#rope ⇒ Object
Returns the value of attribute rope.
# File 'lib/toy/models/toy_smollm2.rb', line 264
def rope
  @rope
end
#stack ⇒ Object
Returns the value of attribute stack.
# File 'lib/toy/models/toy_smollm2.rb', line 264
def stack
  @stack
end
#token_embed ⇒ Object
Returns the value of attribute token_embed.
# File 'lib/toy/models/toy_smollm2.rb', line 264
def token_embed
  @token_embed
end
Instance Method Details
#algorithm ⇒ Object
Phuong–Hutter style algorithm card. Reads like the paper: tensor shapes annotated on the right, ← for assignment, ▷ for commentary. See arXiv:2207.09238 §4 for the canonical form.
`algorithm` returns the structured form (Toy::Card); `algorithm_card` renders it to the human-readable Phuong–Hutter text. The structured form is what prep/card_to_code.rb consumes for round-trip parsing.
# File 'lib/toy/models/toy_smollm2.rb', line 337
def algorithm
  c = Toy::Card.new("Toy::SmolLM2.forward(x, p_start)", "Llama-family decoder")
  c.add_input("x", "{1..V}^T", "token IDs")
  c.add_input("p_start", "ℕ", "absolute position of x[0]; for RoPE")
  c.add_output("P", "R^{T×V}", "logits")
  c.add_hyper("V", @cfg.vocab.to_s)
  c.add_hyper("D", @cfg.d_model.to_s)
  c.add_hyper("H", @cfg.n_heads.to_s)
  c.add_hyper("H_kv", @cfg.n_kv.to_s)
  c.add_hyper("D_f", @cfg.d_ff.to_s)
  c.add_hyper("N", @cfg.n_layers.to_s)
  c.add_hyper("ctx", @cfg.ctx.to_s)
  c.add_hyper("θ_base", @cfg.rope_base.to_s)
  c.add_param("W_e", "R^{V×D}", "token embeddings")
  if @has_untied_output
    c.add_param("W_out", "R^{V×D}", "separate lm_head")
  end
  c.add_param("θ_block_ℓ", "(ℓ=1..N)", "per-block; see SmolLM2Block")
  c.add_param("γ_f", "R^D", "final RMSNorm")
  c.add_param_extra("(total " + Toy.fmt_count(param_count) + ")")
  c.step_bind("e", "W_e[x]", "e ∈ R^{T×D}")
  c.step_loop("ℓ ← 1, …, N", "")
  c.step_update("e", "e + GQAttn(RMSNorm(e; γ_ℓ^1, ε), p_start; θ_ℓ^attn)", "e ∈ R^{T×D}", "")
  c.step_update("e", "e + SwiGLU(RMSNorm(e; γ_ℓ^2, ε); θ_ℓ^ffn)", "e ∈ R^{T×D}", "")
  c.step_loop_close
  c.step_update("e", "RMSNorm(e; γ_f, ε)", "e ∈ R^{T×D}", "")
  if @has_untied_output
    c.step_bind("P", "e · W_out^⊤", "P ∈ R^{T×V} (untied)")
  else
    c.step_bind("P", "e · W_e^⊤", "P ∈ R^{T×V} (tied)")
  end
  c.step_return("P")
  c
end
#algorithm_card ⇒ Object
# File 'lib/toy/models/toy_smollm2.rb', line 375
def algorithm_card; algorithm.render_pseudocode; end
#algorithm_card_full ⇒ Object
Recursive card: top-level forward + block + every sub-op (RMSNorm, GQAttention, RoPE, SwiGLU) inlined. Useful for the “full pseudocode” view; the top-level alone is the “section-1 overview” view.
# File 'lib/toy/models/toy_smollm2.rb', line 381
def algorithm_card_full
  blk = @stack[0]
  s = algorithm_card + "\n\n"
  s = s + "─── sub-algorithms ─────────────────────────────────────────────────────\n\n"
  s = s + blk.algorithm_card + "\n\n"
  s = s + blk.rn1.algorithm_card + "\n\n"
  s = s + blk.attn.algorithm_card + "\n\n"
  s = s + @rope.algorithm_card + "\n\n"
  s = s + blk.ffn.algorithm_card
  s
end
#enable_untied_output! ⇒ Object
Called by the GGUF loader when `output.weight` is present. The Mat is already allocated; this just flips the flag so the forward uses it.
# File 'lib/toy/models/toy_smollm2.rb', line 295
def enable_untied_output!
  @has_untied_output = true
end
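How a loader might drive this method can be sketched as follows. Everything here is hypothetical: `StubModel` reproduces only the flag logic, `apply_output_weight` is an invented helper, and `tensors` is assumed to be a Hash keyed by tensor name. The actual GGUF loader is not shown in this page and presumably also copies the tensor data into `output_proj`.

```ruby
# Hypothetical stand-in for the model: only the flag logic is reproduced.
class StubModel
  attr_reader :has_untied_output

  def initialize
    @has_untied_output = false
  end

  def enable_untied_output!
    @has_untied_output = true
  end
end

# Hypothetical loader step: `tensors` maps tensor names to their data.
# Copying the data into output_proj is omitted; only the opt-in is shown.
def apply_output_weight(model, tensors)
  model.enable_untied_output! if tensors.key?("output.weight")
  model
end

# Tensor names other than "output.weight" are illustrative only.
untied = apply_output_weight(StubModel.new, { "output.weight" => [[0.0]] })
tied   = apply_output_weight(StubModel.new, { "token_embd.weight" => [[0.0]] })
```

A model stays tied unless the loader finds the separate output tensor, which matches the opt-in behaviour described above.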
#forward(ids, pos_start) ⇒ Object
ids: Array<Int> (length T), pos_start: Int → logits [T, V]
# File 'lib/toy/models/toy_smollm2.rb', line 300
def forward(ids, pos_start)
  x = @token_embed.lookup(ids)                 # [T, D]
  li = 0
  while li < @cfg.n_layers
    x = @stack[li].forward(x, pos_start)       # [T, D]
    li += 1
  end
  x_final = @final_norm.forward(x)             # [T, D]
  if @has_untied_output
    x_final.matmul_t(@output_proj)             # [T, V] (untied)
  else
    x_final.matmul_t(@token_embed.weight)      # [T, V] (tied)
  end
end
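Since the result has one row of V logits per input position, a caller that wants the next token typically looks only at the last row. The sketch below shows greedy selection on a plain nested Array; `greedy_next_token` is a hypothetical helper, and how rows are actually read out of a `Mat` is not shown in this page, so the Array representation is an assumption.

```ruby
# Greedy pick of the next token ID from a [T, V] logits array:
# take the final row (the last position) and return the index of its maximum.
def greedy_next_token(logits)
  last_row = logits.last
  last_row.each_with_index.max_by { |value, _idx| value }.last
end

# Made-up logits for T = 2 positions and V = 3 vocabulary entries.
logits = [
  [0.1, 0.9, 0.0],  # position 0
  [2.0, -1.0, 0.5]  # position 1 (last)
]
next_id = greedy_next_token(logits)
```

Greedy selection is only one option; any sampling strategy would read the same final row.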
#param_count ⇒ Object
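Total trainable parameter count (tied embeddings counted once). As written, the sum covers token_embed, final_norm, and every block in the stack; output_proj is not added, even when untied output is enabled.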
# File 'lib/toy/models/toy_smollm2.rb', line 316
def param_count
  total = @token_embed.param_count + @final_norm.param_count
  li = 0
  while li < @cfg.n_layers
    total = total + @stack[li].param_count
    li += 1
  end
  total
end
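The same accumulation can be tried in isolation with stand-in components. This is a sketch under assumptions: `Component` and `total_params` are hypothetical names, and the sizes are made up; the only behaviour taken from the method above is that the embedding table, the final norm, and each block are summed, with no separate term for the output projection.

```ruby
# Hypothetical stand-ins: each component only needs to answer #param_count.
Component = Struct.new(:param_count)

def total_params(token_embed, final_norm, blocks)
  # The output projection is deliberately left out, mirroring the method above.
  blocks.reduce(token_embed.param_count + final_norm.param_count) do |sum, blk|
    sum + blk.param_count
  end
end

vocab = 100
d_model = 8
token_embed = Component.new(vocab * d_model)          # 800 embedding weights
final_norm  = Component.new(d_model)                  # 8 RMSNorm gains
blocks      = Array.new(2) { Component.new(1_000) }   # made-up per-block size

total = total_params(token_embed, final_norm, blocks)
```

With a tied model the embedding table doubles as the output projection, so counting it once gives the true total for that case.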