Class: Toy::SmolLM2

Inherits:
Object
Defined in:
lib/toy/models/toy_smollm2.rb

Overview

SmolLM2 / generic llama-family decoder LM.

Supports both tied and untied output embeddings:

- SmolLM2 / Qwen2.5 / Gemma: tied (logits = x · token_embed.T)
- TinyLlama / Llama-2/3 / Mistral: untied (logits = x · lm_head.T)

Untied output is opt-in: call enable_untied_output! after construction. The output_proj weight is stored as [V, D], matching the token_embed layout, so the same matmul_t code path serves both cases.
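The shared [V, D] layout can be illustrated with a small sketch. Plain nested Arrays stand in for the library's Mat type, and the `matmul_t` and `logits` helpers below are written for this example only; they are assumptions, not the library's implementation.

```ruby
# Sketch only: nested Arrays stand in for the library's Mat type.
# matmul_t(x, w) computes x · wᵀ for x of shape [T, D] and w of shape [V, D].
def matmul_t(x, w)
  x.map do |row|
    w.map do |w_row|
      row.zip(w_row).sum { |a, b| a * b }
    end
  end
end

# Pick the projection weight; both candidates share the [V, D] layout,
# so a single matmul_t call covers the tied and untied cases.
def logits(x, token_embed, output_proj, untied)
  weight = untied ? output_proj : token_embed
  matmul_t(x, weight)
end

x           = [[1.0, 2.0]]                          # [T=1, D=2]
token_embed = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # [V=3, D=2]
output_proj = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]  # [V=3, D=2]

p logits(x, token_embed, output_proj, false)  # tied:   [[1.0, 2.0, 3.0]]
p logits(x, token_embed, output_proj, true)   # untied: [[2.0, 4.0, 0.0]]
```

Because only the weight operand changes, switching between tied and untied output needs no second code path.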


Constructor Details

#initialize(cfg) ⇒ SmolLM2

Returns a new instance of SmolLM2.



# File 'lib/toy/models/toy_smollm2.rb', line 267

def initialize(cfg)
  @cfg         = cfg
  @token_embed = Toy::Embedding.new(cfg.vocab, cfg.d_model)
  @final_norm  = Toy::RMSNorm.new(cfg.d_model)
  @final_norm.eps = cfg.rms_eps
  @rope        = Toy::RoPE.new(cfg.d_model / cfg.n_heads,
                               cfg.ctx, cfg.rope_base)

  @stack = [Toy::SmolLM2Block.new(cfg, @rope)]
  li = 1
  while li < cfg.n_layers
    @stack.push(Toy::SmolLM2Block.new(cfg, @rope))
    li += 1
  end

  # Always allocate the output projection at full [V, D] shape so
  # Spinel sees a stable Mat with known dimensions from the very
  # first reference. Costs vocab*d_model floats of memory even on
  # tied models (a few MB on SmolLM2, 256MB on TinyLlama) — small
  # next to the actual weights and avoids reassign-after-construct
  # surprises in the AOT type model.
  @output_proj       = Mat.new(cfg.vocab, cfg.d_model)
  @has_untied_output = false
end
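The memory cost mentioned in the constructor comment is simple arithmetic. The sketch below assumes 32-bit floats (the source does not state the element size) and uses TinyLlama-1.1B's published dimensions; `output_proj_bytes` is a hypothetical helper, not part of the library.

```ruby
# Bytes needed for a [vocab, d_model] matrix of floats.
# bytes_per_float = 4 assumes 32-bit floats; this is an assumption.
def output_proj_bytes(vocab, d_model, bytes_per_float = 4)
  vocab * d_model * bytes_per_float
end

# TinyLlama-1.1B: vocab 32000, d_model 2048.
bytes = output_proj_bytes(32_000, 2048)
puts bytes                        # 262144000
puts bytes / (1024.0 * 1024.0)    # 250.0 (MiB)
```

That is about a quarter of a gigabyte, in line with the figure quoted in the comment.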

Instance Attribute Details

#cfg ⇒ Object

Returns the value of attribute cfg.



# File 'lib/toy/models/toy_smollm2.rb', line 264

def cfg
  @cfg
end

#final_norm ⇒ Object

Returns the value of attribute final_norm.



# File 'lib/toy/models/toy_smollm2.rb', line 264

def final_norm
  @final_norm
end

#has_untied_output ⇒ Object

Returns the value of attribute has_untied_output.



# File 'lib/toy/models/toy_smollm2.rb', line 264

def has_untied_output
  @has_untied_output
end

#output_proj ⇒ Object

Returns the value of attribute output_proj.



# File 'lib/toy/models/toy_smollm2.rb', line 264

def output_proj
  @output_proj
end

#rope ⇒ Object

Returns the value of attribute rope.



# File 'lib/toy/models/toy_smollm2.rb', line 264

def rope
  @rope
end

#stack ⇒ Object

Returns the value of attribute stack.



# File 'lib/toy/models/toy_smollm2.rb', line 264

def stack
  @stack
end

#token_embed ⇒ Object

Returns the value of attribute token_embed.



# File 'lib/toy/models/toy_smollm2.rb', line 264

def token_embed
  @token_embed
end

Instance Method Details

#algorithm ⇒ Object

Phuong–Hutter-style algorithm card. It reads like the paper: tensor shapes are annotated on the right, ← marks assignment, and ▷ marks commentary. See arXiv:2207.09238 §4 for the canonical form.

`algorithm` returns the structured form (Toy::Card); `algorithm_card` renders it to the human-readable Phuong–Hutter text. The structured form is what prep/card_to_code.rb consumes for round-trip parsing.



# File 'lib/toy/models/toy_smollm2.rb', line 337

def algorithm
  c = Toy::Card.new("Toy::SmolLM2.forward(x, p_start)",
                    "Llama-family decoder")
  c.add_input("x",       "{1..V}^T", "token IDs")
  c.add_input("p_start", "",        "absolute position of x[0]; for RoPE")
  c.add_output("P",      "R^{T×V}",  "logits")
  c.add_hyper("V",      @cfg.vocab.to_s)
  c.add_hyper("D",      @cfg.d_model.to_s)
  c.add_hyper("H",      @cfg.n_heads.to_s)
  c.add_hyper("H_kv",   @cfg.n_kv.to_s)
  c.add_hyper("D_f",    @cfg.d_ff.to_s)
  c.add_hyper("N",      @cfg.n_layers.to_s)
  c.add_hyper("ctx",    @cfg.ctx.to_s)
  c.add_hyper("θ_base", @cfg.rope_base.to_s)
  c.add_param("W_e", "R^{V×D}", "token embeddings")
  if @has_untied_output
    c.add_param("W_out", "R^{V×D}", "separate lm_head")
  end
  c.add_param("θ_block_ℓ", "(ℓ=1..N)", "per-block; see SmolLM2Block")
  c.add_param("γ_f",       "R^D",      "final RMSNorm")
  c.add_param_extra("(total " + Toy.fmt_count(param_count) + ")")
  c.step_bind("e", "W_e[x]", "e ∈ R^{T×D}")
  c.step_loop("ℓ ← 1, …, N", "")
  c.step_update("e", "e + GQAttn(RMSNorm(e; γ_ℓ^1, ε), p_start; θ_ℓ^attn)",
                "e ∈ R^{T×D}", "")
  c.step_update("e", "e + SwiGLU(RMSNorm(e; γ_ℓ^2, ε); θ_ℓ^ffn)",
                "e ∈ R^{T×D}", "")
  c.step_loop_close
  c.step_update("e", "RMSNorm(e; γ_f, ε)", "e ∈ R^{T×D}", "")
  if @has_untied_output
    c.step_bind("P", "e · W_out^⊤", "P ∈ R^{T×V}  (untied)")
  else
    c.step_bind("P", "e · W_e^⊤",   "P ∈ R^{T×V}  (tied)")
  end
  c.step_return("P")
  c
end
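To make the split between the structured form and the rendered text concrete, here is a minimal sketch. `MiniCard` is a hypothetical stand-in that mimics only the `step_bind`, `step_return`, and `render_pseudocode` calls seen above; the real Toy::Card internals and its exact output layout are not shown on this page and may differ.

```ruby
# Hypothetical stand-in for Toy::Card; not the library's class.
class MiniCard
  def initialize(title)
    @title = title
    @steps = []   # structured form: [statement, annotation] pairs
  end

  # Record "name ← expr" plus a shape annotation for the right-hand column.
  def step_bind(name, expr, note)
    @steps << ["#{name} ← #{expr}", note]
  end

  def step_return(name)
    @steps << ["return #{name}", ""]
  end

  # Render numbered steps, padding statements so annotations line up.
  def render_pseudocode
    width = @steps.map { |text, _| text.length }.max
    lines = @steps.each_with_index.map do |(text, note), i|
      line = "#{i + 1}: #{text.ljust(width)}"
      note.empty? ? line.rstrip : "#{line}  ▷ #{note}"
    end
    ([@title] + lines).join("\n")
  end
end

def demo_card
  c = MiniCard.new("Algorithm: tiny forward")
  c.step_bind("e", "W_e[x]", "e ∈ R^{T×D}")
  c.step_bind("P", "e · W_e^⊤", "P ∈ R^{T×V}")
  c.step_return("P")
  c
end

puts demo_card.render_pseudocode
```

Keeping the steps as data until render time is what would let a separate tool consume the same card without parsing formatted text.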

#algorithm_card ⇒ Object



# File 'lib/toy/models/toy_smollm2.rb', line 375

def algorithm_card; algorithm.render_pseudocode; end

#algorithm_card_full ⇒ Object

Recursive card: top-level forward + block + every sub-op (RMSNorm, GQAttention, RoPE, SwiGLU) inlined. Useful for the “full pseudocode” view; the top-level alone is the “section-1 overview” view.



# File 'lib/toy/models/toy_smollm2.rb', line 381

def algorithm_card_full
  blk = @stack[0]
  s = algorithm_card + "\n\n"
  s = s + "─── sub-algorithms ─────────────────────────────────────────────────────\n\n"
  s = s + blk.algorithm_card    + "\n\n"
  s = s + blk.rn1.algorithm_card  + "\n\n"
  s = s + blk.attn.algorithm_card + "\n\n"
  s = s + @rope.algorithm_card    + "\n\n"
  s = s + blk.ffn.algorithm_card
  s
end

#enable_untied_output! ⇒ Object

Called by the GGUF loader when `output.weight` is present. The Mat is already allocated, so this only flips the flag that makes forward use it.



# File 'lib/toy/models/toy_smollm2.rb', line 295

def enable_untied_output!
  @has_untied_output = true
end
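A loader-side call might look like the sketch below. `FakeModel` and `apply_output_weight` are hypothetical stand-ins, and `tensors` is assumed to be a Hash keyed by tensor name; the actual GGUF loader is not shown on this page.

```ruby
# Hypothetical stand-in that mirrors only the flag behaviour shown above.
class FakeModel
  attr_reader :has_untied_output

  def initialize
    @has_untied_output = false
  end

  def enable_untied_output!
    @has_untied_output = true
  end
end

# Flip the flag only when the checkpoint ships a separate output matrix.
# A real loader would also copy the tensor data into output_proj.
def apply_output_weight(model, tensors)
  model.enable_untied_output! if tensors.key?("output.weight")
  model
end

tied   = apply_output_weight(FakeModel.new, { "token_embd.weight" => [] })
untied = apply_output_weight(FakeModel.new,
                             { "token_embd.weight" => [], "output.weight" => [] })
puts tied.has_untied_output    # false
puts untied.has_untied_output  # true
```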

#forward(ids, pos_start) ⇒ Object

ids: Array<Int> (length T), pos_start: Int → logits [T, V]



# File 'lib/toy/models/toy_smollm2.rb', line 300

def forward(ids, pos_start)
  x = @token_embed.lookup(ids)                           # [T, D]
  li = 0
  while li < @cfg.n_layers
    x = @stack[li].forward(x, pos_start)                 # [T, D]
    li += 1
  end
  x_final = @final_norm.forward(x)                       # [T, D]
  if @has_untied_output
    x_final.matmul_t(@output_proj)                       # [T, V]  (untied)
  else
    x_final.matmul_t(@token_embed.weight)                # [T, V]  (tied)
  end
end
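As a usage illustration, the last row of the [T, V] logits is the one that predicts the next token. The sketch below assumes the logits are available as a plain Array of rows; the real forward returns the library's own matrix type, and `greedy_next_token` is a hypothetical helper.

```ruby
# Sketch: logits is assumed to be an Array of T rows with V floats each.
# Returns the index of the largest value in the last row (greedy choice).
def greedy_next_token(logits)
  last_row = logits.last
  last_row.each_with_index.max_by { |value, _idx| value }.last
end

logits = [
  [0.1, 0.7, 0.2],   # position 0
  [1.5, -0.3, 0.9]   # position 1 (last): scores for the next token
]
puts greedy_next_token(logits)  # 0
```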

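#param_count ⇒ Object

Total trainable parameter count: token embeddings, final RMSNorm, and every block in the stack (tied embeddings counted once). As written, output_proj is not added when untied output is enabled.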



# File 'lib/toy/models/toy_smollm2.rb', line 316

def param_count
  total = @token_embed.param_count + @final_norm.param_count
  li = 0
  while li < @cfg.n_layers
    total = total + @stack[li].param_count
    li += 1
  end
  total
end
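The total can be cross-checked in closed form. The sketch below assumes a standard llama block layout with no biases (Q/K/V/O projections, SwiGLU gate/up/down, and two RMSNorm gains per block); SmolLM2Block's actual layout is not shown on this page. `llama_param_count` is a hypothetical helper, and its `untied` option is an addition of this sketch rather than something `param_count` does.

```ruby
# Closed-form parameter count for a llama-style decoder (assumed layout).
def llama_param_count(vocab:, d_model:, n_heads:, n_kv:, d_ff:, n_layers:, untied: false)
  head_dim = d_model / n_heads
  kv_dim   = n_kv * head_dim
  attn  = 2 * d_model * d_model + 2 * d_model * kv_dim  # Q and O, then K and V
  ffn   = 3 * d_model * d_ff                            # gate, up, down
  norms = 2 * d_model                                   # pre-attention, pre-FFN
  total = vocab * d_model + n_layers * (attn + ffn + norms) + d_model
  total += vocab * d_model if untied                    # separate lm_head
  total
end

# Hyperparameters as published for SmolLM2-135M (assumed here).
puts llama_param_count(vocab: 49_152, d_model: 576, n_heads: 9, n_kv: 3,
                       d_ff: 1536, n_layers: 30)
# 134515008
```

With tied embeddings the V×D matrix appears once in the sum, which is why the result lands near the model's nominal 135M size.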