Class: Toy::SmolLM2

Inherits: Object

Defined in: lib/toy/models/toy_smollm2.rb

Overview

SmolLM2 / generic llama-family decoder LM.

Supports both tied and untied output embeddings:

- SmolLM2 / Qwen2.5 / Gemma: tied (logits = x · token_embed.T)
- TinyLlama / Llama-2/3 / Mistral: untied (logits = x · lm_head.T)

Untied is opt-in via enable_untied_output! after construction. The output_proj weight is stored as [V, D] (matches token_embed layout) so the same matmul_t code path works for both.
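As a minimal sketch of why a shared [V, D] layout lets one code path serve both cases, the example below uses nested Arrays in place of the real matrix type. The `matmul_t` and `logits` helpers here are hypothetical stand-ins written for illustration, not the library's implementation.

```ruby
# Hypothetical stand-in for Mat#matmul_t: computes x · w^T on nested Arrays.
# x is [T, D], w is [V, D]; the result is [T, V].
def matmul_t(x, w)
  x.map do |row|
    w.map do |w_row|
      row.zip(w_row).sum { |a, b| a * b }
    end
  end
end

# Picks the output weight the way the overview describes: the separate
# projection when untied, otherwise the token embedding table.
def logits(x, token_embed, output_proj, untied)
  matmul_t(x, untied ? output_proj : token_embed)
end

x           = [[1.0, 2.0]]                           # T = 1, D = 2
token_embed = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # V = 3, D = 2
output_proj = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]   # V = 3, D = 2

tied_logits   = logits(x, token_embed, output_proj, false)
untied_logits = logits(x, token_embed, output_proj, true)
```

Because both weights have V rows of length D, switching between tied and untied output only changes which matrix is passed in.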

Instance Attribute Summary

#cfg ⇒ Object

Returns the value of attribute cfg.
#final_norm ⇒ Object

Returns the value of attribute final_norm.
#has_untied_output ⇒ Object

Returns the value of attribute has_untied_output.
#output_proj ⇒ Object

Returns the value of attribute output_proj.
#rope ⇒ Object

Returns the value of attribute rope.
#stack ⇒ Object

Returns the value of attribute stack.
#token_embed ⇒ Object

Returns the value of attribute token_embed.

Instance Method Summary

#algorithm ⇒ Object

Phuong–Hutter style algorithm card.
#algorithm_card ⇒ Object
#algorithm_card_full ⇒ Object

Recursive card: top-level forward + block + every sub-op (RMSNorm, GQAttention, RoPE, SwiGLU) inlined.
#enable_untied_output! ⇒ Object

Called by the GGUF loader when `output.weight` is present.
#forward(ids, pos_start) ⇒ Object

ids: Array<Int> (length T), pos_start: Int → logits [T, V].
#initialize(cfg) ⇒ SmolLM2 constructor

A new instance of SmolLM2.
#param_count ⇒ Object

Total trainable parameter count (tied embeddings counted once).

Constructor Details

#initialize(cfg) ⇒ `SmolLM2`

Returns a new instance of SmolLM2.

# File 'lib/toy/models/toy_smollm2.rb', line 267

def initialize(cfg)
  @cfg         = cfg
  @token_embed = Toy::Embedding.new(cfg.vocab, cfg.d_model)
  @final_norm  = Toy::RMSNorm.new(cfg.d_model)
  @final_norm.eps = cfg.rms_eps
  @rope        = Toy::RoPE.new(cfg.d_model / cfg.n_heads,
                               cfg.ctx, cfg.rope_base)

  @stack = [Toy::SmolLM2Block.new(cfg, @rope)]
  li = 1
  while li < cfg.n_layers
    @stack.push(Toy::SmolLM2Block.new(cfg, @rope))
    li += 1
  end

  # Always allocate the output projection at full [V, D] shape so
  # Spinel sees a stable Mat with known dimensions from the very
  # first reference. Costs vocab*d_model floats of memory even on
  # tied models (a few MB on SmolLM2, 256MB on TinyLlama) — small
  # next to the actual weights and avoids reassign-after-construct
  # surprises in the AOT type model.
  @output_proj       = Mat.new(cfg.vocab, cfg.d_model)
  @has_untied_output = false
end
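The memory cost mentioned in the constructor comment can be estimated with a rough sketch like the one below. The helper name, the 4-byte float size, and the TinyLlama-like shape (vocab 32,000, d_model 2,048) are assumptions for illustration and are not read from the source.

```ruby
# Bytes needed for a dense [vocab, d_model] matrix.
# bytes_per_float = 4 is an assumption (32-bit floats).
def output_proj_bytes(vocab, d_model, bytes_per_float = 4)
  vocab * d_model * bytes_per_float
end

# Assumed TinyLlama-like shape: vocab = 32_000, d_model = 2_048.
tinyllama_bytes = output_proj_bytes(32_000, 2_048)
tinyllama_mb    = tinyllama_bytes / 1_000_000.0
```

Under these assumptions the unconditional allocation comes to roughly a quarter of a gigabyte, the same range as the figure quoted in the comment.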

Instance Attribute Details

#cfg ⇒ `Object`

Returns the value of attribute cfg.




# File 'lib/toy/models/toy_smollm2.rb', line 264

def cfg
  @cfg
end

#final_norm ⇒ `Object`

Returns the value of attribute final_norm.




# File 'lib/toy/models/toy_smollm2.rb', line 264

def final_norm
  @final_norm
end

#has_untied_output ⇒ `Object`

Returns the value of attribute has_untied_output.




# File 'lib/toy/models/toy_smollm2.rb', line 264

def has_untied_output
  @has_untied_output
end

#output_proj ⇒ `Object`

Returns the value of attribute output_proj.




# File 'lib/toy/models/toy_smollm2.rb', line 264

def output_proj
  @output_proj
end

#rope ⇒ `Object`

Returns the value of attribute rope.




# File 'lib/toy/models/toy_smollm2.rb', line 264

def rope
  @rope
end

#stack ⇒ `Object`

Returns the value of attribute stack.




# File 'lib/toy/models/toy_smollm2.rb', line 264

def stack
  @stack
end

#token_embed ⇒ `Object`

Returns the value of attribute token_embed.




# File 'lib/toy/models/toy_smollm2.rb', line 264

def token_embed
  @token_embed
end

Instance Method Details

#algorithm ⇒ `Object`

Phuong–Hutter style algorithm card. Reads like the paper: tensor shapes annotated on the right, ← for assignment, ▷ for commentary. See arXiv:2207.09238 §4 for the canonical form.

`algorithm` returns the structured form (Toy::Card); `algorithm_card` renders it to the human-readable Phuong–Hutter text. The structured form is what prep/card_to_code.rb consumes for round-trip parsing.

# File 'lib/toy/models/toy_smollm2.rb', line 337

def algorithm
  c = Toy::Card.new("Toy::SmolLM2.forward(x, p_start)",
                    "Llama-family decoder")
  c.add_input("x",       "{1..V}^T", "token IDs")
  c.add_input("p_start", "ℕ",        "absolute position of x[0]; for RoPE")
  c.add_output("P",      "R^{T×V}",  "logits")
  c.add_hyper("V",      @cfg.vocab.to_s)
  c.add_hyper("D",      @cfg.d_model.to_s)
  c.add_hyper("H",      @cfg.n_heads.to_s)
  c.add_hyper("H_kv",   @cfg.n_kv.to_s)
  c.add_hyper("D_f",    @cfg.d_ff.to_s)
  c.add_hyper("N",      @cfg.n_layers.to_s)
  c.add_hyper("ctx",    @cfg.ctx.to_s)
  c.add_hyper("θ_base", @cfg.rope_base.to_s)
  c.add_param("W_e", "R^{V×D}", "token embeddings")
  if @has_untied_output
    c.add_param("W_out", "R^{V×D}", "separate lm_head")
  end
  c.add_param("θ_block_ℓ", "(ℓ=1..N)", "per-block; see SmolLM2Block")
  c.add_param("γ_f",       "R^D",      "final RMSNorm")
  c.add_param_extra("(total " + Toy.fmt_count(param_count) + ")")
  c.step_bind("e", "W_e[x]", "e ∈ R^{T×D}")
  c.step_loop("ℓ ← 1, …, N", "")
  c.step_update("e", "e + GQAttn(RMSNorm(e; γ_ℓ^1, ε), p_start; θ_ℓ^attn)",
                "e ∈ R^{T×D}", "")
  c.step_update("e", "e + SwiGLU(RMSNorm(e; γ_ℓ^2, ε); θ_ℓ^ffn)",
                "e ∈ R^{T×D}", "")
  c.step_loop_close
  c.step_update("e", "RMSNorm(e; γ_f, ε)", "e ∈ R^{T×D}", "")
  if @has_untied_output
    c.step_bind("P", "e · W_out^⊤", "P ∈ R^{T×V}  (untied)")
  else
    c.step_bind("P", "e · W_e^⊤",   "P ∈ R^{T×V}  (tied)")
  end
  c.step_return("P")
  c
end
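The split between a structured card and its rendered text can be illustrated with a small sketch. `MiniCard` below is a hypothetical stand-in, not Toy::Card: it supports only two step kinds, and its rendered layout is invented for this example.

```ruby
# MiniCard is a hypothetical stand-in for Toy::Card. It stores steps as
# structured data and turns them into text only when render is called.
class MiniCard
  attr_reader :steps

  def initialize(title)
    @title = title
    @steps = []
  end

  # Record "name ← expr" with a shape annotation for the right margin.
  def step_bind(name, expr, shape)
    @steps << { kind: :bind, name: name, expr: expr, shape: shape }
  end

  # Record which variable the algorithm returns.
  def step_return(name)
    @steps << { kind: :return, name: name }
  end

  # Render with ← for assignment and ▷ for commentary.
  # The exact line layout is an assumption made for illustration.
  def render
    lines = ["Algorithm: #{@title}"]
    @steps.each_with_index do |s, i|
      line = if s[:kind] == :bind
               "#{i + 1}: #{s[:name]} ← #{s[:expr]}  ▷ #{s[:shape]}"
             else
               "#{i + 1}: return #{s[:name]}"
             end
      lines << line
    end
    lines.join("\n")
  end
end

card = MiniCard.new("forward(x, p_start)")
card.step_bind("e", "W_e[x]", "e ∈ R^{T×D}")
card.step_bind("P", "e · W_e^⊤", "P ∈ R^{T×V}")
card.step_return("P")
text = card.render
```

Keeping the steps as data means a tool can read `card.steps` directly, while people read the rendered text.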

#algorithm_card ⇒ `Object`

# File 'lib/toy/models/toy_smollm2.rb', line 375

def algorithm_card; algorithm.render_pseudocode; end

#algorithm_card_full ⇒ `Object`

Recursive card: top-level forward + block + every sub-op (RMSNorm, GQAttention, RoPE, SwiGLU) inlined. Useful for the “full pseudocode” view; the top-level alone is the “section-1 overview” view.

# File 'lib/toy/models/toy_smollm2.rb', line 381

def algorithm_card_full
  blk = @stack[0]
  s = algorithm_card + "\n\n"
  s = s + "─── sub-algorithms ─────────────────────────────────────────────────────\n\n"
  s = s + blk.algorithm_card    + "\n\n"
  s = s + blk.rn1.algorithm_card  + "\n\n"
  s = s + blk.attn.algorithm_card + "\n\n"
  s = s + @rope.algorithm_card    + "\n\n"
  s = s + blk.ffn.algorithm_card
  s
end

#enable_untied_output! ⇒ `Object`

Called by the GGUF loader when `output.weight` is present. The Mat is already allocated by the constructor; this only flips the flag so that #forward multiplies by output_proj instead of the token embedding.




# File 'lib/toy/models/toy_smollm2.rb', line 295

def enable_untied_output!
  @has_untied_output = true
end

#forward(ids, pos_start) ⇒ `Object`

ids: Array<Int> of token IDs (length T); pos_start: Int, the absolute position of ids[0], used for RoPE. Returns logits of shape [T, V].

# File 'lib/toy/models/toy_smollm2.rb', line 300

def forward(ids, pos_start)
  x = @token_embed.lookup(ids)                           # [T, D]
  li = 0
  while li < @cfg.n_layers
    x = @stack[li].forward(x, pos_start)                 # [T, D]
    li += 1
  end
  x_final = @final_norm.forward(x)                       # [T, D]
  if @has_untied_output
    x_final.matmul_t(@output_proj)                       # [T, V]  (untied)
  else
    x_final.matmul_t(@token_embed.weight)                # [T, V]  (tied)
  end
end
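Since forward returns one row of logits per input position, a caller doing greedy decoding would typically read only the last row. The sketch below assumes causal decoding and uses nested Arrays in place of the real matrix type; `greedy_next_token` is a hypothetical helper, not part of the class.

```ruby
# Index of the largest value in the last row of a [T, V] logits table.
# Plain nested Arrays stand in for the real matrix type here.
def greedy_next_token(logits)
  last_row = logits.last
  last_row.each_with_index.max_by { |value, _idx| value }.last
end

logits = [
  [0.1, 2.0, -1.0],   # position 0
  [1.5, 0.3,  0.9]    # position 1 (last): scores for the next token
]
next_id = greedy_next_token(logits)
```

When generating token by token, the next call would pass the new token with pos_start advanced by the number of tokens already processed.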

#param_count ⇒ `Object`

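Total trainable parameter count: the token embedding, the final RMSNorm, and every block. Tied embeddings are counted once; as written, output_proj is not added to the total even after enable_untied_output! has been called.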

# File 'lib/toy/models/toy_smollm2.rb', line 316

def param_count
  total = @token_embed.param_count + @final_norm.param_count
  li = 0
  while li < @cfg.n_layers
    total = total + @stack[li].param_count
    li += 1
  end
  total
end
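The same summation can be worked through by hand for a llama-style shape. The sketch below assumes bias-free projections and a SmolLM2-135M-like configuration (vocab 49,152, d_model 576, 9 heads, 3 KV heads, d_ff 1,536, 30 layers); both helper functions and these values are assumptions for illustration and are not taken from the source file.

```ruby
# Per-block parameter count for a llama-style block, assuming no biases:
# Q and O projections are D x D, K and V are D x (H_kv * head_dim),
# SwiGLU has three D x D_f matrices, and there are two RMSNorm gains.
def block_params(d_model, n_heads, n_kv, d_ff)
  head_dim = d_model / n_heads
  attn  = 2 * d_model * d_model + 2 * d_model * (n_kv * head_dim)
  ffn   = 3 * d_model * d_ff
  norms = 2 * d_model
  attn + ffn + norms
end

# Mirrors the structure of param_count: embedding + final norm + blocks.
def total_params(vocab, d_model, n_heads, n_kv, d_ff, n_layers)
  embed      = vocab * d_model   # counted once; reused as output when tied
  final_norm = d_model
  embed + final_norm + n_layers * block_params(d_model, n_heads, n_kv, d_ff)
end

# Assumed SmolLM2-135M-like shape.
per_block = block_params(576, 9, 3, 1_536)
total     = total_params(49_152, 576, 9, 3, 1_536, 30)
```

Under these assumptions the total comes to about 134.5 million parameters, with the embedding table accounting for roughly a fifth of it.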
