Class: Toy::SmolLM2Config

Inherits:
Object
Defined in:
lib/toy/models/toy_smollm2.rb

Constructor Details

#initialize(vocab, d_model, n_heads, n_kv, d_ff, n_layers, ctx, rope_base, rms_eps) ⇒ SmolLM2Config

Returns a new instance of SmolLM2Config.



# File 'lib/toy/models/toy_smollm2.rb', line 134

def initialize(vocab, d_model, n_heads, n_kv, d_ff, n_layers,
               ctx, rope_base, rms_eps)
  @vocab        = vocab
  @d_model      = d_model
  @n_heads      = n_heads
  @n_kv         = n_kv
  @d_ff         = d_ff
  @n_layers     = n_layers
  @ctx          = ctx
  @rope_base    = rope_base
  @rms_eps      = rms_eps
  # Default head_dim: hidden_size / num_heads. Override via
  # cfg.head_dim = N when the GGUF carries an explicit key.
  @head_dim     = n_heads > 0 ? d_model / n_heads : 0
  # Default to no scaling. Callers set @rope_scaling after .new
  # (the GGUF loader does this in SmolLM2ConfigLoader.read).
  @rope_scaling = Toy::RopeScaling.none
  @donor_d_in   = 0   # E2.3 — projection-lens disabled by default
end
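
The head_dim default above relies on Ruby integer division and is guarded against a zero head count. The following is a minimal sketch of that rule only. ConfigSketch is a hypothetical stand-in, not Toy::SmolLM2Config, and the writable head_dim is an assumption based on the "cfg.head_dim = N" comment in the constructor.

```ruby
# Hypothetical stand-in, NOT Toy::SmolLM2Config: it copies only the
# head_dim default from the constructor shown above.
class ConfigSketch
  attr_reader :d_model, :n_heads
  # Assumed writable, per the "cfg.head_dim = N" comment in the source.
  attr_accessor :head_dim

  def initialize(d_model, n_heads)
    @d_model  = d_model
    @n_heads  = n_heads
    # Integer division; the guard avoids ZeroDivisionError when n_heads is 0.
    @head_dim = n_heads > 0 ? d_model / n_heads : 0
  end
end

cfg = ConfigSketch.new(576, 9)
default_dim = cfg.head_dim   # 576 / 9, computed by the default rule
cfg.head_dim = 128           # explicit override, as a loader might apply
```

Because the division truncates, a d_model that is not a multiple of n_heads silently yields a rounded-down head_dim, which is one reason an explicit override can matter.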

Instance Attribute Details

#ctx ⇒ Object

Returns the value of attribute ctx.

# File 'lib/toy/models/toy_smollm2.rb', line 115

def ctx
  @ctx
end

#d_ff ⇒ Object

Returns the value of attribute d_ff.

# File 'lib/toy/models/toy_smollm2.rb', line 115

def d_ff
  @d_ff
end

#d_model ⇒ Object

Returns the value of attribute d_model.

# File 'lib/toy/models/toy_smollm2.rb', line 115

def d_model
  @d_model
end

#donor_d_in ⇒ Object

Returns the value of attribute donor_d_in.

# File 'lib/toy/models/toy_smollm2.rb', line 115

def donor_d_in
  @donor_d_in
end

#head_dim ⇒ Object

Returns the value of attribute head_dim.

# File 'lib/toy/models/toy_smollm2.rb', line 115

def head_dim
  @head_dim
end

#n_heads ⇒ Object

Returns the value of attribute n_heads.

# File 'lib/toy/models/toy_smollm2.rb', line 115

def n_heads
  @n_heads
end

#n_kv ⇒ Object

Returns the value of attribute n_kv.

# File 'lib/toy/models/toy_smollm2.rb', line 115

def n_kv
  @n_kv
end

#n_layers ⇒ Object

Returns the value of attribute n_layers.

# File 'lib/toy/models/toy_smollm2.rb', line 115

def n_layers
  @n_layers
end

#rms_eps ⇒ Object

Returns the value of attribute rms_eps.

# File 'lib/toy/models/toy_smollm2.rb', line 115

def rms_eps
  @rms_eps
end

#rope_base ⇒ Object

Returns the value of attribute rope_base.

# File 'lib/toy/models/toy_smollm2.rb', line 115

def rope_base
  @rope_base
end

#rope_scaling ⇒ Object

Returns the value of attribute rope_scaling.

# File 'lib/toy/models/toy_smollm2.rb', line 115

def rope_scaling
  @rope_scaling
end

#vocab ⇒ Object

Returns the value of attribute vocab.

# File 'lib/toy/models/toy_smollm2.rb', line 115

def vocab
  @vocab
end

Class Method Details

.gqa(vocab, d_model, heads, n_kv, d_ff, layers, ctx, rope_base, rms_eps) ⇒ Object

GQA (grouped-query attention): n_kv != n_heads, so several query heads share each K/V head.



# File 'lib/toy/models/toy_smollm2.rb', line 207

def self.gqa(vocab, d_model, heads, n_kv, d_ff, layers, ctx, rope_base, rms_eps)
  Toy::SmolLM2Config.new(vocab, d_model, heads, n_kv, d_ff, layers,
                         ctx, rope_base, rms_eps)
end

.mha(vocab, d_model, heads, d_ff, layers, ctx, rope_base, rms_eps) ⇒ Object

MHA (multi-head attention): n_kv == n_heads, so every head has its own K/V.



# File 'lib/toy/models/toy_smollm2.rb', line 201

def self.mha(vocab, d_model, heads, d_ff, layers, ctx, rope_base, rms_eps)
  Toy::SmolLM2Config.new(vocab, d_model, heads, heads, d_ff, layers,
                         ctx, rope_base, rms_eps)
end
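
The difference between the .gqa and .mha constructors comes down to how many query heads share one K/V head. A minimal sketch of that ratio follows. kv_group_size is a hypothetical helper, not part of Toy, and the divisibility check reflects the usual GQA convention rather than anything the constructors above enforce.

```ruby
# Hypothetical helper (not part of Toy): how many query heads share each
# K/V head. Assumes the usual GQA convention that n_heads is a multiple
# of n_kv; the documented constructors perform no such validation.
def kv_group_size(n_heads, n_kv)
  raise ArgumentError, "n_kv must be positive" unless n_kv.positive?
  raise ArgumentError, "n_heads must be a multiple of n_kv" unless (n_heads % n_kv).zero?

  n_heads / n_kv
end

gqa_group = kv_group_size(9, 3)  # GQA shape: 9 query heads over 3 K/V heads
mha_group = kv_group_size(8, 8)  # MHA shape: one query head per K/V head
```

A group size of 1 is the MHA case, which is why .mha can simply pass heads for both n_heads and n_kv.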

.mid ⇒ Object

A MID experiment shape, big enough to show real curves yet small enough for a single GB10 or laptop run. Shape: vocab 4096 (BPE-toy range), d_model=256, MHA with 8 heads (d_head=32), d_ff=4*d_model=1024, 8 layers, ctx 256, rope_base=10000.0, rms_eps=1e-5. Roughly 12M params.

# File 'lib/toy/models/toy_smollm2.rb', line 183

def self.mid
  Toy::SmolLM2Config.new(4096, 256, 8, 8, 1024, 8,
                         256, 10000.0, 1.0e-5)
end

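.smollm2_135m ⇒ Object

Convenience constructor: the default shape, matching SmolLM2-135M on Hugging Face. Shape: vocab 49152, d_model=576, GQA with 9 heads over 3 K/V heads, d_ff=1536, 30 layers, ctx 8192, rope_base=100000.0, rms_eps=1e-5.
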
# File 'lib/toy/models/toy_smollm2.rb', line 155

def self.smollm2_135m
  Toy::SmolLM2Config.new(49152, 576, 9, 3, 1536, 30,
                         8192, 100000.0, 1.0e-5)
end
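
As a rough cross-check of the "135M" in the name, the parameter count can be estimated from the shape alone. The sketch below is hypothetical and not part of Toy. It assumes a LLaMA-style block (bias-free Q/K/V/O projections, a three-matrix gated MLP, two RMSNorm weight vectors per layer, one final RMSNorm) and tied input/output embeddings, none of which this page states.

```ruby
# Hypothetical estimator, not part of Toy. Assumptions (not confirmed by
# this page): LLaMA-style block, no biases, tied embeddings.
def estimate_params(vocab:, d_model:, n_heads:, n_kv:, d_ff:, n_layers:)
  head_dim = d_model / n_heads
  kv_dim   = n_kv * head_dim
  attn     = 2 * d_model * d_model + 2 * d_model * kv_dim # Q and O, then K and V
  mlp      = 3 * d_model * d_ff                           # gate, up, down
  norms    = 2 * d_model                                  # two RMSNorm weights per layer
  embed    = vocab * d_model                              # shared with the output head
  embed + n_layers * (attn + mlp + norms) + d_model       # last term: final RMSNorm
end

total = estimate_params(vocab: 49_152, d_model: 576, n_heads: 9, n_kv: 3,
                        d_ff: 1536, n_layers: 30)
```

Under those assumptions the estimate lands at about 134.5M parameters, consistent with the model name.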

.tiny ⇒ Object

The canonical SMOKE/GATE shape: the exact tiny config that every from-scratch fixture trains (smoke_compute_surface, smoke_recipe_from_scratch, lib/toy/run/train.rb). Shape: TinyStories vocab 627, d_model=64, MHA with 4 heads (d_head=16), d_ff=2*d_model=128, 2 layers, ctx 32, rope_base=10000.0 (FOUR zeros, the from-scratch convention, unlike the 100000.0 used by smollm2_135m), rms_eps=1e-5.

NOTE: the gate fixtures additionally set cfg.donor_d_in = 128 (projection lens) at the call site; tiny does not bake that in, because it is experiment-specific.

# File 'lib/toy/models/toy_smollm2.rb', line 174

def self.tiny
  Toy::SmolLM2Config.new(627, 64, 4, 4, 128, 2,
                         32, 10000.0, 1.0e-5)
end
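
The presets pass nine positional arguments, which are easy to misread. A small sketch that pairs each value with its parameter name, in the order given by the #initialize signature, may help when checking a shape against its description. label_args is a hypothetical helper and not part of Toy.

```ruby
# Hypothetical helper (not part of Toy): pair positional constructor values
# with the parameter names from the documented #initialize signature.
def label_args(args)
  fields = %i[vocab d_model n_heads n_kv d_ff n_layers ctx rope_base rms_eps]
  raise ArgumentError, "expected #{fields.size} values" unless args.size == fields.size

  fields.zip(args).to_h
end

tiny = label_args([627, 64, 4, 4, 128, 2, 32, 10000.0, 1.0e-5])
mid  = label_args([4096, 256, 8, 8, 1024, 8, 256, 10000.0, 1.0e-5])

tiny_ff_ratio = tiny[:d_ff] / tiny[:d_model] # the "d_ff=2*d_model" in the description
mid_ff_ratio  = mid[:d_ff] / mid[:d_model]   # the "d_ff=4*d_model" in the description
```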