Class: Toy::SmolLM2Config
- Inherits: Object
  - Object
    - Toy::SmolLM2Config
- Defined in: lib/toy/models/toy_smollm2.rb
Instance Attribute Summary
- #ctx ⇒ Object
  Returns the value of attribute ctx.
- #d_ff ⇒ Object
  Returns the value of attribute d_ff.
- #d_model ⇒ Object
  Returns the value of attribute d_model.
- #donor_d_in ⇒ Object
  Returns the value of attribute donor_d_in. The constructor sets it to 0, which leaves the projection lens disabled.
- #head_dim ⇒ Object
  Returns the value of attribute head_dim. Defaults to d_model / n_heads (0 when n_heads is 0) and can be overridden after construction.
- #n_heads ⇒ Object
  Returns the value of attribute n_heads.
- #n_kv ⇒ Object
  Returns the value of attribute n_kv.
- #n_layers ⇒ Object
  Returns the value of attribute n_layers.
- #rms_eps ⇒ Object
  Returns the value of attribute rms_eps.
- #rope_base ⇒ Object
  Returns the value of attribute rope_base.
- #rope_scaling ⇒ Object
  Returns the value of attribute rope_scaling. Defaults to Toy::RopeScaling.none; callers set it after .new.
- #vocab ⇒ Object
  Returns the value of attribute vocab.
Class Method Summary
- .gqa(vocab, d_model, heads, n_kv, d_ff, layers, ctx, rope_base, rms_eps) ⇒ Object
  GQA: n_kv != n_heads (heads share K/V groups).
- .mha(vocab, d_model, heads, d_ff, layers, ctx, rope_base, rms_eps) ⇒ Object
  MHA: n_kv == n_heads (every head has its own K/V).
- .mid ⇒ Object
  A MID experiment shape, big enough to show real curves and small enough for a single GB10/laptop run: vocab 4096 (BPE-toy range), d=256, MHA 8 heads (d_head=32), d_ff=4*d_model=1024, 8 layers, ctx 256, rope_base=10000.0, rms_eps=1e-5.
- .smollm2_135m ⇒ Object
  Convenience: the default that matches SmolLM2-135M on HF.
- .tiny ⇒ Object
  The canonical SMOKE/GATE shape, i.e. the exact tiny config every from-scratch fixture trains (smoke_compute_surface / smoke_recipe_from_scratch / lib/toy/run/train.rb): TinyStories vocab 627, d=64, MHA 4 heads (d_head=16), d_ff=2*d_model=128, 2 layers, ctx 32, rope_base=10000.0 (FOUR zeros, the from-scratch convention), rms_eps=1e-5.
Instance Method Summary
- #initialize(vocab, d_model, n_heads, n_kv, d_ff, n_layers, ctx, rope_base, rms_eps) ⇒ SmolLM2Config (constructor)
  A new instance of SmolLM2Config.
Constructor Details
#initialize(vocab, d_model, n_heads, n_kv, d_ff, n_layers, ctx, rope_base, rms_eps) ⇒ SmolLM2Config
Returns a new instance of SmolLM2Config.
# File 'lib/toy/models/toy_smollm2.rb', line 134
def initialize(vocab, d_model, n_heads, n_kv, d_ff, n_layers, ctx, rope_base, rms_eps)
  @vocab = vocab
  @d_model = d_model
  @n_heads = n_heads
  @n_kv = n_kv
  @d_ff = d_ff
  @n_layers = n_layers
  @ctx = ctx
  @rope_base = rope_base
  @rms_eps = rms_eps
  # Default head_dim: hidden_size / num_heads. Override via
  # cfg.head_dim = N when the GGUF carries an explicit key.
  @head_dim = n_heads > 0 ? d_model / n_heads : 0
  # Default to no scaling. Callers set @rope_scaling after .new
  # (the GGUF loader does this in SmolLM2ConfigLoader.read).
  @rope_scaling = Toy::RopeScaling.none
  @donor_d_in = 0 # E2.3: projection-lens disabled by default
end
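The constructor's defaults can be illustrated with a stand-in class. The sketch below does not load lib/toy/models/toy_smollm2.rb: `SketchConfig` is a hypothetical, trimmed mirror of the documented constructor (rope_scaling is left out). It is only meant to show that the default head_dim comes from Ruby integer division and that callers can override it after `.new`. It assumes the attributes are writable, as the comments in the source suggest.

```ruby
# Hypothetical stand-in for Toy::SmolLM2Config; it mirrors only the
# documented constructor and leaves out rope_scaling.
class SketchConfig
  attr_accessor :vocab, :d_model, :n_heads, :n_kv, :d_ff, :n_layers,
                :ctx, :rope_base, :rms_eps, :head_dim, :donor_d_in

  def initialize(vocab, d_model, n_heads, n_kv, d_ff, n_layers, ctx, rope_base, rms_eps)
    @vocab = vocab
    @d_model = d_model
    @n_heads = n_heads
    @n_kv = n_kv
    @d_ff = d_ff
    @n_layers = n_layers
    @ctx = ctx
    @rope_base = rope_base
    @rms_eps = rms_eps
    # Integer division: any remainder of d_model / n_heads is dropped.
    @head_dim = n_heads > 0 ? d_model / n_heads : 0
    @donor_d_in = 0
  end
end

# Same positional arguments as the documented .smollm2_135m preset.
cfg = SketchConfig.new(49152, 576, 9, 3, 1536, 30, 8192, 100000.0, 1.0e-5)
default_head_dim = cfg.head_dim # derived from 576 / 9

# Override after construction, as a loader would when the model file
# carries an explicit head size (128 is an arbitrary illustrative value).
cfg.head_dim = 128

# n_heads == 0 takes the guard branch instead of dividing by zero.
zero_heads = SketchConfig.new(16, 64, 0, 0, 128, 1, 32, 10000.0, 1.0e-5)

puts "default=#{default_head_dim} overridden=#{cfg.head_dim} zero_heads=#{zero_heads.head_dim}"
```

Since the division truncates, a d_model that is not a multiple of n_heads would silently yield a smaller head_dim. Checking divisibility at the call site is a reasonable precaution.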
Instance Attribute Details
#ctx ⇒ Object
Returns the value of attribute ctx.
# File 'lib/toy/models/toy_smollm2.rb', line 115
def ctx
  @ctx
end
#d_ff ⇒ Object
Returns the value of attribute d_ff.
# File 'lib/toy/models/toy_smollm2.rb', line 115
def d_ff
  @d_ff
end
#d_model ⇒ Object
Returns the value of attribute d_model.
# File 'lib/toy/models/toy_smollm2.rb', line 115
def d_model
  @d_model
end
#donor_d_in ⇒ Object
Returns the value of attribute donor_d_in.
# File 'lib/toy/models/toy_smollm2.rb', line 115
def donor_d_in
  @donor_d_in
end
#head_dim ⇒ Object
Returns the value of attribute head_dim.
# File 'lib/toy/models/toy_smollm2.rb', line 115
def head_dim
  @head_dim
end
#n_heads ⇒ Object
Returns the value of attribute n_heads.
# File 'lib/toy/models/toy_smollm2.rb', line 115
def n_heads
  @n_heads
end
#n_kv ⇒ Object
Returns the value of attribute n_kv.
# File 'lib/toy/models/toy_smollm2.rb', line 115
def n_kv
  @n_kv
end
#n_layers ⇒ Object
Returns the value of attribute n_layers.
# File 'lib/toy/models/toy_smollm2.rb', line 115
def n_layers
  @n_layers
end
#rms_eps ⇒ Object
Returns the value of attribute rms_eps.
# File 'lib/toy/models/toy_smollm2.rb', line 115
def rms_eps
  @rms_eps
end
#rope_base ⇒ Object
Returns the value of attribute rope_base.
# File 'lib/toy/models/toy_smollm2.rb', line 115
def rope_base
  @rope_base
end
#rope_scaling ⇒ Object
Returns the value of attribute rope_scaling.
# File 'lib/toy/models/toy_smollm2.rb', line 115
def rope_scaling
  @rope_scaling
end
#vocab ⇒ Object
Returns the value of attribute vocab.
# File 'lib/toy/models/toy_smollm2.rb', line 115
def vocab
  @vocab
end
Class Method Details
.gqa(vocab, d_model, heads, n_kv, d_ff, layers, ctx, rope_base, rms_eps) ⇒ Object
GQA: n_kv != n_heads (heads share K/V groups).
# File 'lib/toy/models/toy_smollm2.rb', line 207
def self.gqa(vocab, d_model, heads, n_kv, d_ff, layers, ctx, rope_base, rms_eps)
  Toy::SmolLM2Config.new(vocab, d_model, heads, n_kv, d_ff, layers, ctx, rope_base, rms_eps)
end
.mha(vocab, d_model, heads, d_ff, layers, ctx, rope_base, rms_eps) ⇒ Object
MHA: n_kv == n_heads (every head has its own K/V).
# File 'lib/toy/models/toy_smollm2.rb', line 201
def self.mha(vocab, d_model, heads, d_ff, layers, ctx, rope_base, rms_eps)
  Toy::SmolLM2Config.new(vocab, d_model, heads, heads, d_ff, layers, ctx, rope_base, rms_eps)
end
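The difference between the two factories comes down to how query heads map onto K/V heads. The sketch below is a standalone illustration, not part of Toy: `kv_head_for` is a hypothetical helper, and it assumes the common convention of contiguous groups of n_heads / n_kv query heads per K/V head. The library's attention code may group heads differently.

```ruby
# Hypothetical helper (not part of Toy): which K/V head a query head reads,
# assuming contiguous groups of n_heads / n_kv query heads per K/V head.
def kv_head_for(query_head, n_heads, n_kv)
  raise ArgumentError, "n_kv must be positive" unless n_kv.positive?
  raise ArgumentError, "n_heads must be a multiple of n_kv" unless (n_heads % n_kv).zero?
  raise ArgumentError, "query_head out of range" unless (0...n_heads).cover?(query_head)

  group_size = n_heads / n_kv
  query_head / group_size
end

# GQA with the .smollm2_135m head counts: 9 query heads over 3 K/V heads.
gqa_map = (0...9).map { |h| kv_head_for(h, 9, 3) }

# MHA with the .tiny head counts: n_kv == n_heads, so the map is the identity.
mha_map = (0...4).map { |h| kv_head_for(h, 4, 4) }

puts "GQA: #{gqa_map.inspect}"
puts "MHA: #{mha_map.inspect}"
```

Neither `.gqa` nor `.mha` validates its arguments in the listings above, so a divisibility check like the one in this helper would have to live at the call site.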
.mid ⇒ Object
A MID experiment shape — big enough to show real curves, small enough for a single GB10/laptop run: vocab 4096 (BPE-toy range), d=256, MHA 8 heads (d_head=32), d_ff=4*d_model=1024, 8 layers, ctx 256, rope_base=10000.0, rms_eps=1e-5. ~12M params.
# File 'lib/toy/models/toy_smollm2.rb', line 183
def self.mid
  Toy::SmolLM2Config.new(4096, 256, 8, 8, 1024, 8, 256, 10000.0, 1.0e-5)
end
.smollm2_135m ⇒ Object
Convenience: the default that matches SmolLM2-135M on HF.
# File 'lib/toy/models/toy_smollm2.rb', line 155
def self.smollm2_135m
  Toy::SmolLM2Config.new(49152, 576, 9, 3, 1536, 30, 8192, 100000.0, 1.0e-5)
end
.tiny ⇒ Object
The canonical SMOKE/GATE shape — the exact tiny config every from-scratch fixture trains (smoke_compute_surface / smoke_recipe_from_scratch / lib/toy/run/train.rb): TinyStories vocab 627, d=64, MHA 4 heads (d_head=16), d_ff=2*d_model=128, 2 layers, ctx 32, rope_base=10000.0 (FOUR zeros — the from-scratch convention), rms_eps=1e-5. NOTE: the gate fixtures additionally set cfg.donor_d_in = 128 (projection lens) at the call site; tiny does not bake that (it is experiment-specific).
# File 'lib/toy/models/toy_smollm2.rb', line 174
def self.tiny
  Toy::SmolLM2Config.new(627, 64, 4, 4, 128, 2, 32, 10000.0, 1.0e-5)
end
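The note about setting donor_d_in at the call site can be sketched as follows. The example does not load the real class: `SketchTinyConfig` and `sketch_tiny` are hypothetical stand-ins that copy the documented `.tiny` arguments and the constructor's donor_d_in default of 0. The point is that the factory builds a new object on every call, so an experiment-specific override does not leak into later callers.

```ruby
# Hypothetical stand-in for Toy::SmolLM2Config, reduced to a Struct.
SketchTinyConfig = Struct.new(:vocab, :d_model, :n_heads, :n_kv, :d_ff,
                              :n_layers, :ctx, :rope_base, :rms_eps, :donor_d_in)

# Mirrors the documented .tiny factory: a fresh object on every call,
# with donor_d_in left at its default of 0.
def sketch_tiny
  SketchTinyConfig.new(627, 64, 4, 4, 128, 2, 32, 10000.0, 1.0e-5, 0)
end

gate_cfg = sketch_tiny
gate_cfg.donor_d_in = 128 # experiment-specific, set at the call site

plain_cfg = sketch_tiny # a later call is unaffected by the override

puts "gate=#{gate_cfg.donor_d_in} plain=#{plain_cfg.donor_d_in}"
```

With the real class the equivalent would presumably be `cfg = Toy::SmolLM2Config.tiny` followed by `cfg.donor_d_in = 128`, which is what the note describes the gate fixtures doing.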