Class: AdamState
- Inherits: Object
- Inheritance: Object, AdamState
- Defined in:
- lib/toy/models/transformer.rb
Overview
Adam optimizer state: per-parameter first and second moments (m, v), plus the accumulating bias-correction products bc1 = β1ᵗ and bc2 = β2ᵗ. We maintain bc1/bc2 as running products (one multiply per step) instead of computing β**t each step (one pow() call). This is purely a performance choice; Spinel handles `Float ** Int` cleanly.
m and v are Gradients-shaped (same per-parameter structure as the accumulator) so we can reuse Gradients#fill_zero to initialize them.
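To illustrate the running-product idea, here is a minimal sketch. The β values and the helper name `running_bias_products` are assumptions made for this example; the source does not state which values or helper it uses. It checks that multiplying once per step gives the same result as the direct power, up to floating-point rounding.

```ruby
# Sketch only: BETA1/BETA2 are assumed values, not taken from the source.
BETA1 = 0.9
BETA2 = 0.999

# Advance the running products by `steps` optimizer steps, starting from β^0 = 1.0.
# `running_bias_products` is a hypothetical helper, not part of AdamState.
def running_bias_products(steps, beta1 = BETA1, beta2 = BETA2)
  bc1 = 1.0
  bc2 = 1.0
  steps.times do
    bc1 *= beta1 # one multiply per step, no pow call
    bc2 *= beta2
  end
  [bc1, bc2]
end

bc1, bc2 = running_bias_products(5)

# The running products agree with the direct powers up to rounding error.
puts((bc1 - BETA1**5).abs < 1e-12)
puts((bc2 - BETA2**5).abs < 1e-12)
```

The bias-corrected moments then divide by `1.0 - bc1` and `1.0 - bc2`, so the products only need to be advanced once per optimizer step, not once per parameter.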
Instance Attribute Summary
- #bc1 ⇒ Object
  Returns the value of attribute bc1.
- #bc2 ⇒ Object
  Returns the value of attribute bc2.
- #m ⇒ Object
  Returns the value of attribute m.
- #v ⇒ Object
  Returns the value of attribute v.
Instance Method Summary
- #initialize(vocab_size, d_model, d_ff, n_heads, d_head, n_layers, context_length) ⇒ AdamState (constructor)
  A new instance of AdamState.
Constructor Details
#initialize(vocab_size, d_model, d_ff, n_heads, d_head, n_layers, context_length) ⇒ AdamState
Returns a new instance of AdamState.
# File 'lib/toy/models/transformer.rb', line 468
def initialize(vocab_size, d_model, d_ff, n_heads, d_head, n_layers, context_length)
  @m = Gradients.new(vocab_size, d_model, d_ff, n_heads, d_head, n_layers, context_length)
  @v = Gradients.new(vocab_size, d_model, d_ff, n_heads, d_head, n_layers, context_length)
  @m.fill_zero
  @v.fill_zero
  @bc1 = 1.0 # β1^0
  @bc2 = 1.0 # β2^0
end
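The source does not show how this state is consumed, so the following is only a sketch of a textbook Adam update under stated assumptions. `ToyAdamState` and `toy_adam_step!` are hypothetical names, flat Float arrays stand in for the Gradients-shaped m and v, and the hyperparameter defaults are illustrative, not taken from the library.

```ruby
# Hypothetical stand-in for AdamState: flat Float arrays replace the
# Gradients-shaped m and v so the sketch is self-contained.
class ToyAdamState
  attr_accessor :m, :v, :bc1, :bc2

  def initialize(n_params)
    @m = Array.new(n_params, 0.0)
    @v = Array.new(n_params, 0.0)
    @bc1 = 1.0 # β1^0
    @bc2 = 1.0 # β2^0
  end
end

# One textbook Adam update, mutating params and state in place.
# lr, beta1, beta2, and eps are assumed defaults, not values from the source.
def toy_adam_step!(params, grads, state, lr: 1e-3, beta1: 0.9, beta2: 0.999, eps: 1e-8)
  # Advance the running products once per step: bc1 = β1^t, bc2 = β2^t.
  state.bc1 *= beta1
  state.bc2 *= beta2

  params.each_index do |i|
    g = grads[i]
    # Exponential moving averages of the gradient and the squared gradient.
    state.m[i] = beta1 * state.m[i] + (1.0 - beta1) * g
    state.v[i] = beta2 * state.v[i] + (1.0 - beta2) * g * g
    # Bias-corrected moments use the running products, not a pow call.
    m_hat = state.m[i] / (1.0 - state.bc1)
    v_hat = state.v[i] / (1.0 - state.bc2)
    params[i] -= lr * m_hat / (Math.sqrt(v_hat) + eps)
  end
  params
end

params = [0.0, 1.0]
state = ToyAdamState.new(params.size)
toy_adam_step!(params, [1.0, -2.0], state, lr: 0.1)
```

On the first step the bias correction cancels the moment scaling, so each parameter moves by roughly `lr` in the direction opposite its gradient's sign.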
Instance Attribute Details
#bc1 ⇒ Object
Returns the value of attribute bc1.
# File 'lib/toy/models/transformer.rb', line 466
def bc1
  @bc1
end
#bc2 ⇒ Object
Returns the value of attribute bc2.
# File 'lib/toy/models/transformer.rb', line 466
def bc2
  @bc2
end
#m ⇒ Object
Returns the value of attribute m.
# File 'lib/toy/models/transformer.rb', line 466
def m
  @m
end
#v ⇒ Object
Returns the value of attribute v.
# File 'lib/toy/models/transformer.rb', line 466
def v
  @v
end