Class: AdamState

Inherits:
Object
Defined in:
lib/toy/models/transformer.rb

Overview

Adam optimizer state: per-parameter first and second moments (m, v), plus the accumulated bias-correction products bc1 = β1ᵗ and bc2 = β2ᵗ. bc1 and bc2 are maintained as running products (one multiply per step) instead of recomputing β**t each step (one pow() call). This is purely a performance choice; Spinel handles `Float ** Int` cleanly.

m and v are Gradients-shaped (same per-parameter structure as the accumulator) so we can reuse Gradients#fill_zero to initialize them.
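
As a rough illustration of the running-product idea described above, the sketch below advances a product by one multiply per step and compares it with calling `**` directly. The helper name `advance_bias_correction` and the value 0.9 for β1 are assumptions made for this example; neither appears in transformer.rb.

```ruby
# Hypothetical helper (not part of AdamState): one multiply per optimizer step.
def advance_bias_correction(bc, beta)
  bc * beta
end

beta1 = 0.9 # assumed value, chosen only for illustration
bc1 = 1.0   # β1^0, the same starting value the constructor uses

# Running product: after step t, bc1 holds β1^t.
running = (1..5).map do |_t|
  bc1 = advance_bias_correction(bc1, beta1)
end

# Direct computation with pow, for comparison.
direct = (1..5).map { |t| beta1**t }

# The two agree up to floating-point rounding.
max_diff = running.zip(direct).map { |a, b| (a - b).abs }.max
puts "largest difference over 5 steps: #{max_diff}"
```

The two sequences can differ in the last few bits because the multiplications round differently, which is harmless for bias correction.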

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(vocab_size, d_model, d_ff, n_heads, d_head, n_layers, context_length) ⇒ AdamState

Returns a new instance of AdamState.



# File 'lib/toy/models/transformer.rb', line 468

def initialize(vocab_size, d_model, d_ff, n_heads, d_head, n_layers, context_length)
  @m = Gradients.new(vocab_size, d_model, d_ff, n_heads, d_head, n_layers, context_length)
  @v = Gradients.new(vocab_size, d_model, d_ff, n_heads, d_head, n_layers, context_length)
  @m.fill_zero
  @v.fill_zero
  @bc1 = 1.0   # β1^0
  @bc2 = 1.0   # β2^0
end
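
For context, here is a minimal sketch of how state like this is typically consumed by the standard Adam update, reduced to a single scalar parameter. `ScalarAdamState`, `adam_step`, and the hyperparameter defaults are all hypothetical stand-ins for this example; the actual update code in transformer.rb is not shown on this page and may differ.

```ruby
# Hypothetical scalar stand-in for AdamState: same four fields, one parameter.
ScalarAdamState = Struct.new(:m, :v, :bc1, :bc2)

# One textbook Adam step for a single scalar (assumed hyperparameter defaults).
def adam_step(param, grad, state, lr: 0.001, beta1: 0.9, beta2: 0.999, eps: 1e-8)
  # Update the exponential moving averages of the gradient and its square.
  state.m = beta1 * state.m + (1.0 - beta1) * grad
  state.v = beta2 * state.v + (1.0 - beta2) * grad * grad

  # Advance the running products so that bc1 = β1^t and bc2 = β2^t.
  state.bc1 *= beta1
  state.bc2 *= beta2

  # Bias-corrected moment estimates.
  m_hat = state.m / (1.0 - state.bc1)
  v_hat = state.v / (1.0 - state.bc2)

  param - lr * m_hat / (Math.sqrt(v_hat) + eps)
end

# Initial state mirrors the constructor: zero moments, products at β^0 = 1.0.
state = ScalarAdamState.new(0.0, 0.0, 1.0, 1.0)
param = adam_step(1.0, 0.5, state)
puts "param after one step: #{param}"
```

Starting the products at 1.0 means the first step divides by 1 - β, which is what removes the bias that the zero-initialized moments would otherwise introduce.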

Instance Attribute Details

#bc1 ⇒ Object

Returns the running product β1ᵗ used for first-moment bias correction (1.0 at construction, i.e. β1⁰).



# File 'lib/toy/models/transformer.rb', line 466

def bc1
  @bc1
end

#bc2 ⇒ Object

Returns the running product β2ᵗ used for second-moment bias correction (1.0 at construction, i.e. β2⁰).



# File 'lib/toy/models/transformer.rb', line 466

def bc2
  @bc2
end

#m ⇒ Object

Returns the per-parameter first-moment estimates, a Gradients-shaped structure zeroed at construction.



# File 'lib/toy/models/transformer.rb', line 466

def m
  @m
end

#v ⇒ Object

Returns the per-parameter second-moment estimates, a Gradients-shaped structure zeroed at construction.



# File 'lib/toy/models/transformer.rb', line 466

def v
  @v
end