Class: Gradients
- Inherits: Object
- Defined in: lib/toy/models/transformer.rb
Overview
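Gradients for the whole model. The layout mirrors TransformerLM’s parameters: one Block-shaped gradient container per layer (we reuse ZeroBlock), alongside gradients for the token embedding, the positional embedding, and the final norm gamma.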
Instance Attribute Summary
- #blocks ⇒ Object
Per-layer gradient containers: one Block per transformer layer, mirroring the model’s per-layer parameters.
- #loss ⇒ Object
Scalar loss stored alongside the gradients; initialized to 0.0 and reset by #fill_zero.
- #norm_final_gamma ⇒ Object
Gradient for the final norm gamma: an Array of d_model Floats, initialized to 0.0.
- #pos_embed ⇒ Object
Gradient for the positional embedding: a Mat created as Mat.new(context_length, d_model).
- #token_embed ⇒ Object
Tied embeddings: lm_head shares storage with token_embed (lm_head == token_embed), so there’s a single token_embed gradient that accumulates contributions from BOTH the input embedding lookup and the output (unembed) projection.
Instance Method Summary
- #fill_zero ⇒ Object
- #initialize(vocab_size, d_model, d_ff, n_heads, d_head, n_layers, context_length) ⇒ Gradients (constructor)
A new instance of Gradients.
Constructor Details
#initialize(vocab_size, d_model, d_ff, n_heads, d_head, n_layers, context_length) ⇒ Gradients
Returns a new instance of Gradients.
# File 'lib/toy/models/transformer.rb', line 426
def initialize(vocab_size, d_model, d_ff, n_heads, d_head, n_layers, context_length)
  @token_embed = Mat.new(vocab_size, d_model)
  @pos_embed = Mat.new(context_length, d_model)
  @norm_final_gamma = Array.new(d_model, 0.0)
  @blocks = [Block.new(d_model, d_head, d_ff, n_heads)]
  li = 1
  while li < n_layers
    @blocks.push(Block.new(d_model, d_head, d_ff, n_heads))
    li += 1
  end
  @loss = 0.0
end
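To make the allocation pattern in the constructor concrete, here is a minimal sketch of the shapes it sets up. The helper `toy_gradient_shapes` is hypothetical and not part of the library, and reading `Mat.new(a, b)` as rows by columns is an assumption. The one non-obvious detail taken directly from the listing is that the first Block is created unconditionally, so the block count is never below one.

```ruby
# Hypothetical helper (not part of the library): reports the sizes that
# Gradients#initialize allocates for a given configuration.
# Assumption: Mat.new(a, b) means a rows by b columns.
def toy_gradient_shapes(vocab_size, d_model, n_layers, context_length)
  {
    token_embed: [vocab_size, d_model],      # Mat.new(vocab_size, d_model)
    pos_embed: [context_length, d_model],    # Mat.new(context_length, d_model)
    norm_final_gamma: d_model,               # Array.new(d_model, 0.0)
    # The constructor seeds @blocks with one Block, then pushes while
    # li < n_layers starting from li = 1, so at least one Block exists.
    blocks: [n_layers, 1].max
  }
end

shapes = toy_gradient_shapes(16, 8, 3, 10)
puts shapes.inspect
```

Calling the helper with `n_layers` set to 0 still reports one block, which matches what the `while` loop in the listing does.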
Instance Attribute Details
#blocks ⇒ Object
Per-layer gradient containers: one Block per transformer layer, mirroring the model’s per-layer parameters.
# File 'lib/toy/models/transformer.rb', line 423
def blocks
  @blocks
end
#loss ⇒ Object
Scalar loss stored alongside the gradients; initialized to 0.0 and reset by #fill_zero.
# File 'lib/toy/models/transformer.rb', line 423
def loss
  @loss
end
#norm_final_gamma ⇒ Object
Gradient for the final norm gamma: an Array of d_model Floats, initialized to 0.0.
# File 'lib/toy/models/transformer.rb', line 423
def norm_final_gamma
  @norm_final_gamma
end
#pos_embed ⇒ Object
Gradient for the positional embedding: a Mat created as Mat.new(context_length, d_model).
# File 'lib/toy/models/transformer.rb', line 423
def pos_embed
  @pos_embed
end
#token_embed ⇒ Object
Tied embeddings: lm_head shares storage with token_embed (lm_head == token_embed), so there’s a single token_embed gradient that accumulates contributions from BOTH the input embedding lookup and the output (unembed) projection.
# File 'lib/toy/models/transformer.rb', line 423
def token_embed
  @token_embed
end
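The tied-embedding note can be illustrated with a small sketch of how one gradient buffer receives both contributions. This is not the library's backward pass: plain nested Arrays stand in for Mat, and the values of `h`, `dlogits`, `dx`, and `token_id` are made up for illustration.

```ruby
# Sketch only: nested Arrays stand in for the library's Mat class.
vocab_size = 4
d_model = 2

# One gradient buffer for the shared (tied) embedding matrix E.
grad_token_embed = Array.new(vocab_size) { Array.new(d_model, 0.0) }

# 1) Output (unembed) path: logits[v] = dot(h, E[v]),
#    so every row gets dE[v] += dlogits[v] * h.
h = [1.0, 2.0]                    # made-up final hidden state at one position
dlogits = [0.1, -0.3, 0.0, 0.2]   # made-up gradient w.r.t. the logits
vocab_size.times do |v|
  d_model.times do |j|
    grad_token_embed[v][j] += dlogits[v] * h[j]
  end
end

# 2) Input path: the lookup copies row token_id of E,
#    so only that row receives the input gradient dx.
token_id = 1
dx = [0.5, -0.5]                  # made-up gradient w.r.t. the embedded input
d_model.times do |j|
  grad_token_embed[token_id][j] += dx[j]
end

puts grad_token_embed.inspect
```

Row `token_id` ends up holding the sum of both paths, while the other rows hold only the unembed contribution. This is why the buffer must be accumulated into with `+=` rather than overwritten.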
Instance Method Details
#fill_zero ⇒ Object
Zeroes token_embed, pos_embed, norm_final_gamma, and every block in place, then resets loss to 0.0.
# File 'lib/toy/models/transformer.rb', line 439
def fill_zero
  @token_embed.fill_zero
  @pos_embed.fill_zero
  n = @norm_final_gamma.length
  i = 0
  while i < n
    @norm_final_gamma[i] = 0.0
    i += 1
  end
  bi = 0
  while bi < @blocks.length
    @blocks[bi].fill_zero
    bi += 1
  end
  @loss = 0.0
end
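Because gradients are accumulated into existing buffers, a typical training step would call fill_zero before each backward pass. The sketch below shows that cycle with a hypothetical stand-in class, `ToyGrads`, which only imitates the in-place reset contract of the documented method. The "backward pass" is faked with fixed numbers.

```ruby
# Hypothetical stand-in, not the library class: one flat buffer plus a loss,
# with the same reset-in-place behaviour as Gradients#fill_zero.
class ToyGrads
  attr_reader :weights
  attr_accessor :loss

  def initialize(n)
    @weights = Array.new(n, 0.0)
    @loss = 0.0
  end

  # Reset the buffer in place so the object can be reused across steps.
  def fill_zero
    @weights.fill(0.0)
    @loss = 0.0
  end
end

grads = ToyGrads.new(3)
batches = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]  # made-up per-step gradients
seen = []

batches.each do |batch|
  grads.fill_zero                                         # clear the previous step
  batch.each_with_index { |g, i| grads.weights[i] += g }  # pretend backward pass
  grads.loss += batch.sum
  seen << [grads.weights.dup, grads.loss]                 # snapshot for inspection
end

puts seen.inspect
```

Without the fill_zero call at the top of the loop, the second step would add its gradients on top of the first step's values instead of starting from zero.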