Class: Gradients

Inherits:
Object
Defined in:
lib/toy/models/transformer.rb

Overview

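Gradients for the whole model. The layout mirrors TransformerLM’s parameters: a token-embedding gradient, a positional-embedding gradient, the final-norm gain gradient, and the same Block-shaped per-layer gradients (we reuse ZeroBlock), plus a scalar loss.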

Constructor Details

#initialize(vocab_size, d_model, d_ff, n_heads, d_head, n_layers, context_length) ⇒ Gradients

Returns a new instance of Gradients.

# File 'lib/toy/models/transformer.rb', line 426

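def initialize(vocab_size, d_model, d_ff, n_heads, d_head, n_layers, context_length)
  # Embedding gradients have the same shapes as the parameters they belong to.
  @token_embed = Mat.new(vocab_size, d_model)      # also receives lm_head grads (tied)
  @pos_embed   = Mat.new(context_length, d_model)
  @norm_final_gamma = Array.new(d_model, 0.0)      # final-norm gain
  # One Block-shaped gradient holder per layer. The first one is created
  # unconditionally, so this always yields at least one block.
  @blocks = [Block.new(d_model, d_head, d_ff, n_heads)]
  li = 1
  while li < n_layers
    @blocks.push(Block.new(d_model, d_head, d_ff, n_heads))
    li += 1
  end
  @loss = 0.0
end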
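As a quick check on the shapes above, here is a small sketch that counts the scalar gradient slots the constructor allocates outside the per-layer blocks. The helper name `top_level_grad_sizes` is hypothetical, and Block’s internal shapes are not shown on this page, so they are deliberately left out.

```ruby
# Hypothetical helper: number of scalar gradient entries allocated by
# Gradients#initialize outside @blocks (Block's own shapes are not covered).
def top_level_grad_sizes(vocab_size:, d_model:, context_length:)
  {
    token_embed: vocab_size * d_model,     # Mat.new(vocab_size, d_model)
    pos_embed: context_length * d_model,   # Mat.new(context_length, d_model)
    norm_final_gamma: d_model              # Array.new(d_model, 0.0)
  }
end

sizes = top_level_grad_sizes(vocab_size: 50, d_model: 16, context_length: 32)
puts sizes.inspect
```

With tied embeddings, the token_embed entry is the only vocabulary-sized gradient; there is no separate lm_head buffer to count.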

Instance Attribute Details

#blocks ⇒ Object

Tied embeddings: lm_head shares storage with token_embed (lm_head == token_embed), so there’s a single token_embed gradient that accumulates contributions from BOTH the input embedding lookup and the output (unembed) projection.
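The tied-embedding note above can be made concrete with a minimal sketch. It uses plain nested Arrays instead of the library’s Mat type (Mat’s API is not shown here), and the helper `tied_embed_grad` and its argument names are hypothetical. The shared matrix `w` is read once as an embedding lookup and once as the unembed projection, so its gradient is the sum of both contributions.

```ruby
# Hypothetical sketch; nested Arrays stand in for Mat.
# Forward uses of the tied matrix w (vocab_size x d_model):
#   input:  x[t] = w[tokens[t]]                 (embedding lookup)
#   output: logits[t][v] = dot(h[t], w[v])      (unembed, lm_head == w)
def tied_embed_grad(vocab_size, d_model, tokens, d_x, h, d_logits)
  grad = Array.new(vocab_size) { Array.new(d_model, 0.0) }

  # 1) Embedding lookup: scatter-add each position's d_x row into its token's row.
  tokens.each_with_index do |tok, t|
    d_model.times { |j| grad[tok][j] += d_x[t][j] }
  end

  # 2) Unembed projection: grad[v][j] += sum over t of d_logits[t][v] * h[t][j].
  d_logits.each_with_index do |row, t|
    row.each_with_index do |g, v|
      d_model.times { |j| grad[v][j] += g * h[t][j] }
    end
  end

  grad
end

g = tied_embed_grad(3, 2, [0, 2],
                    [[1.0, 0.0], [0.0, 1.0]],              # d_x
                    [[1.0, 1.0], [2.0, 0.0]],              # h
                    [[0.5, 0.0, 0.0], [0.0, 0.0, 1.0]])    # d_logits
puts g.inspect
```

Row 0 here receives both a lookup term and an unembed term, which is exactly the “BOTH” case the note describes; row 1 never appears as an input token and gets zero logit gradient, so it stays at zero.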

# File 'lib/toy/models/transformer.rb', line 423

def blocks
  @blocks
end

#loss ⇒ Object

# File 'lib/toy/models/transformer.rb', line 423

def loss
  @loss
end

#norm_final_gamma ⇒ Object

# File 'lib/toy/models/transformer.rb', line 423

def norm_final_gamma
  @norm_final_gamma
end

#pos_embed ⇒ Object

# File 'lib/toy/models/transformer.rb', line 423

def pos_embed
  @pos_embed
end

#token_embed ⇒ Object

# File 'lib/toy/models/transformer.rb', line 423

def token_embed
  @token_embed
end

Instance Method Details

#fill_zero ⇒ Object

# File 'lib/toy/models/transformer.rb', line 439

def fill_zero
  # Reset every gradient buffer, and the loss, to zero in place.
  @token_embed.fill_zero
  @pos_embed.fill_zero
  n = @norm_final_gamma.length
  i = 0
  while i < n
    @norm_final_gamma[i] = 0.0
    i += 1
  end
  # Each per-layer holder zeroes its own buffers.
  bi = 0
  while bi < @blocks.length
    @blocks[bi].fill_zero
    bi += 1
  end
  @loss = 0.0
end
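`fill_zero` matters if backward passes add into these buffers rather than overwrite them, as the tied-embedding note implies for token_embed. The sketch below assumes that accumulate-then-reset contract and uses a hypothetical stand-in class, `ToyGrads`, with one flat buffer and a loss scalar. It is not the library’s training loop; the plain SGD update is likewise an illustrative choice.

```ruby
# Hypothetical stand-in for Gradients: one flat buffer plus a loss scalar.
class ToyGrads
  attr_reader :w, :loss

  def initialize(n)
    @w = Array.new(n, 0.0)
    @loss = 0.0
  end

  # Same contract as Gradients#fill_zero: reset every buffer and the loss.
  def fill_zero
    @w.fill(0.0)
    @loss = 0.0
  end

  # Backward contributions are ADDED (+=), never assigned.
  def accumulate(g, loss)
    g.each_with_index { |x, i| @w[i] += x }
    @loss += loss
  end
end

params = [1.0, 2.0]
grads = ToyGrads.new(params.length)
lr = 0.1
micro_batches = [[[0.5, -1.0], 0.3], [[0.5, 1.0], 0.1]]

grads.fill_zero                                    # start the step from clean buffers
micro_batches.each { |g, l| grads.accumulate(g, l) }
params = params.each_with_index.map { |p, i| p - lr * grads.w[i] }  # plain SGD
puts params.inspect
```

Skipping the reset would fold the previous step’s gradients into the next update, which is why zeroing happens once per optimizer step rather than once per micro-batch.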