Class: Gradients

Inherits:

Object


Defined in: lib/toy/models/transformer.rb

Overview

Gradients for the whole model. Structurally a mirror of TransformerLM’s parameters: the per-layer gradients are Block-shaped, so the constructor reuses Block for them (the source comment refers to this as ZeroBlock).

Instance Attribute Summary

#blocks ⇒ Object

Per-layer gradients: an Array of Block instances, one per transformer layer.
#loss ⇒ Object

Scalar loss stored alongside the gradients. Starts at 0.0 and is reset to 0.0 by #fill_zero.
#norm_final_gamma ⇒ Object

Gradient for the final norm’s gamma vector: an Array of d_model Floats.
#pos_embed ⇒ Object

Gradient for the positional embedding, built as Mat.new(context_length, d_model).
#token_embed ⇒ Object

Tied embeddings: lm_head shares storage with token_embed (lm_head == token_embed), so there’s a single token_embed gradient that accumulates contributions from both the input embedding lookup and the output (unembed) projection.
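
The tied-embedding note can be made concrete with a small sketch. It is not the library's code: the gradient buffer is a plain nested Array rather than a Mat, and the helper names accumulate_lookup_grad and accumulate_unembed_grad are hypothetical. It only shows why one buffer ends up holding the sum of two different contributions.

```ruby
# Hypothetical stand-in: the token_embed gradient as a nested Array
# (vocab_size rows, d_model columns). The real class stores a Mat.
vocab_size = 3
d_model    = 2
token_embed_grad = Array.new(vocab_size) { Array.new(d_model, 0.0) }

# Contribution 1: the input embedding lookup. Only the row of the token
# that was looked up receives the upstream gradient.
def accumulate_lookup_grad(grad, token_id, d_embedded)
  d_embedded.each_with_index { |g, j| grad[token_id][j] += g }
end

# Contribution 2: the output (unembed) projection, assuming
# logits[v] = sum over j of hidden[j] * embed[v][j].
# Every row v receives d_logits[v] * hidden.
def accumulate_unembed_grad(grad, hidden, d_logits)
  d_logits.each_with_index do |dl, v|
    hidden.each_with_index { |h, j| grad[v][j] += dl * h }
  end
end

# Both passes add into the same buffer, never overwrite it.
accumulate_lookup_grad(token_embed_grad, 1, [0.5, -0.5])
accumulate_unembed_grad(token_embed_grad, [1.0, 2.0], [0.25, 0.5, -0.75])
```

Row 1 is touched by both calls, so it holds the sum of the lookup gradient and the projection gradient. Rows 0 and 2 only receive the projection term.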

Instance Method Summary

#fill_zero ⇒ Object

Resets every gradient buffer and the stored loss to zero, in place.
#initialize(vocab_size, d_model, d_ff, n_heads, d_head, n_layers, context_length) ⇒ Gradients constructor

A new instance of Gradients.

Constructor Details

#initialize(vocab_size, d_model, d_ff, n_heads, d_head, n_layers, context_length) ⇒ `Gradients`

Returns a new instance of Gradients.

# File 'lib/toy/models/transformer.rb', line 426

def initialize(vocab_size, d_model, d_ff, n_heads, d_head, n_layers, context_length)
  @token_embed = Mat.new(vocab_size, d_model)
  @pos_embed   = Mat.new(context_length, d_model)
  @norm_final_gamma = Array.new(d_model, 0.0)
  @blocks = [Block.new(d_model, d_head, d_ff, n_heads)]
  li = 1
  while li < n_layers
    @blocks.push(Block.new(d_model, d_head, d_ff, n_heads))
    li += 1
  end
  @loss = 0.0
end
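
One detail of the constructor is easy to miss: @blocks is seeded with one Block before the loop runs, so the result always contains at least one block. The sketch below contrasts that pattern with the more conventional Array.new block form. StandInBlock and both helper names are hypothetical and exist only for this illustration; the real Block takes the same four arguments but is defined elsewhere in the library.

```ruby
# Hypothetical stand-in for the real Block; it only records its dimensions.
StandInBlock = Struct.new(:d_model, :d_head, :d_ff, :n_heads)

# Same pattern as the constructor: seed with one block, then push the rest.
def build_blocks_seeded(n_layers, d_model, d_head, d_ff, n_heads)
  blocks = [StandInBlock.new(d_model, d_head, d_ff, n_heads)]
  li = 1
  while li < n_layers
    blocks.push(StandInBlock.new(d_model, d_head, d_ff, n_heads))
    li += 1
  end
  blocks
end

# Conventional Ruby spelling. The block form creates a distinct object
# per slot. It matches the seeded version whenever n_layers >= 1.
def build_blocks_with_array_new(n_layers, d_model, d_head, d_ff, n_heads)
  Array.new(n_layers) { StandInBlock.new(d_model, d_head, d_ff, n_heads) }
end

seeded   = build_blocks_seeded(4, 64, 16, 256, 4)
standard = build_blocks_with_array_new(4, 64, 16, 256, 4)
edge     = build_blocks_seeded(0, 64, 16, 256, 4)
```

With n_layers of 4 both helpers return four separate blocks. With n_layers of 0 the seeded version still returns one block, while the Array.new version returns an empty Array.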

Instance Attribute Details

#blocks ⇒ `Object`

Per-layer gradients: an Array of Block instances, one per transformer layer.

# File 'lib/toy/models/transformer.rb', line 423

def blocks
  @blocks
end

#loss ⇒ `Object`

Scalar loss stored alongside the gradients. Starts at 0.0 and is reset to 0.0 by #fill_zero.

# File 'lib/toy/models/transformer.rb', line 423

def loss
  @loss
end

#norm_final_gamma ⇒ `Object`

Gradient for the final norm’s gamma vector: an Array of d_model Floats.

# File 'lib/toy/models/transformer.rb', line 423

def norm_final_gamma
  @norm_final_gamma
end

#pos_embed ⇒ `Object`

Gradient for the positional embedding, built as Mat.new(context_length, d_model).

# File 'lib/toy/models/transformer.rb', line 423

def pos_embed
  @pos_embed
end

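#token_embed ⇒ `Object`

Tied embeddings: lm_head shares storage with token_embed (lm_head == token_embed), so there’s a single token_embed gradient that accumulates contributions from both the input embedding lookup and the output (unembed) projection.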

# File 'lib/toy/models/transformer.rb', line 423

def token_embed
  @token_embed
end

Instance Method Details

#fill_zero ⇒ `Object`

# File 'lib/toy/models/transformer.rb', line 439

def fill_zero
  @token_embed.fill_zero
  @pos_embed.fill_zero
  n = @norm_final_gamma.length
  i = 0
  while i < n
    @norm_final_gamma[i] = 0.0
    i += 1
  end
  bi = 0
  while bi < @blocks.length
    @blocks[bi].fill_zero
    bi += 1
  end
  @loss = 0.0
end
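
Because gradients are accumulated into these buffers, #fill_zero would normally be called before each backward pass. The sketch below shows that pattern on a miniature stand-in. TinyGrads and its accumulate method are hypothetical, and summing the loss across calls is an assumption made for the illustration. Only the in-place zeroing mirrors the method above.

```ruby
# Hypothetical miniature of Gradients: one Array buffer plus the loss.
class TinyGrads
  attr_reader :gamma, :loss

  def initialize(d_model)
    @gamma = Array.new(d_model, 0.0)
    @loss = 0.0
  end

  # Stand-in for a backward pass: adds into the buffer, never overwrites.
  # Summing the loss here is an assumption made for this sketch.
  def accumulate(d_gamma, loss)
    d_gamma.each_with_index { |g, i| @gamma[i] += g }
    @loss += loss
  end

  # Zero in place, as #fill_zero does, so the buffer object is reused
  # instead of being reallocated on every step.
  def fill_zero
    @gamma.fill(0.0)
    @loss = 0.0
  end
end

grads  = TinyGrads.new(2)
buffer = grads.gamma

grads.accumulate([1.0, 2.0], 0.5)
grads.accumulate([1.0, 2.0], 0.5) # no zeroing in between: contributions add up
stale = grads.gamma.dup

grads.fill_zero
grads.accumulate([1.0, 2.0], 0.5) # after zeroing: only this step's gradient
```

After the two back-to-back calls the buffer holds twice the per-step gradient. After fill_zero and one more call it holds a single step's gradient, and it is still the same Array object that was handed out at the start.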
