Class: Ignis::AI::Transformer::Model

Inherits:
NN::Module
Defined in:
lib/nnw/ai/transformer/model.rb

Overview

Decoder-only (GPT-2-style) Transformer language model.

token_embedding + position_embedding → dropout → N × Block → LayerNorm → LM head

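Factory methods provide the standard GPT-2 configurations:

.gpt2_small  → 124M params
.gpt2_medium → 355M params (originally published as 345M)
.gpt2_large  → 774M params

These are the reference GPT-2 totals, in which the LM head reuses the token-embedding weights. This class registers a separate, bias-free head, so #num_parameters should report about vocab_size × embed_dim more (roughly 38.6M extra for .gpt2_small), assuming Block follows the reference layout.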
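The totals above can be reproduced with a few lines of arithmetic. This is a sketch that assumes each Block follows the reference GPT-2 layout (two LayerNorms, a fused Q/K/V projection and an output projection with biases, and a two-layer feed-forward with biases); the library's actual Block may differ. The helper `estimate_params` and its `tied_head:` flag are hypothetical: `tied_head: true` counts the LM head as sharing the token-embedding matrix, as reference GPT-2 does, and `tied_head: false` adds a separate bias-free head like the one this constructor registers.

```ruby
# Estimate parameter counts for a GPT-2-style decoder (hypothetical helper).
# ASSUMPTION: Block = LayerNorm + fused Q/K/V (with bias) + output projection
# (with bias) + LayerNorm + two-layer feed-forward (with biases).
def estimate_params(vocab_size:, embed_dim:, num_layers:, ff_dim:, max_seq_len:, tied_head: true)
  d = embed_dim
  f = ff_dim
  per_block = 2 * (2 * d) +          # two LayerNorms (gain + bias each)
              (d * 3 * d + 3 * d) +  # fused Q/K/V projection
              (d * d + d) +          # attention output projection
              (d * f + f) +          # feed-forward up-projection
              (f * d + d)            # feed-forward down-projection
  total = num_layers * per_block +
          vocab_size * d +           # token embedding
          max_seq_len * d +          # position embedding
          2 * d                      # final LayerNorm
  total += vocab_size * d unless tied_head # separate bias-free LM head
  total
end

PRESETS = {
  small:  { embed_dim: 768,  num_layers: 12, ff_dim: 3072 },
  medium: { embed_dim: 1024, num_layers: 24, ff_dim: 4096 },
  large:  { embed_dim: 1280, num_layers: 36, ff_dim: 5120 }
}.freeze

PRESETS.each do |name, cfg|
  tied   = estimate_params(vocab_size: 50257, max_seq_len: 1024, **cfg)
  untied = estimate_params(vocab_size: 50257, max_seq_len: 1024, tied_head: false, **cfg)
  puts format("%-6s tied: %d  untied: %d", name, tied, untied)
end
```

With `tied_head: true` this yields 124,439,808, 354,823,168 and 774,030,080 parameters for the three presets; `tied_head: false` adds 50,257 × embed_dim to each.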

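Instance Attribute Summary

#embed_dim ⇒ Integer (readonly)
#max_seq_len ⇒ Integer (readonly)
#num_heads ⇒ Integer (readonly)
#num_layers ⇒ Integer (readonly)
#vocab_size ⇒ Integer (readonly)

Attributes inherited from NN::Module

#training

Class Method Summary

.gpt2_large(device_id: 0) ⇒ Model
.gpt2_medium(device_id: 0) ⇒ Model
.gpt2_small(device_id: 0) ⇒ Model

Instance Method Summary

#decode_step(token_id, cache) ⇒ Tensor
#forward(input_ids, mask: nil) ⇒ Tensor
#initialize(vocab_size:, embed_dim:, num_heads:, num_layers:, ff_dim:, max_seq_len:, dropout: 0.0, activation: :gelu, pre_norm: true, device_id: 0) ⇒ Model (constructor)
#make_kv_cache(device_id: @device_id) ⇒ KVCache
#to_s ⇒ String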

Methods inherited from NN::Module

#call, #eval!, #load_state_dict, #named_parameters, #num_parameters, #parameters, #state_dict, #to, #train!, #zero_grad!

Constructor Details

#initialize(vocab_size:, embed_dim:, num_heads:, num_layers:, ff_dim:, max_seq_len:, dropout: 0.0, activation: :gelu, pre_norm: true, device_id: 0) ⇒ Model

Returns a new instance of Model.

Parameters:

  • vocab_size (Integer)

    vocabulary size

  • embed_dim (Integer)

    model dimension

  • num_heads (Integer)

    attention heads per block

  • num_layers (Integer)

    number of Transformer blocks

  • ff_dim (Integer)

    feed-forward hidden dimension

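  • max_seq_len (Integer)

    maximum sequence length (size of the position-embedding table)

  • dropout (Float) (defaults to: 0.0)

    dropout probability, applied to the summed embeddings and passed to each Block

  • activation (Symbol) (defaults to: :gelu)

    feed-forward activation, passed to each Block

  • pre_norm (Boolean) (defaults to: true)

    passed to each Block; true selects pre-LayerNorm sublayers, as in GPT-2

  • device_id (Integer) (defaults to: 0)

    device on which all submodule parameters are allocated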
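Multi-head attention conventionally splits embed_dim evenly across the heads, so each head works on embed_dim / num_heads features; the three GPT-2 presets (.gpt2_small, .gpt2_medium, .gpt2_large) all use 64-dimensional heads and a feed-forward width of 4 × embed_dim. A hypothetical pre-flight check makes both quantities explicit. `check_config` is not part of this class, and whether Block enforces divisibility itself is not documented here.

```ruby
# Hypothetical pre-flight check for Model.new keyword arguments.
# ASSUMPTION: attention splits embed_dim evenly across num_heads; the real
# Block may or may not enforce this itself.
def check_config(embed_dim:, num_heads:, ff_dim:, **_rest)
  unless (embed_dim % num_heads).zero?
    raise ArgumentError, "embed_dim #{embed_dim} is not divisible by num_heads #{num_heads}"
  end

  { head_dim: embed_dim / num_heads, ff_ratio: ff_dim.fdiv(embed_dim) }
end

# GPT-2 Small settings, as used by .gpt2_small
info = check_config(vocab_size: 50257, embed_dim: 768, num_heads: 12, num_layers: 12,
                    ff_dim: 3072, max_seq_len: 1024, dropout: 0.1)
puts "head_dim=#{info[:head_dim]} ff_ratio=#{info[:ff_ratio]}" # prints "head_dim=64 ff_ratio=4.0"
```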


# File 'lib/nnw/ai/transformer/model.rb', line 28

def initialize(vocab_size:, embed_dim:, num_heads:, num_layers:,
               ff_dim:, max_seq_len:, dropout: 0.0,
               activation: :gelu, pre_norm: true, device_id: 0)
  super()
  @vocab_size = vocab_size
  @embed_dim = embed_dim
  @num_heads = num_heads
  @num_layers = num_layers
  @max_seq_len = max_seq_len
  @device_id = device_id

  @token_embedding = register_module("token_embedding",
                      NN::Embedding.new(vocab_size, embed_dim, device_id: device_id))
  @position_embedding = register_module("position_embedding",
                         NN::Embedding.new(max_seq_len, embed_dim, device_id: device_id))

  @blocks = []
  num_layers.times do |i|
    block = Block.new(embed_dim, num_heads, ff_dim,
                      dropout: dropout, pre_norm: pre_norm,
                      activation: activation, device_id: device_id)
    @blocks << register_module("blocks.#{i}", block)
  end

  @norm = register_module("norm", NN::LayerNorm.new(embed_dim, device_id: device_id))
  @head = register_module("head",
           NN::Linear.new(embed_dim, vocab_size, bias: false, device_id: device_id))
  @dropout = register_module("dropout", NN::Dropout.new(p: dropout))
end
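The constructor registers its submodules in a fixed order, and those names are presumably the prefixes under which #named_parameters, #state_dict and #load_state_dict address the weights; the key format below each prefix depends on NN::Module and Block and is not shown here. A small pure-Ruby sketch (`registered_module_names` is a hypothetical helper) reproduces the top-level registration order from the code above:

```ruby
# Top-level submodule names, in the order Model#initialize registers them.
def registered_module_names(num_layers)
  names = %w[token_embedding position_embedding]
  names += Array.new(num_layers) { |i| "blocks.#{i}" } # one Block per layer
  names + %w[norm head dropout]
end

p registered_module_names(2)
# prints ["token_embedding", "position_embedding", "blocks.0", "blocks.1", "norm", "head", "dropout"]
```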

Instance Attribute Details

#embed_dim ⇒ Integer (readonly)

Returns:

  • (Integer)


# File 'lib/nnw/ai/transformer/model.rb', line 16

def embed_dim
  @embed_dim
end

#max_seq_len ⇒ Integer (readonly)

Returns:

  • (Integer)


# File 'lib/nnw/ai/transformer/model.rb', line 16

def max_seq_len
  @max_seq_len
end

#num_heads ⇒ Integer (readonly)

Returns:

  • (Integer)


# File 'lib/nnw/ai/transformer/model.rb', line 16

def num_heads
  @num_heads
end

#num_layers ⇒ Integer (readonly)

Returns:

  • (Integer)


# File 'lib/nnw/ai/transformer/model.rb', line 16

def num_layers
  @num_layers
end

#vocab_size ⇒ Integer (readonly)

Returns:

  • (Integer)


# File 'lib/nnw/ai/transformer/model.rb', line 16

def vocab_size
  @vocab_size
end

Class Method Details

.gpt2_large(device_id: 0) ⇒ Model

GPT-2 Large: 774M parameters

Parameters:

  • device_id (Integer) (defaults to: 0)

Returns:

  • (Model)

# File 'lib/nnw/ai/transformer/model.rb', line 165

def self.gpt2_large(device_id: 0)
  new(
    vocab_size: 50257,
    embed_dim: 1280,
    num_heads: 20,
    num_layers: 36,
    ff_dim: 5120,
    max_seq_len: 1024,
    dropout: 0.1,
    activation: :gelu,
    pre_norm: true,
    device_id: device_id
  )
end

.gpt2_medium(device_id: 0) ⇒ Model

GPT-2 Medium: 355M parameters (originally published as 345M)

Parameters:

  • device_id (Integer) (defaults to: 0)

Returns:

  • (Model)

# File 'lib/nnw/ai/transformer/model.rb', line 147

def self.gpt2_medium(device_id: 0)
  new(
    vocab_size: 50257,
    embed_dim: 1024,
    num_heads: 16,
    num_layers: 24,
    ff_dim: 4096,
    max_seq_len: 1024,
    dropout: 0.1,
    activation: :gelu,
    pre_norm: true,
    device_id: device_id
  )
end

.gpt2_small(device_id: 0) ⇒ Model

GPT-2 Small: 124M parameters

Parameters:

  • device_id (Integer) (defaults to: 0)

Returns:

  • (Model)

# File 'lib/nnw/ai/transformer/model.rb', line 129

def self.gpt2_small(device_id: 0)
  new(
    vocab_size: 50257,
    embed_dim: 768,
    num_heads: 12,
    num_layers: 12,
    ff_dim: 3072,
    max_seq_len: 1024,
    dropout: 0.1,
    activation: :gelu,
    pre_norm: true,
    device_id: device_id
  )
end

Instance Method Details

#decode_step(token_id, cache) ⇒ Tensor

Incremental forward pass for one new token using a KV cache (the decode path). The result matches the last-position logits of a full #forward over the whole prefix when dropout is inactive (decode_step itself applies none), but each step costs O(prefix) instead of O(prefix²): only the new token is embedded and projected, and its query attends over the cached keys and values. Must run under Tape.no_grad (no autograd). Cache entries are appended in prefix order, so callers feed the prompt one token at a time before sampling. Raises a RuntimeError once the cache already holds max_seq_len positions.
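The equivalence and cost claims can be illustrated with a toy model in plain Ruby. Everything here is an assumption made for illustration: `toy_decode_step`, `full_forward`, `attend` and `NUM_LAYERS` are hypothetical, and the toy uses one head, identity Q/K/V projections and residual connections, but no LayerNorm, feed-forward or position embeddings. What it shows is that, because attention is causal, the hidden state at an earlier position never changes when later tokens arrive, so each layer can cache its past keys and values and compute only the newest row.

```ruby
# Toy check: cached single-token decoding reproduces a full causal forward.
# ASSUMPTIONS (illustration only, not this library's Block): identity Q/K/V
# projections, one head, residual connections, no LayerNorm or feed-forward.
NUM_LAYERS = 2

def softmax(xs)
  m = xs.max
  exps = xs.map { |x| Math.exp(x - m) } # subtract max for numerical stability
  total = exps.sum
  exps.map { |e| e / total }
end

def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

def add(a, b)
  a.zip(b).map { |x, y| x + y }
end

# One query attends over key/value rows: O(rows) work.
def attend(query, keys, values)
  scale = 1.0 / Math.sqrt(query.length)
  weights = softmax(keys.map { |k| dot(query, k) * scale })
  Array.new(query.length) do |j|
    (0...weights.length).sum { |t| weights[t] * values[t][j] }
  end
end

# Full forward: each layer recomputes causal attention for every position.
def full_forward(xs)
  h = xs
  NUM_LAYERS.times do
    h = h.each_index.map { |i| add(h[i], attend(h[i], h[0..i], h[0..i])) }
  end
  h
end

# Decode path: one token per call; caches[l] holds layer l's past inputs,
# which serve as that layer's cached keys and values.
def toy_decode_step(x, caches)
  caches.each do |cache|
    cache << x
    x = add(x, attend(x, cache, cache))
  end
  x
end

xs = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]]
caches = Array.new(NUM_LAYERS) { [] }
incremental = xs.map { |x| toy_decode_step(x, caches) }
puts incremental == full_forward(xs) # prints true
```

full_forward redoes attention over every prefix position in every layer, which is quadratic in sequence length; toy_decode_step touches each cached row once per layer for the new token.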

Parameters:

  • token_id (Integer)

    the new token’s id

  • cache (KVCache)

    cache from #make_kv_cache; each call appends one position of keys and values and advances it

Returns:

  • (Tensor)

    logits [1, vocab]



# File 'lib/nnw/ai/transformer/model.rb', line 107

def decode_step(token_id, cache)
  pos = cache.length
  raise "KVCache full: position #{pos} exceeds max_seq_len #{@max_seq_len}" if pos >= @max_seq_len

  tok = Tensor.from_host([token_id], shape: [1], dtype: :int32, device_id: @device_id)
  pos_t = Tensor.from_host([pos], shape: [1], dtype: :int32, device_id: @device_id)

  x = @token_embedding.call(tok) + @position_embedding.call(pos_t) # [1, embed]
  @blocks.each_with_index { |block, i| x = block.decode_step(x, cache, i) }
  cache.advance!

  x = @norm.call(x)
  @head.call(x) # [1, vocab]
end
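A typical decode loop built on #make_kv_cache and #decode_step feeds the prompt one token at a time (prefill), then repeatedly picks a token and feeds it back. The sketch below uses greedy argmax selection and works with any object that responds to those two methods. Two parts are assumptions: `to_row` is a hypothetical callable that converts the [1, vocab] logits Tensor into a Ruby Array of Floats (the Tensor-to-host method is not documented here), and the caller is responsible for calling #eval! and running the loop under Tape.no_grad, as the #decode_step description requires.

```ruby
# Greedy decoding over a KV cache: a sketch.
# ASSUMPTIONS: `model` responds to #make_kv_cache and #decode_step as documented;
# `to_row` (hypothetical) converts one logits row to an Array of Floats.
# Callers should invoke model.eval! and run this under Tape.no_grad.
def greedy_generate(model, prompt_ids, max_new_tokens, to_row:)
  raise ArgumentError, "prompt_ids must not be empty" if prompt_ids.empty?

  cache = model.make_kv_cache
  logits = nil
  # Prefill: feed the prompt in order so cache positions match token positions.
  prompt_ids.each { |id| logits = model.decode_step(id, cache) }

  generated = []
  max_new_tokens.times do |step|
    row = to_row.call(logits)
    next_id = row.index(row.max)        # greedy: first highest-scoring token
    generated << next_id
    break if step == max_new_tokens - 1 # the last pick never needs feeding back
    logits = model.decode_step(next_id, cache)
  end
  generated
end
```

Swapping the argmax line for temperature or top-k sampling changes the selection rule without touching the cache handling.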

#forward(input_ids, mask: nil) ⇒ Tensor

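Forward pass over a full sequence: embeds tokens and positions, applies dropout, runs every Block, then the final LayerNorm and LM head. Returns logits.

Parameters:

  • input_ids (Tensor)

    token indices [batch_size, seq_len] (int32); seq_len must not exceed max_seq_len, the size of the position-embedding table

  • mask (Tensor, nil) (defaults to: nil)

    attention mask, passed unchanged to every Block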

Returns:

  • (Tensor)

    logits [batch_size * seq_len, vocab_size]



# File 'lib/nnw/ai/transformer/model.rb', line 62

def forward(input_ids, mask: nil)
  seq_len = input_ids.shape[-1]

  # Create position indices
  positions_data = (0...seq_len).to_a
  pos_nv = Ignis::Shared::NvArray.new(shape: [seq_len], dtype: :int32,
                                     device_id: input_ids.device_id)
  pos_nv.from_host(positions_data)
  positions = Tensor.new(data: pos_nv, requires_grad: false)

  # Embeddings
  tok_emb = @token_embedding.call(input_ids)   # [batch, seq, embed]
  pos_emb = @position_embedding.call(positions) # [seq, embed]

  # Combine and dropout
  x = tok_emb + pos_emb
  x = @dropout.call(x)

  # Transformer blocks
  @blocks.each do |block|
    x = block.call(x, mask: mask)
  end

  # Final norm and LM head
  x = @norm.call(x)
  @head.call(x)  # → logits [batch*seq, vocab]
end
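The embedding step in #forward adds one position vector per sequence position to every batch element: tok_emb has shape [batch, seq, embed] and pos_emb has shape [seq, embed], so the sum broadcasts over the batch dimension. A plain-Ruby sketch with nested Arrays shows the same broadcast and makes the seq_len ≤ max_seq_len limit explicit; `embed`, `token_table` and `pos_table` are hypothetical stand-ins for the NN::Embedding lookups, and no dropout is applied.

```ruby
# Sketch of token + position embedding with nested Arrays.
# ASSUMPTION: token_table [vocab, embed] and pos_table [max_seq_len, embed]
# stand in for the NN::Embedding weights; dropout is omitted.
def embed(input_ids, token_table, pos_table)
  seq_len = input_ids.first.length
  if seq_len > pos_table.length
    raise ArgumentError, "seq_len #{seq_len} exceeds max_seq_len #{pos_table.length}"
  end

  input_ids.map do |row|                  # each batch element
    row.each_with_index.map do |tok, pos| # each position 0...seq_len
      # same position row is added regardless of batch index (broadcast)
      token_table[tok].zip(pos_table[pos]).map { |t, p| t + p }
    end
  end
end

token_table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]] # vocab 3, embed 2
pos_table   = [[0.5, 0.5], [0.25, 0.25]]           # max_seq_len 2
p embed([[0, 1], [2, 2]], token_table, pos_table)
# prints [[[1.5, 0.5], [0.25, 1.25]], [[1.5, 1.5], [1.25, 1.25]]]
```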

#make_kv_cache(device_id: @device_id) ⇒ KVCache

Allocate a fresh KV cache sized for this model.

Parameters:

  • device_id (Integer) (defaults to: @device_id)

Returns:

  • (KVCache)

# File 'lib/nnw/ai/transformer/model.rb', line 93

def make_kv_cache(device_id: @device_id)
  KVCache.new(num_layers: @num_layers, max_seq_len: @max_seq_len,
              embed_dim: @embed_dim, device_id: device_id)
end
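The cache's footprint follows from the three sizes passed to KVCache.new. A back-of-the-envelope sketch, assuming (this is not documented here) that KVCache keeps one key buffer and one value buffer of [max_seq_len, embed_dim] per layer for a single sequence and stores 4-byte floats; `kv_cache_bytes` is a hypothetical helper:

```ruby
# Estimated KV cache size in bytes (hypothetical helper).
# ASSUMPTIONS: one K and one V buffer of [max_seq_len, embed_dim] per layer,
# batch size 1 (decode_step works on a single token), 4 bytes per element.
def kv_cache_bytes(num_layers:, max_seq_len:, embed_dim:, bytes_per_elem: 4)
  2 * num_layers * max_seq_len * embed_dim * bytes_per_elem
end

mib = kv_cache_bytes(num_layers: 12, max_seq_len: 1024, embed_dim: 768) / (1024.0 * 1024)
puts "GPT-2 Small cache: #{mib} MiB" # prints "GPT-2 Small cache: 72.0 MiB"
```

Under the same assumptions, .gpt2_large needs 2 × 36 × 1024 × 1280 × 4 bytes, or 360 MiB.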

#to_s ⇒ String

Returns:

  • (String)


# File 'lib/nnw/ai/transformer/model.rb', line 181

def to_s
  "TransformerModel(vocab=#{@vocab_size}, embed=#{@embed_dim}, " \
  "heads=#{@num_heads}, layers=#{@num_layers}, " \
  "params=#{num_parameters})"
end