Class: Ignis::AI::KVCache

Inherits: Object


Defined in: lib/nnw/ai/kv_cache.rb

Overview

KVCache: a per-layer key/value cache for autoregressive decoding, so each step projects K/V only for the new token instead of for the whole prefix.

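Without a cache, generating token t re-runs the model over the entire prefix, recomputing the K and V projections of all t tokens at every layer. That is O(t) projection work per step, growing every step, so O(n²) in total for n tokens. The cache stores each layer’s projected K and V once; each new token projects only its own key/value, appends them, and attends its single query over all cached keys (attention is still O(t) per step, but the prefix is never re-projected).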
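To make the difference concrete, the sketch below counts the per-layer K/V projections each strategy performs while generating n tokens. It is plain Ruby arithmetic written for illustration, not part of KVCache, and it counts projections only (attention cost is left out).

```ruby
# Per-layer K/V projections needed to generate n tokens.
# Without a cache, step t (1-based) re-projects all t tokens of the prefix.
def projections_without_cache(n)
  (1..n).sum { |t| t } # 1 + 2 + ... + n = n(n + 1) / 2
end

# With a cache, each step projects only the one new token.
def projections_with_cache(n)
  n
end

[8, 128, 1024].each do |n|
  puts "n=#{n}: without=#{projections_without_cache(n)} with=#{projections_with_cache(n)}"
end
# n=8: without=36 with=8
# n=128: without=8256 with=128
# n=1024: without=524800 with=1024
```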

Buffers are preallocated as [max_seq_len, embed_dim] per layer, so appending a row is an O(embed_dim) device-to-device copy (NvArray#write_rows!) rather than a reallocation plus a full recopy of the prefix. A contiguous [length+1, embed_dim] view (a slice along dim 0) exposes the live region, including the row just appended this step.
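The preallocate-then-view pattern can be sketched on the host. RowBuffer and RowView below are hypothetical stand-ins written for illustration only: they are not Ignis::Shared::NvArray, a flat Ruby Array replaces device memory, and the argument orders of write_rows!(src, offset) and slice(dim, start, len) are inferred from how KVCache calls them.

```ruby
# Hypothetical host-side stand-in for a preallocated [rows, cols] buffer.
# NOT Ignis::Shared::NvArray: a flat Ruby Array plays the role of device memory.
class RowBuffer
  attr_reader :rows, :cols

  def initialize(rows, cols)
    @rows = rows
    @cols = cols
    @data = Array.new(rows * cols, 0.0) # allocated once, never resized
  end

  # Copy src (an Array of row Arrays) in, starting at row `offset`.
  # Work is proportional to the rows written, not to the buffer size.
  def write_rows!(src, offset)
    raise IndexError, "write past capacity" if offset + src.size > @rows
    src.each_with_index do |row, i|
      raise ArgumentError, "expected #{@cols} columns" unless row.size == @cols
      @data[(offset + i) * @cols, @cols] = row
    end
    self
  end

  # Non-owning view of rows [start, start + len) along dim 0 (no data is copied).
  def slice(dim, start, len)
    raise ArgumentError, "only dim 0 is supported" unless dim.zero?
    raise IndexError, "slice past capacity" if start + len > @rows
    RowView.new(@data, @cols, start, len)
  end
end

# Reads through to the parent's storage, so rows written later are visible.
RowView = Struct.new(:data, :cols, :start, :len) do
  def shape
    [len, cols]
  end

  def row(i)
    raise IndexError, "row #{i} is outside the view" unless i.between?(0, len - 1)
    data[(start + i) * cols, cols]
  end
end

buf = RowBuffer.new(4, 2)        # like one layer's [max_seq_len, embed_dim] buffer
buf.write_rows!([[1.0, 2.0]], 0) # token 0
view = buf.slice(0, 0, 2)        # rows 0..1, taken before row 1 is written
buf.write_rows!([[3.0, 4.0]], 1) # token 1
p view.row(1)                    # prints [3.0, 4.0]: the view sees the new row
```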

Instance Attribute Summary

#embed_dim ⇒ Integer readonly
#length ⇒ Integer readonly

Number of tokens currently cached (positions 0..length-1).
#max_seq_len ⇒ Integer readonly
#num_layers ⇒ Integer readonly

Instance Method Summary

#advance! ⇒ void

Advance to the next position after every layer has appended this token.
#append(layer, k_row, v_row) ⇒ void

Append a layer’s K/V for the current token at row #length.
#full? ⇒ Boolean

True if the cache is at capacity (no more tokens fit).
#initialize(num_layers:, max_seq_len:, embed_dim:, device_id: 0) ⇒ KVCache constructor

A new instance of KVCache.
#k_view(layer) ⇒ Ignis::Shared::NvArray

Contiguous [length+1, embed] view of cached keys for layer, INCLUDING the row just appended this step.
#v_view(layer) ⇒ Ignis::Shared::NvArray

Contiguous [length+1, embed] non-owning view of cached values for layer, INCLUDING the row just appended this step.

Constructor Details

#initialize(num_layers:, max_seq_len:, embed_dim:, device_id: 0) ⇒ `KVCache`

Returns a new instance of KVCache.

Parameters:

num_layers (Integer): number of layers (one K and one V buffer each)
max_seq_len (Integer): capacity; generation cannot exceed this many tokens
embed_dim (Integer): width of each cached K/V row
device_id (Integer) (defaults to: 0)

# File 'lib/nnw/ai/kv_cache.rb', line 26

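def initialize(num_layers:, max_seq_len:, embed_dim:, device_id: 0)
  @num_layers = num_layers
  @max_seq_len = max_seq_len
  @embed_dim = embed_dim
  @device_id = device_id
  @length = 0 # no tokens cached yet
  # One preallocated [max_seq_len, embed_dim] buffer per layer for keys and one
  # for values (make_buffer is a helper not documented on this page).
  @k = Array.new(num_layers) { make_buffer }
  @v = Array.new(num_layers) { make_buffer }
end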
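Since the constructor preallocates one K and one V buffer of [max_seq_len, embed_dim] per layer, the cache’s memory footprint is fixed at construction time. The sketch below computes it. The bytes-per-element value is an assumption (4, as for float32), because this page does not state the buffers’ dtype, and the example shape is illustrative only.

```ruby
# Elements held by a KVCache: 2 buffers (K and V) per layer, each [max_seq_len, embed_dim].
def kv_cache_elements(num_layers:, max_seq_len:, embed_dim:)
  2 * num_layers * max_seq_len * embed_dim
end

# ASSUMPTION: bytes_per_element depends on the buffer dtype, which is not documented here.
def kv_cache_bytes(bytes_per_element: 4, **shape)
  kv_cache_elements(**shape) * bytes_per_element
end

bytes = kv_cache_bytes(num_layers: 12, max_seq_len: 1024, embed_dim: 768)
puts "#{bytes} bytes (#{bytes / 1024.0**2} MiB)" # 75497472 bytes (72.0 MiB)
```

The whole allocation happens up front, so a too-large max_seq_len costs memory even if generation stops early.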

Instance Attribute Details

#embed_dim ⇒ `Integer` (readonly)

Returns:

(Integer)




# File 'lib/nnw/ai/kv_cache.rb', line 20

def embed_dim
  @embed_dim
end

#length ⇒ `Integer` (readonly)

Returns the number of tokens currently cached (positions 0..length-1).

Returns:

(Integer): number of tokens currently cached (positions 0..length-1)




# File 'lib/nnw/ai/kv_cache.rb', line 18

def length
  @length
end

#max_seq_len ⇒ `Integer` (readonly)

Returns:

(Integer)




# File 'lib/nnw/ai/kv_cache.rb', line 20

def max_seq_len
  @max_seq_len
end

#num_layers ⇒ `Integer` (readonly)

Returns:

(Integer)




# File 'lib/nnw/ai/kv_cache.rb', line 20

def num_layers
  @num_layers
end

Instance Method Details

#advance! ⇒ `void`

This method returns an undefined value.

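Advances to the next position. Call it exactly once per token, after every layer has appended that token’s K/V via #append. It performs no capacity check; consult #full? before starting a step.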




# File 'lib/nnw/ai/kv_cache.rb', line 62

def advance!
  @length += 1
end
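Because #length is shared by all layers, one decode step calls #append for every layer and then #advance! exactly once; advancing inside the layer loop would store each layer’s row for the same token at a different index. The sketch below mirrors the documented bookkeeping with plain Ruby arrays. ToyKVCache is a hypothetical stand-in, not Ignis::AI::KVCache, and strings stand in for NvArray rows.

```ruby
# Hypothetical host-side mirror of KVCache's bookkeeping (not the library class).
class ToyKVCache
  attr_reader :length

  def initialize(num_layers:, max_seq_len:)
    @max_seq_len = max_seq_len
    @length = 0
    @k = Array.new(num_layers) { Array.new(max_seq_len) }
    @v = Array.new(num_layers) { Array.new(max_seq_len) }
  end

  # Every layer writes at the SAME row index (@length) for the current token.
  def append(layer, k_row, v_row)
    @k[layer][@length] = k_row
    @v[layer][@length] = v_row
  end

  # Rows 0..@length: the cached prefix plus the row appended this step.
  def k_view(layer)
    @k[layer][0, @length + 1]
  end

  def advance!
    @length += 1
  end

  def full?
    @length >= @max_seq_len
  end
end

num_layers = 3
cache = ToyKVCache.new(num_layers: num_layers, max_seq_len: 8)

%w[a b c].each do |tok|
  break if cache.full?
  num_layers.times do |layer|
    cache.append(layer, "k#{layer}#{tok}", "v#{layer}#{tok}")
    # Layer `layer` would attend over cache.k_view(layer) here (length + 1 rows).
  end
  cache.advance! # exactly once per token, after all layers have appended
end
```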

#append(layer, k_row, v_row) ⇒ `void`

This method returns an undefined value.

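Writes a layer’s K/V for the current token into row #length (the next free row). It does not advance the position; call #advance! once every layer has appended.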

Parameters:

layer (Integer): layer index
k_row (Ignis::Shared::NvArray): [1, embed_dim] key projection of the new token
v_row (Ignis::Shared::NvArray): [1, embed_dim] value projection of the new token

# File 'lib/nnw/ai/kv_cache.rb', line 41

def append(layer, k_row, v_row)
  @k[layer].write_rows!(k_row, @length)
  @v[layer].write_rows!(v_row, @length)
end
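#append always writes at row #length, so a second append for the same layer before #advance! overwrites that row rather than adding a new one. The hypothetical sketch below models one layer’s K buffer as a preallocated Ruby Array (plain arrays stand in for NvArray rows) to show the indexing.

```ruby
# Hypothetical illustration of append's indexing for one layer's K buffer.
k_buffer = Array.new(4) # stands in for a [max_seq_len, embed_dim] buffer, max_seq_len = 4
length = 0              # stands in for the cache's #length

write = ->(row) { k_buffer[length] = row } # what append does to one buffer

write.call([0.1, 0.2]) # token 0 lands in row 0
write.call([0.3, 0.4]) # appended again without advance!: overwrites row 0
length += 1            # advance!
write.call([0.5, 0.6]) # token 1 lands in row 1
```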

#full? ⇒ `Boolean`

Returns true if the cache is at capacity (no more tokens fit).

Returns:

(Boolean): true if the cache is at capacity (no more tokens fit)




# File 'lib/nnw/ai/kv_cache.rb', line 67

def full?
  @length >= @max_seq_len
end
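#full? is the guard a decode loop should consult before each step: #append writes row #length and #k_view/#v_view slice #length + 1 rows, so both stay inside the buffer only while #length < max_seq_len (behavior past capacity is not documented on this page). The hypothetical driver below uses plain integers in place of the cache; max_new_tokens is an illustrative caller-side budget, not a KVCache parameter.

```ruby
# Hypothetical decode driver: check capacity BEFORE appending.
max_seq_len = 5
max_new_tokens = 8 # illustrative budget, larger than capacity here
length = 0         # stands in for cache.length
generated = []

full = -> { length >= max_seq_len } # same test as KVCache#full?

while generated.size < max_new_tokens
  break if full.call # stop cleanly instead of writing past the buffer
  # ... append K/V for every layer, attend over length + 1 rows ...
  generated << "tok#{length}"
  length += 1        # advance!
end
```

Here capacity, not the token budget, ends generation after five tokens.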

#k_view(layer) ⇒ `Ignis::Shared::NvArray`

Contiguous [length+1, embed] view of cached keys for layer, INCLUDING the row just appended this step. (Call after #append, before #advance!.)

Parameters:

layer (Integer)

Returns:

(Ignis::Shared::NvArray): non-owning view




# File 'lib/nnw/ai/kv_cache.rb', line 50

def k_view(layer)
  @k[layer].slice(0, 0, @length + 1)
end

#v_view(layer) ⇒ `Ignis::Shared::NvArray`

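Returns a contiguous [length+1, embed] non-owning view of cached values for layer, including the row just appended this step. (Call after #append, before #advance!.)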

Parameters:

layer (Integer)

Returns:

(Ignis::Shared::NvArray): non-owning view of cached values




# File 'lib/nnw/ai/kv_cache.rb', line 56

def v_view(layer)
  @v[layer].slice(0, 0, @length + 1)
end
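Each decode step uses the two views for a single query: score the query against every cached key, softmax the scores, and take the weighted sum of cached values. KVCache itself only stores rows. The sketch below performs this step on plain Ruby arrays standing in for the [length + 1, embed_dim] views, assuming single-head attention with 1/sqrt(d) scaling; it is illustrative and not the library’s attention kernel.

```ruby
# Single-query, single-head scaled dot-product attention over cached rows.
# `keys` and `values` play the role of the [length + 1, d] rows that
# k_view(layer) / v_view(layer) expose; plain Arrays replace NvArrays here.
def attend(query, keys, values)
  d = query.size
  scores = keys.map { |k| k.zip(query).sum { |a, b| a * b } / Math.sqrt(d) }
  max = scores.max # subtract the max for a numerically stable softmax
  weights = scores.map { |s| Math.exp(s - max) }
  total = weights.sum
  weights.map! { |w| w / total }
  # Output column j is the attention-weighted sum of column j over all cached values.
  Array.new(d) { |j| values.each_with_index.sum { |v, i| weights[i] * v[j] } }
end

keys   = [[1.0, 0.0], [0.0, 1.0]] # two cached positions, d = 2
values = [[10.0, 0.0], [0.0, 10.0]]
p attend([0.0, 0.0], keys, values) # equal scores give an even mix: [5.0, 5.0]
```

Only one query row is processed per step, which is why the cached keys and values, rather than the whole prefix’s activations, are all a step needs.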
