Class: Ignis::AI::KVCache
- Inherits: Object
  - Object
  - Ignis::AI::KVCache
- Defined in: lib/nnw/ai/kv_cache.rb
Overview
KVCache: per-layer key/value cache for autoregressive decoding. It keeps the K/V projection cost of each step constant instead of growing with the prefix.
Without a cache, generating token t re-runs attention over the entire prefix: O(t) work per step, which adds up to O(n²) over n generated tokens. The cache stores each layer’s projected K and V once; each new token projects only its own key/value, appends them, and attends its single query over all cached keys.
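To make the cost argument concrete, here is a small counting sketch for one layer. It only counts K/V row projections, not attention scores, and the two helper names are made up for this illustration.

```ruby
# Count K/V row projections needed to generate n tokens (one layer).
# Hypothetical helpers for illustration only; not part of Ignis.

# Without a cache, step t re-projects all t prefix rows.
def projections_without_cache(n)
  (1..n).sum # 1 + 2 + ... + n = n(n+1)/2
end

# With a cache, each step projects only the new token's row.
def projections_with_cache(n)
  n
end

n = 1024
puts projections_without_cache(n) # 524800
puts projections_with_cache(n)    # 1024
```

The per-step attention over the cached keys still grows with the prefix; what the cache removes is the repeated projection of rows that were already computed.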
Buffers are preallocated to [max_seq_len, embed_dim] per layer, so appending a row is an O(embed_dim) device→device copy (NvArray#write_rows!) rather than a realloc plus a full recopy. A contiguous [length+1, embed_dim] view (a slice along dim 0) exposes the live region, including the row just appended this step.
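The preallocation idea can be sketched in plain Ruby. ToyBuffer below is a hypothetical stand-in that uses nested Arrays instead of device memory; its write_row! and slice methods only mimic the roles of NvArray#write_rows! and NvArray#slice and are not the real signatures.

```ruby
# Toy model of one layer's preallocated [max_seq_len, embed_dim] buffer.
# ToyBuffer is hypothetical; plain Ruby arrays replace device memory.
class ToyBuffer
  def initialize(max_seq_len, embed_dim)
    @embed_dim = embed_dim
    @rows = Array.new(max_seq_len) { Array.new(embed_dim, 0.0) }
  end

  # Overwrite one preallocated row in place. The cost depends on embed_dim
  # only, not on how many rows are already live.
  def write_row!(row, index)
    raise IndexError, "row #{index} out of range" unless index.between?(0, @rows.size - 1)
    raise ArgumentError, "expected #{@embed_dim} values" unless row.size == @embed_dim

    @rows[index].replace(row)
    self
  end

  # Live region: `count` rows starting at `start`. Array#[] builds a new
  # outer array here; the documented NvArray slice is described as a view.
  def slice(start, count)
    @rows[start, count]
  end
end

buf = ToyBuffer.new(4, 2)
length = 0
buf.write_row!([1.0, 2.0], length) # append at row `length`
live = buf.slice(0, length + 1)    # includes the row just written
p live # [[1.0, 2.0]]
```

No storage is allocated after construction: every append reuses a row that already exists.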
Instance Attribute Summary
- #embed_dim ⇒ Integer readonly
- #length ⇒ Integer readonly
  Number of tokens currently cached (positions 0..length-1).
- #max_seq_len ⇒ Integer readonly
- #num_layers ⇒ Integer readonly
Instance Method Summary
- #advance! ⇒ void
  Advance to the next position after every layer has appended this token.
- #append(layer, k_row, v_row) ⇒ void
  Append a layer’s K/V for the current token at row length.
- #full? ⇒ Boolean
  True if the cache is at capacity (no more tokens fit).
- #initialize(num_layers:, max_seq_len:, embed_dim:, device_id: 0) ⇒ KVCache constructor
  A new instance of KVCache.
- #k_view(layer) ⇒ Ignis::Shared::NvArray
  Contiguous [length+1, embed] view of cached keys for layer, INCLUDING the row just appended this step.
- #v_view(layer) ⇒ Ignis::Shared::NvArray
  Non-owning view of cached values.
Constructor Details
#initialize(num_layers:, max_seq_len:, embed_dim:, device_id: 0) ⇒ KVCache
Returns a new instance of KVCache.
# File 'lib/nnw/ai/kv_cache.rb', line 26
def initialize(num_layers:, max_seq_len:, embed_dim:, device_id: 0)
  @num_layers = num_layers
  @max_seq_len = max_seq_len
  @embed_dim = embed_dim
  @device_id = device_id
  @length = 0
  # One preallocated key buffer and one value buffer per layer.
  @k = Array.new(num_layers) { make_buffer }
  @v = Array.new(num_layers) { make_buffer }
end
Instance Attribute Details
#embed_dim ⇒ Integer (readonly)
# File 'lib/nnw/ai/kv_cache.rb', line 20
def embed_dim
  @embed_dim
end
#length ⇒ Integer (readonly)
Returns number of tokens currently cached (positions 0..length-1).
# File 'lib/nnw/ai/kv_cache.rb', line 18
def length
  @length
end
#max_seq_len ⇒ Integer (readonly)
# File 'lib/nnw/ai/kv_cache.rb', line 20
def max_seq_len
  @max_seq_len
end
#num_layers ⇒ Integer (readonly)
# File 'lib/nnw/ai/kv_cache.rb', line 20
def num_layers
  @num_layers
end
Instance Method Details
#advance! ⇒ void
This method returns an undefined value.
Advance to the next position after every layer has appended this token.
# File 'lib/nnw/ai/kv_cache.rb', line 62
def advance!
  @length += 1
end
#append(layer, k_row, v_row) ⇒ void
This method returns an undefined value.
Append a layer’s K/V for the current token at row length.
# File 'lib/nnw/ai/kv_cache.rb', line 41
def append(layer, k_row, v_row)
  @k[layer].write_rows!(k_row, @length)
  @v[layer].write_rows!(v_row, @length)
end
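Because #append always writes at row length and does not move the position itself, calling it twice for the same layer without #advance! in between overwrites the same row. The sketch below shows this with ToyKVCache, a hypothetical plain-Ruby stand-in that mirrors the listing above using nested Arrays instead of NvArray buffers.

```ruby
# ToyKVCache is a hypothetical stand-in for Ignis::AI::KVCache:
# same control flow as the listing, plain Ruby arrays for storage.
class ToyKVCache
  attr_reader :length

  def initialize(num_layers:, max_seq_len:, embed_dim:)
    @length = 0
    @k = Array.new(num_layers) { Array.new(max_seq_len) { Array.new(embed_dim, 0.0) } }
    @v = Array.new(num_layers) { Array.new(max_seq_len) { Array.new(embed_dim, 0.0) } }
  end

  # Write this token's K/V at row `length`; the position is NOT advanced.
  def append(layer, k_row, v_row)
    @k[layer][@length] = k_row
    @v[layer][@length] = v_row
  end

  def k_view(layer)
    @k[layer][0, @length + 1]
  end

  def advance!
    @length += 1
  end
end

cache = ToyKVCache.new(num_layers: 1, max_seq_len: 4, embed_dim: 2)
cache.append(0, [1.0, 1.0], [9.0, 9.0])
cache.append(0, [2.0, 2.0], [8.0, 8.0]) # no advance! in between: row 0 is overwritten
first = cache.k_view(0)
p first # [[2.0, 2.0]]

cache.advance!
cache.append(0, [3.0, 3.0], [7.0, 7.0]) # now lands at row 1
second = cache.k_view(0)
p second # [[2.0, 2.0], [3.0, 3.0]]
```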
#full? ⇒ Boolean
Returns true if the cache is at capacity (no more tokens fit).
# File 'lib/nnw/ai/kv_cache.rb', line 67
def full?
  @length >= @max_seq_len
end
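The #append listing shows no capacity check of its own, so a generation loop would presumably guard on #full? before writing. A minimal sketch of such a loop follows, using a hypothetical plain-Ruby stand-in (ToyKVCache) rather than the real class; the row values are placeholders.

```ruby
# ToyKVCache is a hypothetical stand-in for Ignis::AI::KVCache.
class ToyKVCache
  attr_reader :length

  def initialize(num_layers:, max_seq_len:, embed_dim:)
    @max_seq_len = max_seq_len
    @length = 0
    @k = Array.new(num_layers) { Array.new(max_seq_len) { Array.new(embed_dim, 0.0) } }
    @v = Array.new(num_layers) { Array.new(max_seq_len) { Array.new(embed_dim, 0.0) } }
  end

  def append(layer, k_row, v_row)
    @k[layer][@length] = k_row
    @v[layer][@length] = v_row
  end

  def advance!
    @length += 1
  end

  def full?
    @length >= @max_seq_len
  end
end

num_layers = 2
cache = ToyKVCache.new(num_layers: num_layers, max_seq_len: 3, embed_dim: 2)

generated = 0
until cache.full?
  # Every layer appends this token's K/V first...
  num_layers.times do |layer|
    cache.append(layer, [generated.to_f, 0.0], [0.0, generated.to_f])
  end
  # ...then the position moves once for the whole token.
  cache.advance!
  generated += 1
end

puts generated # 3
```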
#k_view(layer) ⇒ Ignis::Shared::NvArray
Contiguous [length+1, embed] view of cached keys for layer, INCLUDING the row just appended this step. (Call after #append, before #advance!.)
# File 'lib/nnw/ai/kv_cache.rb', line 50
def k_view(layer)
  @k[layer].slice(0, 0, @length + 1)
end
#v_view(layer) ⇒ Ignis::Shared::NvArray
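Returns a non-owning [length+1, embed] view of cached values for layer, including the row just appended this step. (Call after #append, before #advance!.)
# File 'lib/nnw/ai/kv_cache.rb', line 56
def v_view(layer)
  @v[layer].slice(0, 0, @length + 1)
end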
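Putting the pieces together, one decode step per layer is: append the new token's K/V, attend the single query over k_view and v_view, then advance once all layers are done. The sketch below runs that order in plain Ruby. ToyKVCache and attend are hypothetical stand-ins written for this example; the real class operates on NvArray buffers, and its attention kernel is not shown in this page.

```ruby
# ToyKVCache is a hypothetical stand-in for Ignis::AI::KVCache.
class ToyKVCache
  def initialize(num_layers:, max_seq_len:, embed_dim:)
    @length = 0
    @k = Array.new(num_layers) { Array.new(max_seq_len) { Array.new(embed_dim, 0.0) } }
    @v = Array.new(num_layers) { Array.new(max_seq_len) { Array.new(embed_dim, 0.0) } }
  end

  def append(layer, k_row, v_row)
    @k[layer][@length] = k_row
    @v[layer][@length] = v_row
  end

  # Rows 0..length, i.e. including the row appended this step.
  def k_view(layer)
    @k[layer][0, @length + 1]
  end

  def v_view(layer)
    @v[layer][0, @length + 1]
  end

  def advance!
    @length += 1
  end
end

# Scaled dot-product attention of ONE query over all cached rows.
def attend(query, keys, values)
  scale = Math.sqrt(query.size)
  scores = keys.map { |k| k.zip(query).sum { |a, b| a * b } / scale }
  peak = scores.max
  exps = scores.map { |s| Math.exp(s - peak) } # subtract max for stability
  total = exps.sum
  weights = exps.map { |e| e / total }
  # Weighted sum of value rows, computed column by column.
  values.transpose.map { |col| col.zip(weights).sum { |v, w| v * w } }
end

cache = ToyKVCache.new(num_layers: 1, max_seq_len: 4, embed_dim: 2)

# Step 0: one cached row, so the output equals that row's value.
cache.append(0, [1.0, 0.0], [5.0, 6.0])
out0 = attend([1.0, 0.0], cache.k_view(0), cache.v_view(0))
cache.advance!

# Step 1: two identical keys get equal weight, so the output is the mean.
cache.append(0, [1.0, 0.0], [7.0, 8.0])
out1 = attend([1.0, 0.0], cache.k_view(0), cache.v_view(0))
cache.advance!

p out0 # [5.0, 6.0]
p out1 # [6.0, 7.0]
```

Taking the views after #advance! instead would extend them by one row that has not been written yet, which is why the documented order is append, view, advance.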