Class: Ignis::AI::KVCache

Inherits:
Object
Defined in:
lib/nnw/ai/kv_cache.rb

Overview

KVCache: a per-layer key/value cache for autoregressive decoding, so that each step projects only the new token instead of the whole prefix.

Without a cache, generating token t re-runs the model over the entire prefix: at least O(t) work per step, which grows every step and makes generating n tokens at least O(n²) overall. The cache stores each layer’s projected K and V once; each new token projects only its own key/value, appends them, and attends its single query over all cached keys.
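The gap can be made concrete by counting K/V projections per layer. This is a rough sketch of the arithmetic only: both method names are hypothetical, and it counts projections rather than attention FLOPs.

```ruby
# Hypothetical helpers: count per-layer K/V projections needed to generate n tokens.

# Without a cache, step t re-projects all t tokens seen so far.
def projections_without_cache(n)
  (1..n).sum
end

# With a cache, step t projects only the new token.
def projections_with_cache(n)
  n
end

n = 1024
puts "without cache: #{projections_without_cache(n)}"
puts "with cache:    #{projections_with_cache(n)}"
```

The first count grows as n(n+1)/2, the second linearly in n.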

Buffers are preallocated to [max_seq_len, embed_dim] per layer so appending a row is an O(row) device→device copy (NvArray#write_rows!) rather than a realloc + full recopy. A contiguous [length+1, embed] view (slice along dim 0) exposes the live region — including the row just appended this step.
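The preallocate-then-overwrite idea can be illustrated in plain Ruby. RowBuffer below is a hypothetical stand-in, not Ignis::Shared::NvArray: it lives in host memory, and its live_rows method returns a copy, whereas the real view is described as non-owning.

```ruby
# Hypothetical host-memory stand-in for a preallocated [rows, cols] buffer.
class RowBuffer
  def initialize(rows, cols)
    @rows = rows
    @cols = cols
    @data = Array.new(rows * cols, 0.0) # allocated once, never resized
  end

  # Overwrite one row in place; touches only cols elements.
  def write_row!(row, index)
    raise ArgumentError, "row must have #{@cols} elements" unless row.length == @cols
    raise IndexError, "row #{index} out of range" unless (0...@rows).cover?(index)

    @data[index * @cols, @cols] = row
    self
  end

  # First `count` rows as nested arrays (a copy in this sketch).
  def live_rows(count)
    @data.first(count * @cols).each_slice(@cols).to_a
  end
end

buf = RowBuffer.new(4, 2)
length = 0
buf.write_row!([1.0, 2.0], length)
p buf.live_rows(length + 1) # live region includes the row just written
```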

Constructor Details

#initialize(num_layers:, max_seq_len:, embed_dim:, device_id: 0) ⇒ KVCache

Returns a new instance of KVCache.

Parameters:

  • num_layers (Integer)
  • max_seq_len (Integer)

    capacity (generation cannot exceed this)

  • embed_dim (Integer)
  • device_id (Integer) (defaults to: 0)


# File 'lib/nnw/ai/kv_cache.rb', line 26

def initialize(num_layers:, max_seq_len:, embed_dim:, device_id: 0)
  @num_layers = num_layers
  @max_seq_len = max_seq_len
  @embed_dim = embed_dim
  @device_id = device_id
  @length = 0
  @k = Array.new(num_layers) { make_buffer }
  @v = Array.new(num_layers) { make_buffer }
end
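Because the constructor allocates one K and one V buffer of shape [max_seq_len, embed_dim] per layer, the memory footprint can be estimated up front. The helper below is a hypothetical sketch; the element size is an assumption (4 bytes, as for float32), since the page does not state the buffer dtype.

```ruby
# Hypothetical estimate of the total bytes held by the K and V buffers.
# bytes_per_element is an assumption; adjust it to the actual dtype.
def kv_cache_bytes(num_layers:, max_seq_len:, embed_dim:, bytes_per_element: 4)
  2 * num_layers * max_seq_len * embed_dim * bytes_per_element
end

bytes = kv_cache_bytes(num_layers: 12, max_seq_len: 2048, embed_dim: 768)
puts format("%.1f MiB", bytes / (1024.0 * 1024.0))
```

The full capacity is reserved at construction time regardless of how many tokens are eventually generated.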

Instance Attribute Details

#embed_dim ⇒ Integer (readonly)

Returns:

  • (Integer)


# File 'lib/nnw/ai/kv_cache.rb', line 20

def embed_dim
  @embed_dim
end

#length ⇒ Integer (readonly)

Returns number of tokens currently cached (positions 0..length-1).

Returns:

  • (Integer)

    number of tokens currently cached (positions 0..length-1)



# File 'lib/nnw/ai/kv_cache.rb', line 18

def length
  @length
end

#max_seq_len ⇒ Integer (readonly)

Returns:

  • (Integer)


# File 'lib/nnw/ai/kv_cache.rb', line 20

def max_seq_len
  @max_seq_len
end

#num_layers ⇒ Integer (readonly)

Returns:

  • (Integer)


# File 'lib/nnw/ai/kv_cache.rb', line 20

def num_layers
  @num_layers
end

Instance Method Details

#advance! ⇒ void

This method returns an undefined value.

Advance to the next position after every layer has appended this token.



# File 'lib/nnw/ai/kv_cache.rb', line 62

def advance!
  @length += 1
end

#append(layer, k_row, v_row) ⇒ void

This method returns an undefined value.

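Writes a layer’s K/V for the current token into row length of that layer’s buffers. This does not change length: call #advance! once after every layer has appended. The method itself makes no capacity check, so test #full? before appending.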

Parameters:

  • layer (Integer)

    layer index

  • k_row (Ignis::Shared::NvArray)

    [1, embed] key projection of the new token

  • v_row (Ignis::Shared::NvArray)

    [1, embed] value projection of the new token



# File 'lib/nnw/ai/kv_cache.rb', line 41

def append(layer, k_row, v_row)
  @k[layer].write_rows!(k_row, @length)
  @v[layer].write_rows!(v_row, @length)
end

#full? ⇒ Boolean

Returns true if the cache is at capacity (no more tokens fit).

Returns:

  • (Boolean)

    true if the cache is at capacity (no more tokens fit)



# File 'lib/nnw/ai/kv_cache.rb', line 67

def full?
  @length >= @max_seq_len
end

#k_view(layer) ⇒ Ignis::Shared::NvArray

Contiguous [length+1, embed] view of cached keys for layer, INCLUDING the row just appended this step. (Call after #append, before #advance!.)

Parameters:

  • layer (Integer)

Returns:

  • (Ignis::Shared::NvArray)

    non-owning view



# File 'lib/nnw/ai/kv_cache.rb', line 50

def k_view(layer)
  @k[layer].slice(0, 0, @length + 1)
end
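The call order (append, then view, then a single advance! per token) can be traced with a pure-Ruby stand-in. SketchCache and decode_until_full are hypothetical: they mirror the method names documented on this page but store rows in nested Ruby arrays rather than NvArray buffers, and the rows are placeholders rather than real projections.

```ruby
# Hypothetical stand-in that mirrors the documented KVCache interface.
class SketchCache
  attr_reader :length, :num_layers

  def initialize(num_layers:, max_seq_len:, embed_dim:)
    @num_layers = num_layers
    @max_seq_len = max_seq_len
    @length = 0
    @k = Array.new(num_layers) { Array.new(max_seq_len) { Array.new(embed_dim, 0.0) } }
    @v = Array.new(num_layers) { Array.new(max_seq_len) { Array.new(embed_dim, 0.0) } }
  end

  def append(layer, k_row, v_row)
    @k[layer][@length] = k_row
    @v[layer][@length] = v_row
  end

  def k_view(layer)
    @k[layer][0, @length + 1]
  end

  def v_view(layer)
    @v[layer][0, @length + 1]
  end

  def advance!
    @length += 1
  end

  def full?
    @length >= @max_seq_len
  end
end

# Runs decode steps until the cache is full and records each view's row count.
def decode_until_full(cache)
  view_sizes = []
  until cache.full?
    t = cache.length
    cache.num_layers.times do |layer|
      row = [t.to_f, layer.to_f]               # placeholder for real K/V projections
      cache.append(layer, row, row)            # 1. write row `length`
      view_sizes << cache.k_view(layer).length # 2. view already includes the new row
    end
    cache.advance!                             # 3. once per token, after all layers
  end
  view_sizes
end

p decode_until_full(SketchCache.new(num_layers: 2, max_seq_len: 3, embed_dim: 2))
```

Calling advance! inside the per-layer loop would shift later layers onto the wrong row, which is why it sits outside.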

#v_view(layer) ⇒ Ignis::Shared::NvArray

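Returns a non-owning [length+1, embed] view of the cached values for layer, including the row just appended this step. (Call after #append, before #advance!.)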

Parameters:

  • layer (Integer)

Returns:

  • (Ignis::Shared::NvArray)

    non-owning view of cached values



# File 'lib/nnw/ai/kv_cache.rb', line 56

def v_view(layer)
  @v[layer].slice(0, 0, @length + 1)
end
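What a caller might do with the two views is sketched below for a single query. This assumes standard scaled dot-product attention with a single head, which this page does not specify; attend is a hypothetical helper that works on nested Ruby arrays rather than NvArray views.

```ruby
# Hypothetical single-query attention: softmax(q . K^T / sqrt(d)) . V
# query:  [d]          projection of the new token
# keys:   [rows, d]    e.g. the contents of a k_view
# values: [rows, d_v]  e.g. the contents of a v_view
def attend(query, keys, values)
  scale = Math.sqrt(query.length)
  scores = keys.map { |k| k.zip(query).sum { |a, b| a * b } / scale }

  # Subtract the max score before exponentiating for numerical stability.
  max = scores.max
  exps = scores.map { |s| Math.exp(s - max) }
  total = exps.sum
  weights = exps.map { |e| e / total }

  # Weighted sum of the value rows, one output element per value column.
  values.transpose.map { |col| col.zip(weights).sum { |v, w| v * w } }
end

keys   = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
p attend([1.0, 0.0], keys, values)
```

The cost of this step is linear in the number of cached rows, which is the per-token work that remains once the prefix projections are cached.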