Class: FFNFFICache

Inherits:
Object
Defined in:
lib/toy/ffi/tinynn.rb

Overview

Persistent FFI cache for one transformer block’s FFN. A single ggml session holds the full chain `matmul -> gelu -> matmul`. Activations stay inside ggml between the two matmuls; only the three outputs (pre, hidden, out) are downloaded at the end.

Lazy-realized: T (sequence length) isn’t known until the first forward call. realize_for(t_seq, d_model, d_ff) sets up the graph; subsequent calls with the same T reuse it.
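The reuse rule can be sketched from the caller's side. This is a minimal illustration, not code from the library: `StubCache` is a hypothetical stand-in that only mimics the `realized`, `t_seq`, `d_model` and `d_ff` readers, and `ensure_realized` is a hypothetical helper name. How a previous session is released when the shape changes is not shown in this page, so the sketch leaves that out.

```ruby
# Hypothetical stand-in for FFNFFICache: it records realize_for calls
# instead of building a ggml graph, so the sketch runs without the FFI layer.
class StubCache
  attr_reader :realized, :t_seq, :d_model, :d_ff, :realize_calls

  def initialize
    @realized = false
    @t_seq = 0
    @d_model = 0
    @d_ff = 0
    @realize_calls = 0
  end

  def realize_for(t_seq, d_model, d_ff)
    @t_seq = t_seq
    @d_model = d_model
    @d_ff = d_ff
    @realize_calls += 1
    @realized = true
  end
end

# Hypothetical caller-side guard: build the graph only when the cache is
# not yet realized or the requested shape differs from the cached one.
def ensure_realized(cache, t_seq, d_model, d_ff)
  same_shape = cache.realized &&
               cache.t_seq == t_seq &&
               cache.d_model == d_model &&
               cache.d_ff == d_ff
  cache.realize_for(t_seq, d_model, d_ff) unless same_shape
  cache
end

cache = StubCache.new
ensure_realized(cache, 16, 64, 256) # first call: builds
ensure_realized(cache, 16, 64, 256) # same T: reuses
ensure_realized(cache, 32, 64, 256) # new T: builds again
```

With these three calls the stub records two builds, one per distinct T.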

Operand layout: matmul1 is fed as `matmul(t_w1_t, t_h)`, so its result has ne0 = d_ff, which is the k-dim (the shared inner dimension) of matmul2. The chain therefore needs no intermediate transpose, and downloading each of the three result tensors is a straight row-major memcpy.
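The shape bookkeeping behind this layout can be checked without ggml. The sketch below assumes the usual ggml `mul_mat` rule (both operands share ne0 as the inner dimension, and the result has ne = [a.ne1, b.ne1]); `mul_mat_ne` and `flat_index` are hypothetical helper names, and the sizes are illustrative only.

```ruby
# Shape rule of ggml's mul_mat on ne arrays ([ne0, ne1]).
# ne0 is the contiguous (fastest-varying) dimension.
def mul_mat_ne(a_ne, b_ne)
  unless a_ne[0] == b_ne[0]
    raise ArgumentError, "k mismatch: #{a_ne[0]} vs #{b_ne[0]}"
  end
  [a_ne[1], b_ne[1]]
end

# Flat offset of element (i0, i1) in a contiguous 2-D tensor.
# With ne = [cols, rows] this is ordinary row-major indexing.
def flat_index(ne, i0, i1)
  i1 * ne[0] + i0
end

t_seq = 3
d_model = 4
d_ff = 8

h_ne    = [d_model, t_seq]   # activations, one row per token
w1_t_ne = [d_model, d_ff]    # w1 uploaded transposed
w2_t_ne = [d_ff, d_model]    # w2 uploaded transposed

pre_ne    = mul_mat_ne(w1_t_ne, h_ne)      # [d_ff, T]
hidden_ne = pre_ne                         # gelu is elementwise
out_ne    = mul_mat_ne(w2_t_ne, hidden_ne) # [d_model, T]
```

Because `pre_ne[0]` equals `w2_t_ne[0]`, the second `mul_mat_ne` call is valid as-is; swapping the operands of the first matmul would yield ne0 = T, which does not match w2_t's ne0 = d_ff. Each result is also T contiguous rows, which is why a flat copy reads back as a row-major `[T, d_ff]` or `[T, d_model]` array.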


Constructor Details

#initialize ⇒ FFNFFICache

Returns a new instance of FFNFFICache.



# File 'lib/toy/ffi/tinynn.rb', line 35

def initialize
  @realized = false
  @t_seq    = 0
  @d_model  = 0
  @d_ff     = 0
  # `:ptr` ivars seed with TinyNN.tnn_null_ptr (a typed NULL `void *`)
  # rather than `nil`. Post-spinel `85a4670`, mixing `nil` with `:ptr`
  # boxes the ivar as `sp_RbVal`, which then fails the `(void *)` cast
  # at every FFI call site downstream. The typed-NULL seed keeps the
  # ivar as plain `void *` end-to-end.
  @sess     = TinyNN.tnn_null_ptr
  @t_h      = TinyNN.tnn_null_ptr
  @t_w1_t   = TinyNN.tnn_null_ptr
  @t_w2_t   = TinyNN.tnn_null_ptr
  @t_pre    = TinyNN.tnn_null_ptr
  @t_hidden = TinyNN.tnn_null_ptr
  @t_out    = TinyNN.tnn_null_ptr
end

Instance Attribute Details

#d_ff ⇒ Object

Returns the value of attribute d_ff.



# File 'lib/toy/ffi/tinynn.rb', line 31

def d_ff
  @d_ff
end

#d_model ⇒ Object

Returns the value of attribute d_model.



# File 'lib/toy/ffi/tinynn.rb', line 31

def d_model
  @d_model
end

#realized ⇒ Object

Returns the value of attribute realized.



# File 'lib/toy/ffi/tinynn.rb', line 31

def realized
  @realized
end

#sess ⇒ Object

Returns the value of attribute sess.



# File 'lib/toy/ffi/tinynn.rb', line 31

def sess
  @sess
end

#t_h ⇒ Object

Returns the value of attribute t_h.



# File 'lib/toy/ffi/tinynn.rb', line 31

def t_h
  @t_h
end

#t_hidden ⇒ Object

Returns the value of attribute t_hidden.



# File 'lib/toy/ffi/tinynn.rb', line 31

def t_hidden
  @t_hidden
end

#t_out ⇒ Object

Returns the value of attribute t_out.



# File 'lib/toy/ffi/tinynn.rb', line 31

def t_out
  @t_out
end

#t_pre ⇒ Object

Returns the value of attribute t_pre.



# File 'lib/toy/ffi/tinynn.rb', line 31

def t_pre
  @t_pre
end

#t_seq ⇒ Object

Returns the value of attribute t_seq.



# File 'lib/toy/ffi/tinynn.rb', line 31

def t_seq
  @t_seq
end

#t_w1_t ⇒ Object

Returns the value of attribute t_w1_t.



# File 'lib/toy/ffi/tinynn.rb', line 31

def t_w1_t
  @t_w1_t
end

#t_w2_t ⇒ Object

Returns the value of attribute t_w2_t.



# File 'lib/toy/ffi/tinynn.rb', line 31

def t_w2_t
  @t_w2_t
end

Instance Method Details

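#realize_for(t_seq, d_model, d_ff) ⇒ Object

Creates the ggml session, declares the three input tensors, builds the matmul -> gelu -> matmul chain, marks all three results as outputs, and sets realized to true.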



# File 'lib/toy/ffi/tinynn.rb', line 54

def realize_for(t_seq, d_model, d_ff)
  @t_seq   = t_seq
  @d_model = d_model
  @d_ff    = d_ff

  @sess = TinyNN.tnn_session_new(0)
  # t_h:    ne=[d_model, T] -- h uploaded row-major (data[k] = h.flat[k]).
  # t_w1_t: ne=[d_model, d_ff] -- w1 uploaded transposed.
  # t_w2_t: ne=[d_ff, d_model] -- w2 uploaded transposed.
  @t_h    = TinyNN.tnn_input_2d_f32(@sess, t_seq,  d_model)
  @t_w1_t = TinyNN.tnn_input_2d_f32(@sess, d_ff,   d_model)
  @t_w2_t = TinyNN.tnn_input_2d_f32(@sess, d_model, d_ff)

  # Chain: mul_mat(w1_t, h) -> gelu -> mul_mat(w2_t, hidden).
  # Result shapes (ggml ne):  [d_ff, T] -> [d_ff, T] -> [d_model, T].
  @t_pre    = TinyNN.tnn_matmul(@sess, @t_w1_t, @t_h)
  @t_hidden = TinyNN.tnn_gelu(@sess, @t_pre)
  @t_out    = TinyNN.tnn_matmul(@sess, @t_w2_t, @t_hidden)
  # Mark intermediates as outputs so the scheduler doesn't alias
  # their buffers with later ops -- backward needs pre and hidden.
  TinyNN.tnn_set_output(@t_pre)
  TinyNN.tnn_set_output(@t_hidden)
  TinyNN.tnn_set_output(@t_out)
  TinyNN.tnn_realize(@sess, @t_out)

  @realized = true
end
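
For checking downloaded values, a pure-Ruby reference of the same chain can be handy. The sketch below is an assumption-laden stand-in, not part of the library: `gelu`, `matmul` and `ffn_reference` are hypothetical names, the weights are passed untransposed as nested arrays, there are no biases (matching the chain above), and the GELU uses the tanh approximation, which should be confirmed against what `tnn_gelu` actually computes.

```ruby
# Tanh-approximation GELU. Assumed to match the backend; verify before
# using it as a numerical oracle.
def gelu(x)
  0.5 * x * (1.0 + Math.tanh(Math.sqrt(2.0 / Math::PI) * (x + 0.044715 * x**3)))
end

# Plain row-major matmul: a is [n][k], b is [k][m], result is [n][m].
def matmul(a, b)
  cols = b.transpose
  a.map { |row| cols.map { |col| row.zip(col).sum { |x, y| x * y } } }
end

# Returns the same three results the cache marks as outputs.
# h: [T][d_model], w1: [d_model][d_ff], w2: [d_ff][d_model].
def ffn_reference(h, w1, w2)
  pre    = matmul(h, w1)                               # [T][d_ff]
  hidden = pre.map { |row| row.map { |v| gelu(v) } }   # [T][d_ff]
  out    = matmul(hidden, w2)                          # [T][d_model]
  [pre, hidden, out]
end

h  = [[1.0, 0.0], [0.0, 2.0]]                  # T = 2, d_model = 2
w1 = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]       # d_model = 2, d_ff = 3
w2 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]      # d_ff = 3, d_model = 2

pre, hidden, out = ffn_reference(h, w1, w2)
```

Each returned array is row-major with one row per token, so it can be flattened and compared element by element against the corresponding download, within a floating-point tolerance.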