Class: FFNFFICache
- Inherits: Object
- Defined in: lib/toy/ffi/tinynn.rb
Overview
Persistent FFI cache for one transformer block’s FFN. A single ggml session holds the full chain `matmul -> gelu -> matmul`. Activations stay inside ggml between the two matmuls; only the three outputs (pre, hidden, out) are downloaded at the end.
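For orientation, the chain and its three outputs can be written as a pure-Ruby reference that never touches ggml or TinyNN. This is only a sketch: the helper names (`ref_matmul`, `ref_gelu`, `ref_ffn_forward`) are hypothetical, and the tanh approximation of GELU is an assumption, since the page does not say which GELU variant the backend uses.

```ruby
# Pure-Ruby reference for the chain matmul -> gelu -> matmul.
# Matrices are arrays of rows; all helper names here are hypothetical.

# Multiply an (m x k) matrix by a (k x n) matrix.
def ref_matmul(a, b)
  cols = b.transpose
  a.map { |row| cols.map { |col| row.zip(col).sum { |x, y| x * y } } }
end

# GELU, tanh approximation (assumed; the backend may use another variant).
def ref_gelu(x)
  0.5 * x * (1.0 + Math.tanh(Math.sqrt(2.0 / Math::PI) * (x + 0.044715 * x**3)))
end

# Returns the same three tensors the cache downloads: pre, hidden, out.
def ref_ffn_forward(h, w1, w2)
  pre    = ref_matmul(h, w1)                              # T x d_ff
  hidden = pre.map { |row| row.map { |v| ref_gelu(v) } }  # T x d_ff
  out    = ref_matmul(hidden, w2)                         # T x d_model
  [pre, hidden, out]
end

h  = [[1.0, 2.0], [0.0, -1.0]]                # T = 2, d_model = 2
w1 = [[1.0, 0.0, -1.0], [0.5, 1.0, 0.0]]      # d_model x d_ff, d_ff = 3
w2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # d_ff x d_model
pre, hidden, out = ref_ffn_forward(h, w1, w2)
```

A reference like this is mainly useful for checking the downloaded pre, hidden, and out values against a slow but obvious implementation.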
Lazily realized: T (the sequence length) isn’t known until the first forward call. `realize_for(t_seq, d_model, d_ff)` sets up the graph; subsequent calls with the same T reuse it.
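The build-once, reuse-for-the-same-T pattern can be sketched with a stand-in class that performs no FFI calls. `LazyGraphSketch`, `ensure_realized`, and the `builds` counter are hypothetical names for illustration. Rebuilding when T changes is an assumption, since the page only describes the same-T case, and a real implementation would presumably also have to release the previous session.

```ruby
# Hypothetical stand-in for a lazily realized graph cache; no FFI involved.
class LazyGraphSketch
  attr_reader :realized, :t_seq, :builds

  def initialize
    @realized = false
    @t_seq = 0
    @builds = 0 # counts how often the (pretend) graph was built
  end

  # Stand-in for the expensive graph setup.
  def realize_for(t_seq)
    @t_seq = t_seq
    @builds += 1
    @realized = true
  end

  # Build on first use or when T changes (assumed policy); otherwise reuse.
  def ensure_realized(t_seq)
    realize_for(t_seq) unless @realized && @t_seq == t_seq
    self
  end
end

cache = LazyGraphSketch.new
cache.ensure_realized(8)
cache.ensure_realized(8)   # same T: existing graph is reused
cache.ensure_realized(16)  # different T: graph is rebuilt
```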
Operand layout: we feed matmul1 as `matmul(t_w1_t, t_h)` so that its result has ne0=d_ff, which is the k-dim of matmul2. The chain therefore needs no intermediate transpose, and downloading each of the three result tensors is a straight row-major memcpy.
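The layout argument can be checked on flat arrays. The sketch below emulates the ggml-style `mul_mat` convention as understood here (an assumption about the backend, not taken from this page): both operands share the inner dimension k as their fastest-varying axis, and the result stores one row of length n per row of the second operand. `mul_mat_flat` is a hypothetical helper.

```ruby
# a holds n rows of length k, b holds m rows of length k (both row-major).
# The result holds m rows of length n, with
#   result[j * n + i] = dot(row i of a, row j of b).
def mul_mat_flat(a, n, b, m, k)
  out = Array.new(m * n, 0.0)
  m.times do |j|
    n.times do |i|
      acc = 0.0
      k.times { |p| acc += a[i * k + p] * b[j * k + p] }
      out[j * n + i] = acc
    end
  end
  out
end

t_seq, d_model, d_ff = 2, 2, 3
h  = [[1.0, 2.0], [3.0, 4.0]]             # T x d_model
w1 = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # d_model x d_ff

h_flat    = h.flatten             # T rows of d_model
w1_t_flat = w1.transpose.flatten  # d_ff rows of d_model (w1 uploaded transposed)

pre_flat = mul_mat_flat(w1_t_flat, d_ff, h_flat, t_seq, d_model)

# Plain h * w1, flattened row-major, for comparison.
expected = h.map { |row|
  w1.transpose.map { |col| row.zip(col).sum { |x, y| x * y } }
}.flatten
```

Under that convention `pre_flat` and `expected` hold the same values in the same order, which is why no transpose is needed before the second matmul or before download.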
Instance Attribute Summary
- #d_ff ⇒ Object
  Returns the value of attribute d_ff.
- #d_model ⇒ Object
  Returns the value of attribute d_model.
- #realized ⇒ Object
  Returns the value of attribute realized.
- #sess ⇒ Object
  Returns the value of attribute sess.
- #t_h ⇒ Object
  Returns the value of attribute t_h.
- #t_hidden ⇒ Object
  Returns the value of attribute t_hidden.
- #t_out ⇒ Object
  Returns the value of attribute t_out.
- #t_pre ⇒ Object
  Returns the value of attribute t_pre.
- #t_seq ⇒ Object
  Returns the value of attribute t_seq.
- #t_w1_t ⇒ Object
  Returns the value of attribute t_w1_t.
- #t_w2_t ⇒ Object
  Returns the value of attribute t_w2_t.
Instance Method Summary
- #initialize ⇒ FFNFFICache (constructor)
  A new instance of FFNFFICache.
- #realize_for(t_seq, d_model, d_ff) ⇒ Object
Constructor Details
#initialize ⇒ FFNFFICache
Returns a new instance of FFNFFICache.
    # File 'lib/toy/ffi/tinynn.rb', line 35

    def initialize
      @realized = false
      @t_seq    = 0
      @d_model  = 0
      @d_ff     = 0
      # `:ptr` ivars seed with TinyNN.tnn_null_ptr (a typed NULL `void *`)
      # rather than `nil`. Post-spinel `85a4670`, mixing `nil` with `:ptr`
      # boxes the ivar as `sp_RbVal`, which then fails the `(void *)` cast
      # at every FFI call site downstream. The typed-NULL seed keeps the
      # ivar as plain `void *` end-to-end.
      @sess     = TinyNN.tnn_null_ptr
      @t_h      = TinyNN.tnn_null_ptr
      @t_w1_t   = TinyNN.tnn_null_ptr
      @t_w2_t   = TinyNN.tnn_null_ptr
      @t_pre    = TinyNN.tnn_null_ptr
      @t_hidden = TinyNN.tnn_null_ptr
      @t_out    = TinyNN.tnn_null_ptr
    end
Instance Attribute Details
#d_ff ⇒ Object
Returns the value of attribute d_ff.
    # File 'lib/toy/ffi/tinynn.rb', line 31

    def d_ff
      @d_ff
    end
#d_model ⇒ Object
Returns the value of attribute d_model.
    # File 'lib/toy/ffi/tinynn.rb', line 31

    def d_model
      @d_model
    end
#realized ⇒ Object
Returns the value of attribute realized.
    # File 'lib/toy/ffi/tinynn.rb', line 31

    def realized
      @realized
    end
#sess ⇒ Object
Returns the value of attribute sess.
    # File 'lib/toy/ffi/tinynn.rb', line 31

    def sess
      @sess
    end
#t_h ⇒ Object
Returns the value of attribute t_h.
    # File 'lib/toy/ffi/tinynn.rb', line 31

    def t_h
      @t_h
    end
#t_hidden ⇒ Object
Returns the value of attribute t_hidden.
    # File 'lib/toy/ffi/tinynn.rb', line 31

    def t_hidden
      @t_hidden
    end
#t_out ⇒ Object
Returns the value of attribute t_out.
    # File 'lib/toy/ffi/tinynn.rb', line 31

    def t_out
      @t_out
    end
#t_pre ⇒ Object
Returns the value of attribute t_pre.
    # File 'lib/toy/ffi/tinynn.rb', line 31

    def t_pre
      @t_pre
    end
#t_seq ⇒ Object
Returns the value of attribute t_seq.
    # File 'lib/toy/ffi/tinynn.rb', line 31

    def t_seq
      @t_seq
    end
#t_w1_t ⇒ Object
Returns the value of attribute t_w1_t.
    # File 'lib/toy/ffi/tinynn.rb', line 31

    def t_w1_t
      @t_w1_t
    end
#t_w2_t ⇒ Object
Returns the value of attribute t_w2_t.
    # File 'lib/toy/ffi/tinynn.rb', line 31

    def t_w2_t
      @t_w2_t
    end
Instance Method Details
#realize_for(t_seq, d_model, d_ff) ⇒ Object
    # File 'lib/toy/ffi/tinynn.rb', line 54

    def realize_for(t_seq, d_model, d_ff)
      @t_seq   = t_seq
      @d_model = d_model
      @d_ff    = d_ff
      @sess    = TinyNN.tnn_session_new(0)
      # t_h:    ne=[d_model, T]    -- h uploaded row-major (data[k] = h.flat[k]).
      # t_w1_t: ne=[d_model, d_ff] -- w1 uploaded transposed.
      # t_w2_t: ne=[d_ff, d_model] -- w2 uploaded transposed.
      @t_h    = TinyNN.tnn_input_2d_f32(@sess, t_seq, d_model)
      @t_w1_t = TinyNN.tnn_input_2d_f32(@sess, d_ff, d_model)
      @t_w2_t = TinyNN.tnn_input_2d_f32(@sess, d_model, d_ff)
      # Chain: mul_mat(w1_t, h) -> gelu -> mul_mat(w2_t, hidden).
      # Result shapes (ggml ne): [d_ff, T] -> [d_ff, T] -> [d_model, T].
      @t_pre    = TinyNN.tnn_matmul(@sess, @t_w1_t, @t_h)
      @t_hidden = TinyNN.tnn_gelu(@sess, @t_pre)
      @t_out    = TinyNN.tnn_matmul(@sess, @t_w2_t, @t_hidden)
      # Mark intermediates as outputs so the scheduler doesn't alias
      # their buffers with later ops -- backward needs pre and hidden.
      TinyNN.tnn_set_output(@t_pre)
      TinyNN.tnn_set_output(@t_hidden)
      TinyNN.tnn_set_output(@t_out)
      TinyNN.tnn_realize(@sess, @t_out)
      @realized = true
    end
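The result shapes noted in the comments of `realize_for` determine how large each host-side download buffer has to be. A minimal sketch, assuming 4-byte f32 elements (suggested by `tnn_input_2d_f32`, but still an assumption about the outputs); `ffn_download_sizes` and `F32_BYTES` are hypothetical names, not part of TinyNN.

```ruby
F32_BYTES = 4 # assumed size of one f32 element

# Element and byte counts for the three downloaded tensors, based on the
# result shapes [d_ff, T], [d_ff, T], [d_model, T].
def ffn_download_sizes(t_seq, d_model, d_ff)
  counts = {
    pre:    t_seq * d_ff,
    hidden: t_seq * d_ff,
    out:    t_seq * d_model
  }
  counts.transform_values { |n| { elements: n, bytes: n * F32_BYTES } }
end

sizes = ffn_download_sizes(8, 64, 256)
```

Because the sizes depend on T, a graph realized for one sequence length cannot simply be reused with buffers sized for another.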