Module: Toy

Defined in:
lib/toy/mri.rb,
lib/toy.rb,
lib/toy/compute.rb,
lib/toy/version.rb,
lib/toy/core/cli.rb,
lib/toy/llm/adamw.rb,
lib/toy/llm/labels.rb,
lib/toy/core/config.rb,
lib/toy/compute_cuda.rb,
lib/toy/core/cli/new.rb,
lib/toy/core/run_log.rb,
lib/toy/dev/toy_card.rb,
lib/toy/ffi_manifest.rb,
lib/toy/compute_metal.rb,
lib/toy/core/cli/eval.rb,
lib/toy/core/cli/list.rb,
lib/toy/core/toy_root.rb,
lib/toy/io/run_bundle.rb,
lib/toy/io/toy_events.rb,
lib/toy/core/cli/fetch.rb,
lib/toy/core/cli/infer.rb,
lib/toy/core/cli/serve.rb,
lib/toy/core/cli/train.rb,
lib/toy/core/gguf_meta.rb,
lib/toy/core/model_scan.rb,
lib/toy/models/toy_gpt2.rb,
lib/toy/core/cli/install.rb,
lib/toy/llm/recipes/lora.rb,
lib/toy/core/cli/describe.rb,
lib/toy/core/cli/manifest.rb,
lib/toy/train/toy_trainer.rb,
lib/toy/llm/classify_batch.rb,
lib/toy/llm/primitives/gqa.rb,
lib/toy/llm/recipe_options.rb,
lib/toy/llm/training_batch.rb,
lib/toy/models/toy_smollm2.rb,
lib/toy/core/cli/exit_codes.rb,
lib/toy/llm/primitives/rope.rb,
lib/toy/llm/archs/llama_arch.rb,
lib/toy/llm/recipes/vit_tiny.rb,
lib/toy/llm/primitives/swiglu.rb,
lib/toy/llm/recipes/lora_cuda.rb,
lib/toy/llm/recipes/lora_metal.rb,
lib/toy/llm/recipes/warm_start.rb,
lib/toy/llm/primitives/gqa_cuda.rb,
lib/toy/llm/primitives/rms_norm.rb,
lib/toy/llm/primitives/gqa_metal.rb,
lib/toy/llm/primitives/rope_cuda.rb,
lib/toy/llm/recipes/from_scratch.rb,
lib/toy/llm/archs/llama_arch_cuda.rb,
lib/toy/llm/primitives/rope_metal.rb,
lib/toy/llm/archs/llama_arch_metal.rb,
lib/toy/llm/engine/gpt2_seq_engine.rb,
lib/toy/llm/engine/vit_tiny_engine.rb,
lib/toy/llm/primitives/swiglu_cuda.rb,
lib/toy/llm/engine/llama_seq_engine.rb,
lib/toy/llm/primitives/swiglu_metal.rb,
lib/toy/llm/recipes/warm_start_cuda.rb,
lib/toy/llm/blocks/transformer_block.rb,
lib/toy/llm/primitives/rms_norm_cuda.rb,
lib/toy/llm/recipes/warm_start_metal.rb,
lib/toy/llm/primitives/rms_norm_metal.rb,
lib/toy/llm/recipes/from_scratch_cuda.rb,
lib/toy/llm/recipes/from_scratch_metal.rb,
lib/toy/llm/engine/gpt2_seq_engine_cuda.rb,
lib/toy/llm/engine/gpt2_seq_engine_metal.rb,
lib/toy/llm/engine/llama_seq_engine_cuda.rb,
lib/toy/llm/blocks/transformer_block_cuda.rb,
lib/toy/llm/engine/llama_seq_engine_metal.rb,
lib/toy/llm/blocks/transformer_block_metal.rb

Overview

lib/toy/llm/blocks/transformer_block_metal.rb — Metal mirror of lib/toy/llm/blocks/transformer_block.rb.

AUTO-GENERATED by prep/gen_cuda_mirror.rb. Do not edit by hand; edit the CPU source and re-run the generator. Same L2 contract (RMSNorm + GQA attention with RoPE + SwiGLU FFN, seq-mode forward) on the GPU backend via TinyNNMetal.

Extracted from lib/llama_seq_forward_ffi.rb (P2.4). This is the minimal faithful lift of the former LlamaSeqBlockFFI class + build_seq_block / build_seq_qhead / mp_matmul: the forward body is moved VERBATIM (op order unchanged → bit-identical output) with only mechanical rewrites — the per-forward context the body previously read off the cache as @ivars is now passed IN via a positional ctx object (TransformerBlockCtx), and the block owns its own weight handles (former blk.* are now self.*).

DIVERGENCE from the L2 README sketch: the README shows a forward-looking build_forward(sess, x, state, cfg) -> [out, state_out] with a KV-cache “state” threaded per block. That incremental KV-decode does NOT exist in seq mode: full-sequence forward threads NO per-block KV state (KV-cache decode is the separate lib/toy_smollm2_ffi_kv.rb path, out of scope here). We adapt to build_forward(sess, t_x, ctx) -> t_resid (single handle). We also keep the owned-weight field names (t_seq_*) verbatim rather than renaming to the sketch’s short names, so the cache-side realize / train / tap walkers keep working by accessor name with no parity risk.

Spinel hygiene: TransformerBlockCtx is a plain class with an explicit positional initialize (NO kwargs, NO default args — default-arg poisoning, landmine #4); TransformerBlock#initialize takes NO args and has NO default-arg ctor; no Card / step_bind / FFI :str args at class load (step_bind :str landmine 2026-05-28 — ft_name_last’s tnn_tensor_set_name :str stays on the cache realize runtime path, not here). The IntArray ptr-array params (t_k_per_kv, t_vt_per_kv) keep their trailing positional slots so Spinel’s locked IntArray param typing (#688) does not shift.

This file does NOT `require_relative "tinynn"`: the loading module (lib/llama_seq_forward_ffi.rb) already loads the correct backend’s TinyNNMetal before requiring this block, exactly like the L1 primitives. The mirror generator picks the backend via the monolith’s require rewrite.

Defined Under Namespace

Modules: Core, Device, Events, FFIManifest, LLM, Labels, MRI

Classes: AdamW, Card, CardHyper, CardItem, CausalSelfAttention, Embedding, FFN, GPT2, GPT2Block, GPT2Config, GQAttention, LayerNorm, Linear, RMSNorm, RoPE, RopeScaling, RunBundle, RunLog, SmolLM2, SmolLM2Block, SmolLM2Config, Step, SwiGLU, Trainer

Constant Summary

VERSION =

Single source of truth: gemspec + `toy --version` + `toy --manifest` all read this; README/CHANGELOG/git tag display it as v0.8.0. v0.8.0 (2026-06-12) is the first PUBLISHED version (RubyGems). Pre-1.0: not API-stable.

"0.8.0".freeze

Class Method Summary

Class Method Details

.add_bias!(x, b) ⇒ Object

x[i, j] += b[j], in-place: the bias vector b (length ncols) is added to every row of x.



# File 'lib/toy.rb', line 802

def self.add_bias!(x, b)
  t = x.nrows
  d = x.ncols
  i = 0
  while i < t
    j = 0
    while j < d
      x.flat[i * d + j] = x.flat[i * d + j] + b[j]
      j += 1
    end
    i += 1
  end
end
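
A minimal sketch of the row-broadcast behaviour, runnable without the gem. SketchMat is a hypothetical stand-in for Toy's Mat (only nrows, ncols and a row-major flat array are assumed), and add_bias_sketch! repeats the loop above rather than calling Toy itself.

```ruby
# Hypothetical stand-in for Toy's Mat: row-major storage in `flat`.
SketchMat = Struct.new(:nrows, :ncols, :flat)

# Same loop as Toy.add_bias!, copied so the sketch runs on its own.
def add_bias_sketch!(x, b)
  d = x.ncols
  i = 0
  while i < x.nrows
    j = 0
    while j < d
      x.flat[i * d + j] += b[j]  # b is indexed by column only
      j += 1
    end
    i += 1
  end
end

x = SketchMat.new(2, 3, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
add_bias_sketch!(x, [10.0, 20.0, 30.0])
p x.flat  # => [11.0, 22.0, 33.0, 14.0, 25.0, 36.0]
```

Like the original, the sketch mutates x and returns nothing useful, so callers should keep using x rather than the return value.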

.causal_mask!(scores) ⇒ Object

scores[i, j] = NEG_INF_SCORE (standing in for -inf) for j > i. In-place.



# File 'lib/toy.rb', line 817

def self.causal_mask!(scores)
  t = scores.nrows
  n = scores.ncols
  i = 0
  while i < t
    j = i + 1
    while j < n
      scores.flat[i * n + j] = NEG_INF_SCORE
      j += 1
    end
    i += 1
  end
end
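
The effect on a small square score matrix can be sketched as follows. SketchMat and SKETCH_NEG_INF are hypothetical: the actual value of NEG_INF_SCORE is not shown on this page, so the sketch assumes -Float::INFINITY.

```ruby
# Hypothetical stand-in for Toy's Mat: row-major storage in `flat`.
SketchMat = Struct.new(:nrows, :ncols, :flat)

# Assumption: Toy's NEG_INF_SCORE may be a large finite negative instead.
SKETCH_NEG_INF = -Float::INFINITY

# Same loop as Toy.causal_mask!: everything right of the diagonal is masked.
def causal_mask_sketch!(scores)
  n = scores.ncols
  i = 0
  while i < scores.nrows
    j = i + 1
    while j < n
      scores.flat[i * n + j] = SKETCH_NEG_INF
      j += 1
    end
    i += 1
  end
end

s = SketchMat.new(3, 3, Array.new(9, 0.5))
causal_mask_sketch!(s)
s.flat.each_slice(3) { |row| p row }
# [0.5, -Infinity, -Infinity]
# [0.5, 0.5, -Infinity]
# [0.5, 0.5, 0.5]
```

Row i keeps columns 0..i, so position i can only attend to itself and earlier positions once the scores go through softmax.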

.fmt_count(n) ⇒ Object

Pretty-format a parameter count: 49,152 → “49.2K”; 1_233_000 → “1.23M”. Thousands are rounded to one decimal place; millions and billions to two.



# File 'lib/toy.rb', line 777

def self.fmt_count(n)
  if n >= 1_000_000_000
    (n.to_f / 1_000_000_000.0).round(2).to_s + "B"
  elsif n >= 1_000_000
    (n.to_f / 1_000_000.0).round(2).to_s + "M"
  elsif n >= 1_000
    (n.to_f / 1_000.0).round(1).to_s + "K"
  else
    n.to_s
  end
end
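
A standalone copy of the method, named fmt_count_sketch here so it runs without the gem, shows the rounding at each scale:

```ruby
# Copy of Toy.fmt_count under a hypothetical name.
def fmt_count_sketch(n)
  if n >= 1_000_000_000
    (n.to_f / 1_000_000_000.0).round(2).to_s + "B"
  elsif n >= 1_000_000
    (n.to_f / 1_000_000.0).round(2).to_s + "M"
  elsif n >= 1_000
    (n.to_f / 1_000.0).round(1).to_s + "K"
  else
    n.to_s
  end
end

puts fmt_count_sketch(512)            # 512
puts fmt_count_sketch(49_152)         # 49.2K
puts fmt_count_sketch(1_233_000)      # 1.23M
puts fmt_count_sketch(1_700_000_000)  # 1.7B
```

Trailing zeros are dropped because Float#to_s is used, so 1.70 billion prints as 1.7B rather than 1.70B.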

.gelu_new(x) ⇒ Object

GeLU (tanh approximation — HF’s gelu_new). Returns a fresh Mat. See lib/transformer.rb for GELU_C / GELU_K / GELU_DK constants used here.



# File 'lib/toy.rb', line 867

def self.gelu_new(x)
  n   = x.nrows * x.ncols
  out = Mat.new(x.nrows, x.ncols)
  i = 0
  while i < n
    v = x.flat[i]
    u = GELU_C * (v + GELU_K * v * v * v)
    out.flat[i] = 0.5 * v * (1.0 + Math.tanh(u))
    i += 1
  end
  out
end
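
A runnable sketch of the same formula, 0.5 * v * (1 + tanh(c * (v + k * v^3))). The constant values are assumptions: Toy's GELU_C and GELU_K are defined in lib/transformer.rb and not shown here, so the sketch uses the usual gelu_new values sqrt(2/pi) and 0.044715. SketchMat is a hypothetical stand-in for Mat.

```ruby
# Hypothetical stand-in for Toy's Mat: row-major storage in `flat`.
SketchMat = Struct.new(:nrows, :ncols, :flat)

# Assumed values; the real constants live in lib/transformer.rb.
SKETCH_GELU_C = Math.sqrt(2.0 / Math::PI)
SKETCH_GELU_K = 0.044715

# Returns a fresh matrix and leaves the input untouched, like Toy.gelu_new.
def gelu_new_sketch(x)
  out = SketchMat.new(x.nrows, x.ncols, Array.new(x.nrows * x.ncols, 0.0))
  x.flat.each_with_index do |v, i|
    u = SKETCH_GELU_C * (v + SKETCH_GELU_K * v * v * v)
    out.flat[i] = 0.5 * v * (1.0 + Math.tanh(u))
  end
  out
end

x = SketchMat.new(1, 3, [-1.0, 0.0, 1.0])
y = gelu_new_sketch(x)
p y.flat.map { |v| v.round(4) }  # roughly [-0.1588, 0.0, 0.8412]
p x.flat                          # input is unchanged
```

Because tanh is odd, the approximation satisfies gelu(v) - gelu(-v) = v, which makes a convenient sanity check when porting the kernel to another backend.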

.hadamard!(dst, src) ⇒ Object

Elementwise multiply, into `dst` (dst := dst * src). Both have identical shape. Param names avoid `a` / `b` to dodge a Spinel collapse with TinyNN.matmul(a, b) and friends.



# File 'lib/toy.rb', line 792

def self.hadamard!(dst, src)
  n = dst.nrows * dst.ncols
  i = 0
  while i < n
    dst.flat[i] = dst.flat[i] * src.flat[i]
    i += 1
  end
end
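
A short sketch of the in-place contract: only dst changes, src is read. SketchMat is a hypothetical stand-in for Mat, and the method is a copy of the loop above under a different name.

```ruby
# Hypothetical stand-in for Toy's Mat: row-major storage in `flat`.
SketchMat = Struct.new(:nrows, :ncols, :flat)

# Same loop as Toy.hadamard!: dst := dst * src, element by element.
def hadamard_sketch!(dst, src)
  n = dst.nrows * dst.ncols
  i = 0
  while i < n
    dst.flat[i] = dst.flat[i] * src.flat[i]
    i += 1
  end
end

gate = SketchMat.new(2, 2, [1.0, 2.0, 3.0, 4.0])
up   = SketchMat.new(2, 2, [0.5, 0.5, 2.0, 0.0])
hadamard_sketch!(gate, up)
p gate.flat  # => [0.5, 1.0, 6.0, 0.0]
p up.flat    # => [0.5, 0.5, 2.0, 0.0]
```

No shape check is performed, so a src smaller than dst would fail partway through with a nil multiplication error after dst has already been partly overwritten.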

.hstack_heads(per_head, n_heads, d_head, d_model) ⇒ Object

Pack n_heads × [T, Dh] back into a single [T, D] matrix where head h occupies columns [h*Dh, (h+1)*Dh).



# File 'lib/toy.rb', line 882

def self.hstack_heads(per_head, n_heads, d_head, d_model)
  t   = per_head[0].nrows
  out = Mat.new(t, d_model)
  h = 0
  while h < n_heads
    head = per_head[h]
    base = h * d_head
    i = 0
    while i < t
      j = 0
      while j < d_head
        out.flat[i * d_model + (base + j)] = head.flat[i * d_head + j]
        j += 1
      end
      i += 1
    end
    h += 1
  end
  out
end
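
The column layout can be sketched with two heads of width 2. SketchMat is a hypothetical stand-in for Mat; unlike Mat.new(rows, cols), it takes the backing array explicitly, so the sketch allocates the zero-filled output itself.

```ruby
# Hypothetical stand-in for Toy's Mat: row-major storage in `flat`.
SketchMat = Struct.new(:nrows, :ncols, :flat)

# Same indexing as Toy.hstack_heads: head h lands in columns [h*Dh, (h+1)*Dh).
def hstack_heads_sketch(per_head, n_heads, d_head, d_model)
  t   = per_head[0].nrows
  out = SketchMat.new(t, d_model, Array.new(t * d_model, 0.0))
  n_heads.times do |h|
    base = h * d_head
    t.times do |i|
      d_head.times do |j|
        out.flat[i * d_model + base + j] = per_head[h].flat[i * d_head + j]
      end
    end
  end
  out
end

head0 = SketchMat.new(2, 2, [1.0, 2.0, 3.0, 4.0])
head1 = SketchMat.new(2, 2, [5.0, 6.0, 7.0, 8.0])
out = hstack_heads_sketch([head0, head1], 2, 2, 4)
out.flat.each_slice(4) { |row| p row }
# [1.0, 2.0, 5.0, 6.0]
# [3.0, 4.0, 7.0, 8.0]
```

The sketch assumes d_model == n_heads * d_head; if d_model is larger, the extra columns stay at zero.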

.silu!(m) ⇒ Object

SiLU activation, in-place. silu(x) = x / (1 + exp(-x)).



# File 'lib/toy.rb', line 743

def self.silu!(m)
  n = m.nrows * m.ncols
  i = 0
  while i < n
    v = m.flat[i]
    m.flat[i] = v / (1.0 + Math.exp(-v))
    i += 1
  end
end
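
A small numeric sketch of the activation, using a hypothetical SketchMat stand-in for Mat and a copy of the loop above:

```ruby
# Hypothetical stand-in for Toy's Mat: row-major storage in `flat`.
SketchMat = Struct.new(:nrows, :ncols, :flat)

# Same loop as Toy.silu!: v / (1 + exp(-v)), written back in place.
def silu_sketch!(m)
  n = m.nrows * m.ncols
  i = 0
  while i < n
    v = m.flat[i]
    m.flat[i] = v / (1.0 + Math.exp(-v))
    i += 1
  end
end

m = SketchMat.new(1, 4, [-20.0, 0.0, 1.0, 20.0])
silu_sketch!(m)
p m.flat.map { |v| v.round(4) }
# Large negatives go to about 0, silu(0) is 0, silu(1) is about 0.7311,
# and large positives pass through almost unchanged.
```

For very negative inputs Math.exp(-v) overflows to Infinity, and v / Infinity evaluates to -0.0 rather than raising, so the loop stays safe at the extremes.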

.softmax_rows!(m) ⇒ Object

Row-wise softmax, in-place, numerically stable (max-shift).



# File 'lib/toy.rb', line 832

def self.softmax_rows!(m)
  t = m.nrows
  n = m.ncols
  i = 0
  while i < t
    base = i * n
    mx = m.flat[base]
    j = 1
    while j < n
      v = m.flat[base + j]
      if v > mx
        mx = v
      end
      j += 1
    end
    sum = 0.0
    j = 0
    while j < n
      e = Math.exp(m.flat[base + j] - mx)
      m.flat[base + j] = e
      sum = sum + e
      j += 1
    end
    j = 0
    while j < n
      m.flat[base + j] = m.flat[base + j] / sum
      j += 1
    end
    i += 1
  end
end
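
Why the max-shift matters can be sketched with a row of large scores and a causally masked row. The sketch below is written in ordinary idiomatic Ruby rather than the explicit while loops above (an assumption that Spinel constraints do not apply to an illustration), and SketchMat is a hypothetical stand-in for Mat.

```ruby
# Hypothetical stand-in for Toy's Mat: row-major storage in `flat`.
SketchMat = Struct.new(:nrows, :ncols, :flat)

# Row-wise softmax with the same max-shift as Toy.softmax_rows!.
def softmax_rows_sketch!(m)
  n = m.ncols
  m.nrows.times do |i|
    base = i * n
    row  = m.flat[base, n]
    mx   = row.max                           # shift so the largest exponent is 0
    exps = row.map { |v| Math.exp(v - mx) }
    sum  = exps.sum
    n.times { |j| m.flat[base + j] = exps[j] / sum }
  end
end

m = SketchMat.new(3, 3, [
  1.0, 2.0, 3.0,
  1000.0, 1000.0, 1000.0,                    # exp(1000) alone would overflow
  0.5, -Float::INFINITY, -Float::INFINITY    # a causally masked row
])
softmax_rows_sketch!(m)
m.flat.each_slice(3) { |row| p row.map { |v| v.round(4) } }
```

Without the shift, the second row would compute Infinity / Infinity and produce NaN; with it, every entry is 1/3. The masked row collapses to all weight on the first column, since exp(-Infinity) is 0.0.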

.tap(label, x) ⇒ Object

Print a labelled shape line and return the Mat unchanged. Useful to drop in the middle of a forward pass:

x = Toy.tap("after attn", @attn.forward(@ln1.forward(x)))

(Implemented with separate puts/print calls to dodge a Spinel quirk where chained String + Mat#shape concat fails to compile.)



# File 'lib/toy.rb', line 760

def self.tap(label, x)
  print label
  print ": Mat"
  puts x.shape
  x
end

.tap_info(label, x) ⇒ Object

Like `tap` but with min/max/mean stats, for “is this drifting?” sanity checks during inference.



# File 'lib/toy.rb', line 769

def self.tap_info(label, x)
  print label
  print ": "
  puts x.info
  x
end