Module: Toy
- Defined in:
- lib/toy/mri.rb,
lib/toy.rb,
lib/toy/compute.rb,
lib/toy/version.rb,
lib/toy/core/cli.rb,
lib/toy/llm/adamw.rb,
lib/toy/llm/labels.rb,
lib/toy/core/config.rb,
lib/toy/compute_cuda.rb,
lib/toy/core/cli/new.rb,
lib/toy/core/run_log.rb,
lib/toy/dev/toy_card.rb,
lib/toy/ffi_manifest.rb,
lib/toy/compute_metal.rb,
lib/toy/core/cli/eval.rb,
lib/toy/core/cli/list.rb,
lib/toy/core/toy_root.rb,
lib/toy/io/run_bundle.rb,
lib/toy/io/toy_events.rb,
lib/toy/core/cli/fetch.rb,
lib/toy/core/cli/infer.rb,
lib/toy/core/cli/serve.rb,
lib/toy/core/cli/train.rb,
lib/toy/core/gguf_meta.rb,
lib/toy/core/model_scan.rb,
lib/toy/models/toy_gpt2.rb,
lib/toy/core/cli/install.rb,
lib/toy/llm/recipes/lora.rb,
lib/toy/core/cli/describe.rb,
lib/toy/core/cli/manifest.rb,
lib/toy/train/toy_trainer.rb,
lib/toy/llm/classify_batch.rb,
lib/toy/llm/primitives/gqa.rb,
lib/toy/llm/recipe_options.rb,
lib/toy/llm/training_batch.rb,
lib/toy/models/toy_smollm2.rb,
lib/toy/core/cli/exit_codes.rb,
lib/toy/llm/primitives/rope.rb,
lib/toy/llm/archs/llama_arch.rb,
lib/toy/llm/recipes/vit_tiny.rb,
lib/toy/llm/primitives/swiglu.rb,
lib/toy/llm/recipes/lora_cuda.rb,
lib/toy/llm/recipes/lora_metal.rb,
lib/toy/llm/recipes/warm_start.rb,
lib/toy/llm/primitives/gqa_cuda.rb,
lib/toy/llm/primitives/rms_norm.rb,
lib/toy/llm/primitives/gqa_metal.rb,
lib/toy/llm/primitives/rope_cuda.rb,
lib/toy/llm/recipes/from_scratch.rb,
lib/toy/llm/archs/llama_arch_cuda.rb,
lib/toy/llm/primitives/rope_metal.rb,
lib/toy/llm/archs/llama_arch_metal.rb,
lib/toy/llm/engine/gpt2_seq_engine.rb,
lib/toy/llm/engine/vit_tiny_engine.rb,
lib/toy/llm/primitives/swiglu_cuda.rb,
lib/toy/llm/engine/llama_seq_engine.rb,
lib/toy/llm/primitives/swiglu_metal.rb,
lib/toy/llm/recipes/warm_start_cuda.rb,
lib/toy/llm/blocks/transformer_block.rb,
lib/toy/llm/primitives/rms_norm_cuda.rb,
lib/toy/llm/recipes/warm_start_metal.rb,
lib/toy/llm/primitives/rms_norm_metal.rb,
lib/toy/llm/recipes/from_scratch_cuda.rb,
lib/toy/llm/recipes/from_scratch_metal.rb,
lib/toy/llm/engine/gpt2_seq_engine_cuda.rb,
lib/toy/llm/engine/gpt2_seq_engine_metal.rb,
lib/toy/llm/engine/llama_seq_engine_cuda.rb,
lib/toy/llm/blocks/transformer_block_cuda.rb,
lib/toy/llm/engine/llama_seq_engine_metal.rb,
lib/toy/llm/blocks/transformer_block_metal.rb
Overview
lib/toy/llm/blocks/transformer_block_metal.rb — Metal mirror of lib/toy/llm/blocks/transformer_block.rb.
AUTO-GENERATED by prep/gen_cuda_mirror.rb. Do not edit by hand; edit the CPU source and re-run the generator. Same L2 contract on the GPU backend via TinyNNMetal (RMSNorm + GQA attention with RoPE + SwiGLU FFN), seq-mode forward.
Extracted from lib/llama_seq_forward_ffi.rb (P2.4). This is the minimal faithful lift of the former LlamaSeqBlockFFI class + build_seq_block / build_seq_qhead / mp_matmul: the forward body is moved VERBATIM (op order unchanged → bit-identical output) with only mechanical rewrites — the per-forward context the body previously read off the cache as @ivars is now passed IN via a positional ctx object (TransformerBlockCtx), and the block owns its own weight handles (former blk.* are now self.*).
DIVERGENCE from the L2 README sketch: the README shows a forward-looking build_forward(sess, x, state, cfg) -> [out, state_out] with a KV-cache “state” threaded per block. That incremental KV-decode does NOT exist in seq mode: full-sequence forward threads NO per-block KV state (KV-cache decode is the separate lib/toy_smollm2_ffi_kv.rb path, out of scope here). We adapt to build_forward(sess, t_x, ctx) -> t_resid (single handle). We also keep the owned-weight field names (t_seq_*) verbatim rather than renaming to the sketch’s short names, so the cache-side realize / train / tap walkers keep working by accessor name with no parity risk.
Spinel hygiene: TransformerBlockCtx is a plain class with an explicit positional initialize (NO kwargs, NO default args — default-arg poisoning, landmine #4); TransformerBlock#initialize takes NO args and has NO default-arg ctor; no Card / step_bind / FFI :str args at class load (step_bind :str landmine 2026-05-28 — ft_name_last’s tnn_tensor_set_name :str stays on the cache realize runtime path, not here). The IntArray ptr-array params (t_k_per_kv, t_vt_per_kv) keep their trailing positional slots so Spinel’s locked IntArray param typing (#688) does not shift.
This file does NOT `require_relative "tinynn"`: the loading module (lib/llama_seq_forward_ffi.rb) already loads the correct backend’s TinyNNMetal before requiring this block, exactly like the L1 primitives. The mirror generator picks the backend via the monolith’s require rewrite.
Defined Under Namespace
Modules: Core, Device, Events, FFIManifest, LLM, Labels, MRI
Classes: AdamW, Card, CardHyper, CardItem, CausalSelfAttention, Embedding, FFN, GPT2, GPT2Block, GPT2Config, GQAttention, LayerNorm, Linear, RMSNorm, RoPE, RopeScaling, RunBundle, RunLog, SmolLM2, SmolLM2Block, SmolLM2Config, Step, SwiGLU, Trainer
Constant Summary
- VERSION =
Single source of truth: gemspec + `toy --version` + `toy --manifest` all read this; README/CHANGELOG/git tag display it as v0.8.0. v0.8.0 (2026-06-12) is the first PUBLISHED version (RubyGems). Pre-1.0: not API-stable.
"0.8.0".freeze
Class Method Summary
-
.add_bias!(x, b) ⇒ Object
x[i, j] += b[j], in-place (the bias vector is broadcast across rows).
-
.causal_mask!(scores) ⇒ Object
scores[i, j] = -inf for j > i.
-
.fmt_count(n) ⇒ Object
Pretty-format a parameter count: 49,152 → “49.2K”; 1_233_000 → “1.23M”.
-
.gelu_new(x) ⇒ Object
GeLU (tanh approximation — HF’s gelu_new).
-
.hadamard!(dst, src) ⇒ Object
Elementwise multiply, into `dst` (dst := dst * src).
-
.hstack_heads(per_head, n_heads, d_head, d_model) ⇒ Object
Pack n_heads × [T, Dh] back into a single [T, D] matrix where head h occupies columns [h*Dh, (h+1)*Dh).
-
.silu!(m) ⇒ Object
SiLU activation, in-place.
-
.softmax_rows!(m) ⇒ Object
Row-wise softmax, in-place, numerically stable (max-shift).
-
.tap(label, x) ⇒ Object
Print a labelled shape line and return the Mat unchanged.
-
.tap_info(label, x) ⇒ Object
Like `tap` but with min/max/mean stats, for “is this drifting?” sanity checks during inference.
Class Method Details
.add_bias!(x, b) ⇒ Object
x[i, j] += b[j], in-place (the bias vector is broadcast across rows).
# File 'lib/toy.rb', line 802
def self.add_bias!(x, b)
  t = x.nrows
  d = x.ncols
  i = 0
  while i < t
    j = 0
    while j < d
      x.flat[i * d + j] = x.flat[i * d + j] + b[j]
      j += 1
    end
    i += 1
  end
end
.causal_mask!(scores) ⇒ Object
scores[i, j] = -inf for j > i. In-place.
# File 'lib/toy.rb', line 817
def self.causal_mask!(scores)
  t = scores.nrows
  n = scores.ncols
  i = 0
  while i < t
    j = i + 1
    while j < n
      scores.flat[i * n + j] = NEG_INF_SCORE
      j += 1
    end
    i += 1
  end
end
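To make the masking pattern concrete, here is a minimal standalone sketch. `SketchMat` is a hypothetical stand-in for Toy's Mat (only `nrows`, `ncols` and a row-major `flat` array), and the value given to `NEG_INF_SCORE` is an assumption, since the real constant is not shown on this page.

```ruby
# Hypothetical stand-in for Toy's Mat: row-major floats in `flat`.
SketchMat = Struct.new(:nrows, :ncols, :flat)

# Assumed value for illustration; the real constant is defined elsewhere in Toy.
NEG_INF_SCORE = -1.0e30

# Same traversal as the method above: overwrite every entry strictly
# above the diagonal (column j > row i).
def causal_mask!(scores)
  t = scores.nrows
  n = scores.ncols
  t.times do |i|
    ((i + 1)...n).each { |j| scores.flat[i * n + j] = NEG_INF_SCORE }
  end
  scores
end

scores = SketchMat.new(3, 3, Array.new(9, 0.5))
causal_mask!(scores)

# Print one row per line; row i keeps its first i + 1 scores.
scores.flat.each_slice(3) { |row| p row }
```

Row 0 keeps only its first score, row 1 its first two, and the last row is untouched, so each position can attend to itself and to earlier positions only.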
.fmt_count(n) ⇒ Object
Pretty-format a parameter count: 49,152 → “49.2K”; 1_233_000 → “1.23M”.
# File 'lib/toy.rb', line 777
def self.fmt_count(n)
  if n >= 1_000_000_000
    (n.to_f / 1_000_000_000.0).round(2).to_s + "B"
  elsif n >= 1_000_000
    (n.to_f / 1_000_000.0).round(2).to_s + "M"
  elsif n >= 1_000
    (n.to_f / 1_000.0).round(1).to_s + "K"
  else
    n.to_s
  end
end
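The rounding differs per unit (one decimal for K, two for M and B), which is easy to miss. The sketch below copies the method body into a hypothetical `FmtSketch` module so it runs without the gem, and prints a few boundary cases.

```ruby
# Hypothetical wrapper module so the example runs without loading Toy.
module FmtSketch
  def self.fmt_count(n)
    if n >= 1_000_000_000
      (n.to_f / 1_000_000_000.0).round(2).to_s + "B"
    elsif n >= 1_000_000
      (n.to_f / 1_000_000.0).round(2).to_s + "M"
    elsif n >= 1_000
      (n.to_f / 1_000.0).round(1).to_s + "K"
    else
      n.to_s
    end
  end
end

# Values chosen to hit each branch, including the 1_000 boundary.
[999, 1_000, 49_152, 1_233_000, 2_500_000_000].each do |n|
  puts "#{n} -> #{FmtSketch.fmt_count(n)}"
end
```

Counts below 1,000 are printed as plain integers; everything else goes through Float, so whole values still carry a trailing ".0".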
.gelu_new(x) ⇒ Object
GeLU (tanh approximation — HF’s gelu_new). Returns a fresh Mat. See lib/transformer.rb for GELU_C / GELU_K / GELU_DK constants used here.
# File 'lib/toy.rb', line 867
def self.gelu_new(x)
  n = x.nrows * x.ncols
  out = Mat.new(x.nrows, x.ncols)
  i = 0
  while i < n
    v = x.flat[i]
    u = GELU_C * (v + GELU_K * v * v * v)
    out.flat[i] = 0.5 * v * (1.0 + Math.tanh(u))
    i += 1
  end
  out
end
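As a rough check of how close the tanh form is to the erf-based GeLU, the scalar sketch below compares the two. The constant values are assumptions taken from the standard tanh approximation (sqrt(2/pi) and 0.044715); the real GELU_C and GELU_K are defined elsewhere and are not shown on this page. `gelu_tanh` and `gelu_exact` are hypothetical helper names.

```ruby
# Assumed values, matching the usual tanh approximation of GeLU.
GELU_C = Math.sqrt(2.0 / Math::PI)
GELU_K = 0.044715

# Scalar version of the per-element computation in gelu_new.
def gelu_tanh(v)
  u = GELU_C * (v + GELU_K * v * v * v)
  0.5 * v * (1.0 + Math.tanh(u))
end

# Reference definition: x * Phi(x), written with the error function.
def gelu_exact(v)
  0.5 * v * (1.0 + Math.erf(v / Math.sqrt(2.0)))
end

[-3.0, -1.0, 0.0, 1.0, 3.0].each do |v|
  printf("%5.1f  tanh=%.6f  erf=%.6f\n", v, gelu_tanh(v), gelu_exact(v))
end
```

With these constants the two forms agree to about three decimal places on the sampled inputs, which is why the tanh version is commonly used as a drop-in.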
.hadamard!(dst, src) ⇒ Object
Elementwise multiply, into `dst` (dst := dst * src). Both have identical shape. Param names avoid `a` / `b` to dodge a Spinel collapse with TinyNN.matmul(a, b) and friends.
# File 'lib/toy.rb', line 792
def self.hadamard!(dst, src)
  n = dst.nrows * dst.ncols
  i = 0
  while i < n
    dst.flat[i] = dst.flat[i] * src.flat[i]
    i += 1
  end
end
.hstack_heads(per_head, n_heads, d_head, d_model) ⇒ Object
Pack n_heads × [T, Dh] back into a single [T, D] matrix where head h occupies columns [h*Dh, (h+1)*Dh).
# File 'lib/toy.rb', line 882
def self.hstack_heads(per_head, n_heads, d_head, d_model)
  t = per_head[0].nrows
  out = Mat.new(t, d_model)
  h = 0
  while h < n_heads
    head = per_head[h]
    base = h * d_head
    i = 0
    while i < t
      j = 0
      while j < d_head
        out.flat[i * d_model + (base + j)] = head.flat[i * d_head + j]
        j += 1
      end
      i += 1
    end
    h += 1
  end
  out
end
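The column layout is easiest to see with two tiny heads. This sketch re-implements the same index arithmetic on a hypothetical `SketchMat` struct (a stand-in for Toy's Mat, which is not defined on this page) with T = 2, Dh = 2 and D = 4.

```ruby
# Hypothetical stand-in for Toy's Mat: row-major floats in `flat`.
SketchMat = Struct.new(:nrows, :ncols, :flat)

# Head h is copied into columns [h * d_head, (h + 1) * d_head) of the output.
def hstack_heads(per_head, n_heads, d_head, d_model)
  t = per_head[0].nrows
  out = SketchMat.new(t, d_model, Array.new(t * d_model, 0.0))
  n_heads.times do |h|
    base = h * d_head
    t.times do |i|
      d_head.times do |j|
        out.flat[i * d_model + base + j] = per_head[h].flat[i * d_head + j]
      end
    end
  end
  out
end

head0 = SketchMat.new(2, 2, [1.0, 2.0, 3.0, 4.0])
head1 = SketchMat.new(2, 2, [5.0, 6.0, 7.0, 8.0])
packed = hstack_heads([head0, head1], 2, 2, 4)

packed.flat.each_slice(4) { |row| p row }
# Row 0: [1.0, 2.0, 5.0, 6.0]
# Row 1: [3.0, 4.0, 7.0, 8.0]
```

Each output row is the concatenation of that time step's row from every head, in head order.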
.silu!(m) ⇒ Object
SiLU activation, in-place. silu(x) = x / (1 + exp(-x)).
# File 'lib/toy.rb', line 743
def self.silu!(m)
  n = m.nrows * m.ncols
  i = 0
  while i < n
    v = m.flat[i]
    m.flat[i] = v / (1.0 + Math.exp(-v))
    i += 1
  end
end
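One plausible use of `silu!` together with `hadamard!` is the gating step of a SwiGLU feed-forward layer, silu(gate) * up. Whether Toy's SwiGLU class composes these two helpers in exactly this way is an assumption; the sketch below only demonstrates the arithmetic, on a hypothetical `SketchMat` stand-in for Mat. Unlike the originals, these versions return the mutated matrix for convenience.

```ruby
# Hypothetical stand-in for Toy's Mat: row-major floats in `flat`.
SketchMat = Struct.new(:nrows, :ncols, :flat)

# In-place SiLU: x / (1 + exp(-x)).
def silu!(m)
  m.flat.map! { |v| v / (1.0 + Math.exp(-v)) }
  m
end

# In-place elementwise product: dst := dst * src.
def hadamard!(dst, src)
  dst.flat.each_index { |i| dst.flat[i] *= src.flat[i] }
  dst
end

gate = SketchMat.new(1, 3, [-2.0, 0.0, 2.0])
up   = SketchMat.new(1, 3, [1.0, 5.0, 0.5])

silu!(gate)          # gate now holds silu(gate)
hadamard!(gate, up)  # gate now holds silu(gate) * up

p gate.flat
```

Because both helpers mutate their first argument, the gate buffer is reused for the result and no extra matrix is allocated.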
.softmax_rows!(m) ⇒ Object
Row-wise softmax, in-place, numerically stable (max-shift).
# File 'lib/toy.rb', line 832
def self.softmax_rows!(m)
  t = m.nrows
  n = m.ncols
  i = 0
  while i < t
    base = i * n
    mx = m.flat[base]
    j = 1
    while j < n
      v = m.flat[base + j]
      if v > mx
        mx = v
      end
      j += 1
    end
    sum = 0.0
    j = 0
    while j < n
      e = Math.exp(m.flat[base + j] - mx)
      m.flat[base + j] = e
      sum = sum + e
      j += 1
    end
    j = 0
    while j < n
      m.flat[base + j] = m.flat[base + j] / sum
      j += 1
    end
    i += 1
  end
end
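The max-shift matters once logits get large. The sketch below contrasts a naive softmax with a compact version of the shifted one, using a hypothetical `SketchMat` struct in place of Toy's Mat; `naive_softmax` is an illustrative helper, not part of the library.

```ruby
# Hypothetical stand-in for Toy's Mat: row-major floats in `flat`.
SketchMat = Struct.new(:nrows, :ncols, :flat)

# Same algorithm as above: subtract the row maximum before exponentiating.
def softmax_rows!(m)
  n = m.ncols
  m.nrows.times do |i|
    base = i * n
    row = m.flat[base, n]
    mx = row.max
    exps = row.map { |v| Math.exp(v - mx) }
    sum = exps.sum
    n.times { |j| m.flat[base + j] = exps[j] / sum }
  end
  m
end

# No shift: exp(1000.0) overflows to Infinity, and Infinity / Infinity is NaN.
def naive_softmax(row)
  exps = row.map { |v| Math.exp(v) }
  sum = exps.sum
  exps.map { |e| e / sum }
end

logits = [1000.0, 1001.0, 1002.0]
naive = naive_softmax(logits)
p naive

m = SketchMat.new(1, 3, logits.dup)
softmax_rows!(m)
p m.flat
```

Subtracting the maximum leaves the result mathematically unchanged, because the common factor exp(-max) cancels between numerator and denominator, while keeping every exponent at or below zero.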
.tap(label, x) ⇒ Object
Print a labelled shape line and return the Mat unchanged. Useful to drop in the middle of a forward pass:
x = Toy.tap("after attn", @attn.forward(@ln1.forward(x)))
(Implemented with separate puts/print calls to dodge a Spinel quirk where chained String + Mat#shape concat fails to compile.)
# File 'lib/toy.rb', line 760
def self.tap(label, x)
  print label
  print ": Mat"
  puts x.shape
  x
end
.tap_info(label, x) ⇒ Object
Like `tap` but with min/max/mean stats, for “is this drifting?” sanity checks during inference.
# File 'lib/toy.rb', line 769
def self.tap_info(label, x)
  print label
  print ": "
  puts x.info
  x
end