Module: Toy

Defined in: lib/toy/mri.rb,
lib/toy.rb,
lib/toy/compute.rb,
lib/toy/version.rb,
lib/toy/core/cli.rb,
lib/toy/llm/adamw.rb,
lib/toy/llm/labels.rb,
lib/toy/core/config.rb,
lib/toy/compute_cuda.rb,
lib/toy/core/cli/new.rb,
lib/toy/core/run_log.rb,
lib/toy/dev/toy_card.rb,
lib/toy/ffi_manifest.rb,
lib/toy/compute_metal.rb,
lib/toy/core/cli/eval.rb,
lib/toy/core/cli/list.rb,
lib/toy/core/toy_root.rb,
lib/toy/io/run_bundle.rb,
lib/toy/io/toy_events.rb,
lib/toy/core/cli/fetch.rb,
lib/toy/core/cli/infer.rb,
lib/toy/core/cli/serve.rb,
lib/toy/core/cli/train.rb,
lib/toy/core/gguf_meta.rb,
lib/toy/core/model_scan.rb,
lib/toy/models/toy_gpt2.rb,
lib/toy/core/cli/install.rb,
lib/toy/llm/recipes/lora.rb,
lib/toy/core/cli/describe.rb,
lib/toy/core/cli/manifest.rb,
lib/toy/train/toy_trainer.rb,
lib/toy/llm/classify_batch.rb,
lib/toy/llm/primitives/gqa.rb,
lib/toy/llm/recipe_options.rb,
lib/toy/llm/training_batch.rb,
lib/toy/models/toy_smollm2.rb,
lib/toy/core/cli/exit_codes.rb,
lib/toy/llm/primitives/rope.rb,
lib/toy/llm/archs/llama_arch.rb,
lib/toy/llm/recipes/vit_tiny.rb,
lib/toy/llm/primitives/swiglu.rb,
lib/toy/llm/recipes/lora_cuda.rb,
lib/toy/llm/recipes/lora_metal.rb,
lib/toy/llm/recipes/warm_start.rb,
lib/toy/llm/primitives/gqa_cuda.rb,
lib/toy/llm/primitives/rms_norm.rb,
lib/toy/llm/primitives/gqa_metal.rb,
lib/toy/llm/primitives/rope_cuda.rb,
lib/toy/llm/recipes/from_scratch.rb,
lib/toy/llm/archs/llama_arch_cuda.rb,
lib/toy/llm/primitives/rope_metal.rb,
lib/toy/llm/archs/llama_arch_metal.rb,
lib/toy/llm/engine/gpt2_seq_engine.rb,
lib/toy/llm/engine/vit_tiny_engine.rb,
lib/toy/llm/primitives/swiglu_cuda.rb,
lib/toy/llm/engine/llama_seq_engine.rb,
lib/toy/llm/primitives/swiglu_metal.rb,
lib/toy/llm/recipes/warm_start_cuda.rb,
lib/toy/llm/blocks/transformer_block.rb,
lib/toy/llm/primitives/rms_norm_cuda.rb,
lib/toy/llm/recipes/warm_start_metal.rb,
lib/toy/llm/primitives/rms_norm_metal.rb,
lib/toy/llm/recipes/from_scratch_cuda.rb,
lib/toy/llm/recipes/from_scratch_metal.rb,
lib/toy/llm/engine/gpt2_seq_engine_cuda.rb,
lib/toy/llm/engine/gpt2_seq_engine_metal.rb,
lib/toy/llm/engine/llama_seq_engine_cuda.rb,
lib/toy/llm/blocks/transformer_block_cuda.rb,
lib/toy/llm/engine/llama_seq_engine_metal.rb,
lib/toy/llm/blocks/transformer_block_metal.rb

Overview

lib/toy/llm/blocks/transformer_block_metal.rb — Metal mirror of lib/toy/llm/blocks/transformer_block.rb.

AUTO-GENERATED by prep/gen_cuda_mirror.rb. Do not edit by hand; edit the CPU source and re-run the generator. This mirror implements the same L2 contract (RMSNorm + GQA attention with RoPE + SwiGLU FFN, seq-mode forward) on the GPU backend via TinyNNMetal.

Extracted from lib/llama_seq_forward_ffi.rb (P2.4). This is the minimal faithful lift of the former LlamaSeqBlockFFI class + build_seq_block / build_seq_qhead / mp_matmul: the forward body is moved VERBATIM (op order unchanged → bit-identical output) with only mechanical rewrites — the per-forward context the body previously read off the cache as @ivars is now passed IN via a positional ctx object (TransformerBlockCtx), and the block owns its own weight handles (former blk.* are now self.*).

DIVERGENCE from the L2 README sketch: the README shows a forward-looking build_forward(sess, x, state, cfg) -> [out, state_out] with a KV-cache “state” threaded per block. That incremental KV decode does NOT exist in seq mode: the full-sequence forward threads NO per-block KV state (KV-cache decode is the separate lib/toy_smollm2_ffi_kv.rb path, out of scope here). We adapt to build_forward(sess, t_x, ctx) -> t_resid (single handle). We also keep the owned-weight field names (t_seq_*) verbatim rather than renaming to the sketch’s short names, so the cache-side realize / train / tap walkers keep working by accessor name with no parity risk.

Spinel hygiene: TransformerBlockCtx is a plain class with an explicit positional initialize (NO kwargs, NO default args — default-arg poisoning, landmine #4); TransformerBlock#initialize takes NO args and has NO default-arg ctor; no Card / step_bind / FFI :str args at class load (step_bind :str landmine 2026-05-28 — ft_name_last’s tnn_tensor_set_name :str stays on the cache realize runtime path, not here). The IntArray ptr-array params (t_k_per_kv, t_vt_per_kv) keep their trailing positional slots so Spinel’s locked IntArray param typing (#688) does not shift.

This file does NOT `require_relative "tinynn"`: the loading module (lib/llama_seq_forward_ffi.rb) already loads the correct backend’s TinyNNMetal before requiring this block, exactly like the L1 primitives. The mirror generator picks the backend via the monolith’s require rewrite.

Defined Under Namespace

Modules: Core, Device, Events, FFIManifest, LLM, Labels, MRI

Classes: AdamW, Card, CardHyper, CardItem, CausalSelfAttention, Embedding, FFN, GPT2, GPT2Block, GPT2Config, GQAttention, LayerNorm, Linear, RMSNorm, RoPE, RopeScaling, RunBundle, RunLog, SmolLM2, SmolLM2Block, SmolLM2Config, Step, SwiGLU, Trainer

Constant Summary

VERSION = Single source of truth: gemspec, `toy --version`, and `toy --manifest` all read this; README/CHANGELOG/git tag display it as v0.8.0. v0.8.0 (2026-06-12) is the first PUBLISHED version (RubyGems). Pre-1.0: not API-stable.

"0.8.0".freeze

Class Method Summary

.add_bias!(x, b) ⇒ Object

x[i, j] += b[j], in-place.
.causal_mask!(scores) ⇒ Object

scores[i, j] = -inf for j > i.
.fmt_count(n) ⇒ Object

Pretty-format a parameter count: 49_152 → “49.2K”; 1_233_000 → “1.23M”.
.gelu_new(x) ⇒ Object

GeLU (tanh approximation — HF’s gelu_new).
.hadamard!(dst, src) ⇒ Object

Elementwise multiply, into `dst` (dst := dst * src).
.hstack_heads(per_head, n_heads, d_head, d_model) ⇒ Object

Pack n_heads × [T, Dh] back into a single [T, D] matrix where head h occupies columns [h*Dh, (h+1)*Dh).
.silu!(m) ⇒ Object

SiLU activation, in-place.
.softmax_rows!(m) ⇒ Object

Row-wise softmax, in-place, numerically stable (max-shift).
.tap(label, x) ⇒ Object

Print a labelled shape line and return the Mat unchanged.
.tap_info(label, x) ⇒ Object

Like `tap` but with min/max/mean stats, for “is this drifting?” sanity checks during inference.

Class Method Details

.add_bias!(x, b) ⇒ `Object`

x[i, j] += b[j], in-place: the bias vector b (one entry per column) is added to every row of x.

# File 'lib/toy.rb', line 802

def self.add_bias!(x, b)
  t = x.nrows
  d = x.ncols
  i = 0
  while i < t
    j = 0
    while j < d
      x.flat[i * d + j] = x.flat[i * d + j] + b[j]
      j += 1
    end
    i += 1
  end
end
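
The row broadcast above can be sketched without Toy's Mat class. This is a minimal illustration, assuming a row-major flat Array; `add_bias_rows!` and its explicit `nrows` / `ncols` arguments are hypothetical names, not part of Toy.

```ruby
# Hypothetical helper (not part of Toy): broadcast-add a bias of length ncols
# to every row of a row-major flat array, using the same i * ncols + j
# index arithmetic as the listing above.
def add_bias_rows!(flat, nrows, ncols, bias)
  nrows.times do |i|
    ncols.times do |j|
      flat[i * ncols + j] += bias[j]
    end
  end
  flat
end

x = [0.0, 0.0, 0.0,
     1.0, 1.0, 1.0]              # 2 rows x 3 columns
add_bias_rows!(x, 2, 3, [10.0, 20.0, 30.0])
p x                              # both rows are shifted by the same bias vector
```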

.causal_mask!(scores) ⇒ `Object`

scores[i, j] = -inf for j > i. In-place.

# File 'lib/toy.rb', line 817

def self.causal_mask!(scores)
  t = scores.nrows
  n = scores.ncols
  i = 0
  while i < t
    j = i + 1
    while j < n
      scores.flat[i * n + j] = NEG_INF_SCORE
      j += 1
    end
    i += 1
  end
end
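
A small sketch of which entries the mask touches, on a plain flat Array. `causal_mask_flat!` is a hypothetical name, and the fill value is an assumption: the value of NEG_INF_SCORE is not shown on this page, so the sketch uses `-Float::INFINITY`.

```ruby
# Hypothetical helper (not part of Toy): overwrite every entry strictly above
# the diagonal of a t x n row-major score matrix.
# ASSUMPTION: neg_inf stands in for NEG_INF_SCORE, whose value is not shown here.
def causal_mask_flat!(scores, t, n, neg_inf = -Float::INFINITY)
  t.times do |i|
    ((i + 1)...n).each do |j|
      scores[i * n + j] = neg_inf
    end
  end
  scores
end

s = Array.new(9, 0.5)            # 3 x 3 scores, all 0.5
causal_mask_flat!(s, 3, 3)
s.each_slice(3) { |row| p row }  # row i keeps columns 0..i and masks the rest
```

Row i keeps its first i + 1 entries, so position i can only attend to itself and earlier positions.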

.fmt_count(n) ⇒ `Object`

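Pretty-format a parameter count: 49_152 → “49.2K”; 1_233_000 → “1.23M”. Counts in the thousands are rounded to one decimal place; millions and billions are rounded to two.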

# File 'lib/toy.rb', line 777

def self.fmt_count(n)
  if n >= 1_000_000_000
    (n.to_f / 1_000_000_000.0).round(2).to_s + "B"
  elsif n >= 1_000_000
    (n.to_f / 1_000_000.0).round(2).to_s + "M"
  elsif n >= 1_000
    (n.to_f / 1_000.0).round(1).to_s + "K"
  else
    n.to_s
  end
end
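
To see the thresholds and rounding in action, the listing can be restated as a standalone function and run on a few values. `fmt_count_sketch` is a hypothetical name used only for this illustration; the body mirrors the listing above.

```ruby
# Standalone restatement of the listing above (hypothetical name).
def fmt_count_sketch(n)
  if n >= 1_000_000_000
    (n / 1_000_000_000.0).round(2).to_s + "B"
  elsif n >= 1_000_000
    (n / 1_000_000.0).round(2).to_s + "M"
  elsif n >= 1_000
    (n / 1_000.0).round(1).to_s + "K"
  else
    n.to_s
  end
end

[999, 49_152, 1_233_000, 7_000_000_000].each do |n|
  puts "#{n} -> #{fmt_count_sketch(n)}"
end
```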

.gelu_new(x) ⇒ `Object`

GeLU (tanh approximation — HF’s gelu_new). Returns a fresh Mat. See lib/transformer.rb for GELU_C / GELU_K / GELU_DK constants used here.

# File 'lib/toy.rb', line 867

def self.gelu_new(x)
  n   = x.nrows * x.ncols
  out = Mat.new(x.nrows, x.ncols)
  i = 0
  while i < n
    v = x.flat[i]
    u = GELU_C * (v + GELU_K * v * v * v)
    out.flat[i] = 0.5 * v * (1.0 + Math.tanh(u))
    i += 1
  end
  out
end
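
The tanh form can be checked against the erf-based definition of GELU on scalars. This is a sketch under stated assumptions: the values of GELU_C and GELU_K are not shown on this page, so the code uses the constants commonly used for this approximation (sqrt(2/π) and 0.044715); `gelu_tanh` and `gelu_exact` are hypothetical names.

```ruby
# ASSUMPTION: c and k are the usual tanh-approximation constants; the actual
# GELU_C / GELU_K values live in lib/transformer.rb and are not shown here.
def gelu_tanh(v)
  c = Math.sqrt(2.0 / Math::PI)
  k = 0.044715
  0.5 * v * (1.0 + Math.tanh(c * (v + k * v * v * v)))
end

# Reference definition: GELU(v) = v * Phi(v), with Phi the standard normal CDF.
def gelu_exact(v)
  0.5 * v * (1.0 + Math.erf(v / Math.sqrt(2.0)))
end

[-3.0, -1.0, 0.0, 1.0, 3.0].each do |v|
  printf("%5.1f  tanh=%.6f  erf=%.6f\n", v, gelu_tanh(v), gelu_exact(v))
end
```

The two columns should agree to roughly three decimal places over this range.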

.hadamard!(dst, src) ⇒ `Object`

Elementwise multiply, into `dst` (dst := dst * src). Both have identical shape. Param names avoid `a` / `b` to dodge a Spinel collapse with TinyNN.matmul(a, b) and friends.

# File 'lib/toy.rb', line 792

def self.hadamard!(dst, src)
  n = dst.nrows * dst.ncols
  i = 0
  while i < n
    dst.flat[i] = dst.flat[i] * src.flat[i]
    i += 1
  end
end
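
A short sketch of the in-place contract on plain Arrays: the destination is overwritten and the source is left untouched. `hadamard_flat!` is a hypothetical name, and the length check is an addition for the sketch, not something the listing above performs.

```ruby
# Hypothetical helper (not part of Toy): dst[i] := dst[i] * src[i].
# The shape check is an extra safeguard added for this sketch only.
def hadamard_flat!(dst, src)
  raise ArgumentError, "shape mismatch" unless dst.length == src.length
  dst.each_index { |i| dst[i] *= src[i] }
  dst
end

dst = [1.0, 2.0, 3.0, 4.0]
src = [0.5, 0.5, 2.0, 0.0]
hadamard_flat!(dst, src)
p dst   # products, written over the old dst values
p src   # unchanged
```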

.hstack_heads(per_head, n_heads, d_head, d_model) ⇒ `Object`

Pack n_heads × [T, Dh] back into a single [T, D] matrix where head h occupies columns [h*Dh, (h+1)*Dh).

# File 'lib/toy.rb', line 882

def self.hstack_heads(per_head, n_heads, d_head, d_model)
  t   = per_head[0].nrows
  out = Mat.new(t, d_model)
  h = 0
  while h < n_heads
    head = per_head[h]
    base = h * d_head
    i = 0
    while i < t
      j = 0
      while j < d_head
        out.flat[i * d_model + (base + j)] = head.flat[i * d_head + j]
        j += 1
      end
      i += 1
    end
    h += 1
  end
  out
end
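
The column layout is easiest to see on a tiny case. The sketch below uses flat row-major Arrays in place of Mat and assumes d_model = n_heads * d_head (the listing takes d_model as a separate argument); `hstack_heads_flat` is a hypothetical name.

```ruby
# Hypothetical helper (not part of Toy): concatenate per-head [t, d_head]
# blocks side by side into one [t, d_model] row-major array.
# ASSUMPTION: d_model == n_heads * d_head.
def hstack_heads_flat(per_head, n_heads, d_head, t)
  d_model = n_heads * d_head
  out = Array.new(t * d_model, 0.0)
  n_heads.times do |h|
    base = h * d_head                      # first column owned by head h
    t.times do |i|
      d_head.times do |j|
        out[i * d_model + base + j] = per_head[h][i * d_head + j]
      end
    end
  end
  out
end

head0 = [1.0, 2.0,
         3.0, 4.0]                         # T = 2, Dh = 2
head1 = [5.0, 6.0,
         7.0, 8.0]
packed = hstack_heads_flat([head0, head1], 2, 2, 2)
packed.each_slice(4) { |row| p row }       # head 0 in columns 0..1, head 1 in columns 2..3
```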

.silu!(m) ⇒ `Object`

SiLU activation, in-place. silu(x) = x / (1 + exp(-x)).

# File 'lib/toy.rb', line 743

def self.silu!(m)
  n = m.nrows * m.ncols
  i = 0
  while i < n
    v = m.flat[i]
    m.flat[i] = v / (1.0 + Math.exp(-v))
    i += 1
  end
end
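
A scalar sketch of the same formula shows its limiting behavior: silu(0) is 0, large positive inputs pass through almost unchanged, and large negative inputs are squashed toward 0. `silu_scalar` is a hypothetical name used only here.

```ruby
# Scalar form of the formula above: silu(v) = v / (1 + exp(-v)).
def silu_scalar(v)
  v / (1.0 + Math.exp(-v))
end

[-20.0, -1.0, 0.0, 1.0, 20.0].each do |v|
  printf("silu(%6.1f) = %.6f\n", v, silu_scalar(v))
end
```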

.softmax_rows!(m) ⇒ `Object`

Row-wise softmax, in-place, numerically stable (max-shift).

# File 'lib/toy.rb', line 832

def self.softmax_rows!(m)
  t = m.nrows
  n = m.ncols
  i = 0
  while i < t
    base = i * n
    mx = m.flat[base]
    j = 1
    while j < n
      v = m.flat[base + j]
      if v > mx
        mx = v
      end
      j += 1
    end
    sum = 0.0
    j = 0
    while j < n
      e = Math.exp(m.flat[base + j] - mx)
      m.flat[base + j] = e
      sum = sum + e
      j += 1
    end
    j = 0
    while j < n
      m.flat[base + j] = m.flat[base + j] / sum
      j += 1
    end
    i += 1
  end
end
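
Why the max-shift matters can be shown on a single row. The sketch below compares a naive softmax with the shifted form used above; both function names are hypothetical, and the inputs are chosen so that `Math.exp` overflows to Infinity in the naive version.

```ruby
# Naive softmax: exp of large inputs overflows to Infinity, and
# Infinity / Infinity is NaN.
def softmax_naive(row)
  exps = row.map { |v| Math.exp(v) }
  total = exps.sum
  exps.map { |e| e / total }
end

# Max-shifted softmax: subtracting the row maximum leaves the result
# mathematically unchanged but keeps every exponent <= 0.
def softmax_stable(row)
  mx = row.max
  exps = row.map { |v| Math.exp(v - mx) }
  total = exps.sum
  exps.map { |e| e / total }
end

row = [1000.0, 1001.0, 1002.0]
naive  = softmax_naive(row)
stable = softmax_stable(row)
p naive    # all NaN
p stable   # finite probabilities that sum to 1
```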

.tap(label, x) ⇒ `Object`

Print a labelled shape line and return the Mat unchanged. Useful to drop in the middle of a forward pass:

x = Toy.tap("after attn", @attn.forward(@ln1.forward(x)))

(Implemented with separate puts/print calls to dodge a Spinel quirk where chained String + Mat#shape concat fails to compile.)

# File 'lib/toy.rb', line 760

def self.tap(label, x)
  print label
  print ": Mat"
  puts x.shape
  x
end

.tap_info(label, x) ⇒ `Object`

Like `tap` but with min/max/mean stats, for “is this drifting?” sanity checks during inference.

# File 'lib/toy.rb', line 769

def self.tap_info(label, x)
  print label
  print ": "
  puts x.info
  x
end
