Top Level Namespace

Defined Under Namespace

Modules: GGUFLoad, GPT2BPE, GPT2ConfigLoader, GPT2FFI, GPT2FFICuda, GPT2FFIMetal, GPT2KV, GPT2KVCuda, GPT2KVMetal, GgufKV, Sampler, SmolLM2ConfigLoader, SmolLM2KV, SmolLM2KVCuda, SmolLM2KVMetal, TinyNN, TinyNNCuda, TinyNNMetal, Toy, ToyChatTemplate, ToyCorpusLoader, ToyDescribeFlow, ToyDriftGrad, ToyGGUFFuser, ToyGGUFWriter, ToyImageLoader, ToyLR, ToyLogProbs, ToySample, ToyTap, ToyTokenDrift, ToyVit

Classes: Adam, AdamState, AdamStepResult, Arch, AttnCache, AttnResult, Block, BlockFFICache, BlockFFICacheCuda, BlockResult, ChatCompletionsHandler, CompletionsHandler, DataLoader, EmbeddingsHandler, FFCache, FFNFFICache, FFNFFICacheCuda, FFNFFICacheMetal, FFResult, ForwardCache, FullForwardFFICache, FullForwardFFICacheCuda, GPT2BPETables, GPT2Block, GPT2BlockFFI, GPT2BlockFFICuda, GPT2BlockFFIMetal, GPT2Config, GPT2FullForwardFFICache, GPT2FullForwardFFICacheCuda, GPT2FullForwardFFICacheMetal, GPT2KVBlockFFI, GPT2KVBlockFFICuda, GPT2KVBlockFFIMetal, GPT2KVFFICache, GPT2KVFFICacheCuda, GPT2KVFFICacheMetal, GPT2KVStepResult, GPT2KVStepResultCuda, GPT2KVStepResultMetal, GPT2LM, Gradients, HeadCache, HealthHandler, IndexHandler, LRSchedule, LayerCache, LossResult, Mat, ModelEntry, ModelIndex, ModelsHandler, Module, NormResult, SamplerConfig, SamplerContext, SmolLM2KVBlockFFI, SmolLM2KVBlockFFICuda, SmolLM2KVBlockFFIMetal, SmolLM2KVFFICache, SmolLM2KVFFICacheCuda, SmolLM2KVFFICacheMetal, SmolLM2KVStepResult, SmolLM2KVStepResultCuda, SmolLM2KVStepResultMetal, State, Tokenizer, ToyLM, ToyLMCuda, ToyLMMetal, TransformerLM, ViTTinyConfig

Constant Summary collapse

GGUF =

ENV["GGUF"] || "data/smollm2-135m-native.gguf"

TOP_K =

(ENV["TOP_K"] || "5").to_i

PROMPT =

ENV["PROMPT"] || "Once upon a time"

N_NEW =

(ENV["N_NEW"] || "16").to_i

PROMPT_IDS = models (parity with the CPU runner). Empty when unset.

ENV["PROMPT_IDS"] || "261"

TAO_RUN_DIR = Events sink (toy/v1 serving telemetry; FILE only). These are TOP-LEVEL constants and must NEVER be assigned inside a branch: Spinel does not initialize a top-level CONSTANT assigned inside a conditional arm at runtime; it reads back empty, silently skipping all event writes (landmine, train.rb:82-85). TAO_RUN_DIR is set by `toy serve` (lib/toy/core/cli/serve.rb) once it has resolved a run id and created runs/<id>/. When empty, serving is events-OFF (cheap-when-off: every emit guard short-circuits), exactly like train.

ENV["TAO_RUN_DIR"] || ""

RUN_ID =

ENV["TOY_RUN_ID"] || ""

EVENTS = Events file path (FILE only when TAO_RUN_DIR is set; empty otherwise).

TAO_RUN_DIR.length > 0 ? (TAO_RUN_DIR + "/events.jsonl") : ""
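
The cheap-when-off behavior described for TAO_RUN_DIR and EVENTS can be sketched as follows. This is a minimal illustration, not the project's emitter: `events_path_for` and `emit_event` are hypothetical names, and the JSON line layout is an assumption. It uses plain Ruby features (blocks, the json library) with no claim that they compile under Spinel.

```ruby
require "json"
require "tmpdir"

# Hypothetical helper mirroring the EVENTS expression:
# an empty run dir yields an empty path (events OFF).
def events_path_for(run_dir)
  run_dir.length > 0 ? (run_dir + "/events.jsonl") : ""
end

# Hypothetical emitter: the guard returns before any I/O when events are off.
def emit_event(events_path, name, payload)
  return false if events_path.length == 0
  File.open(events_path, "a") do |f|
    f.puts(JSON.generate({ "event" => name, "data" => payload }))
  end
  true
end

# Events OFF: nothing is written.
puts emit_event(events_path_for(""), "run_start", { "port" => 4567 })
```

The path is computed once with a ternary at the top level, so no constant is ever assigned inside a conditional arm.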

SERVE_PORT = PORT hoisted ABOVE the run_start emit so config.port is available. No side effects, so the hoist is safe (the Tep.run! call below still binds it).

(ENV["PORT"] || "4567").to_i

RECIPE = NOTE: this CUDA runner hosts the two RANDOM-INIT recipes (from-scratch + warm-start, both Toy::LLM::Engine::LlamaSeqEngineCuda#realize_for_random_init), selected by RECIPE. The LoRA recipe lives in a SEPARATE binary (lib/toy/run/train_lora_cuda.rb -> libexec/toy-train-lora-cuda): its #realize_for_mmap path cannot share a Spinel compilation unit with the random-init path without a cfg type-merge miscompile (landmine #16, same rationale as the CPU split train_lora.rb vs train.rb).

ENV["RECIPE"] || "from-scratch"

STEPS = ENV reads — TOP-LEVEL constants (Spinel constant-in-conditional caveat).

(ENV["STEPS"]    || "5").to_i

SEED =

(ENV["SEED"]     || "0").to_i

VOCAB = From-scratch gate shape (literal, matches the CPU runner so the curve compares).

D_MODEL =

DONOR_D =

N_HEADS =

D_FF =

N_LAYERS =

CONTEXT =

LMC_A =

ENV["LMC_A"]      || ""

LMC_B =

ENV["LMC_B"]      || ""

ALPHAS_S =

ENV["LMC_ALPHAS"] || "0,0.25,0.5,0.75,1.0"

SEQ_LEN =

(ENV["CONTEXT"] || "32").to_i

RUN_DIR =

ENV["TAO_RUN_DIR"] || ""

D_HEAD = D_HEAD = d_model / n_heads. Per-head fused-tensor slice geometry.

d_model / n_heads

IMG_DIR =

(ENV["IMG_DIR"]    || "data/vit_smoke")

TOY_RUN_ID =

(ENV["TOY_RUN_ID"]  || "vit-tiny")

IMAGE_SIZE = Gate-fixed timm ViT-Tiny SHAPE — hardcoded (NOT env/flags). This is the shape data/vit_smoke matches (224/16/196/10); the 16x16 ENV defaults in 07_train_vit_tiny.rb’s header are the REJECTED synthetic shape.

PATCH_SIZE =

NUM_CHAN =

NUM_CLASSES =

LN_EPS =

1.0e-5

N_IMAGES =

LR_MAX = LR schedule PINNED to 07’s defaults (confirmed produce the recorded baseline curve). With WARMUP=10 > STEPS=5, every step is on the linear warmup ramp.

0.003
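
To illustrate why every step sits on the warmup ramp, here is a minimal sketch assuming a plain linear warmup of the form lr = lr_max * (step + 1) / warmup. The actual schedule formula (ToyLR / LRSchedule) is not shown on this page, so `warmup_lr`, its off-by-one convention, and the behavior after warmup (held at lr_max here for simplicity) are assumptions.

```ruby
# Hypothetical linear warmup; after warmup this sketch simply holds lr_max.
def warmup_lr(step, lr_max, warmup)
  return lr_max if step >= warmup
  lr_max * (step + 1).to_f / warmup
end

lr_max = 0.003 # LR_MAX
warmup = 10    # WARMUP, per the note above
steps  = 5     # STEPS default

step = 0
while step < steps
  puts "step " + step.to_s + " lr " + warmup_lr(step, lr_max, warmup).to_s
  step += 1
end
```

With 5 steps and a 10-step warmup, the learning rate under this assumed formula never reaches lr_max; it peaks at about half of it on the last step.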

LR_MIN =

0.0001

WARMUP =

LR =

(ENV["LR"]       || "0.001").to_f

RANK_LORA =

(ENV["RANK"] || "8").to_i

TARGET_ID =

TOKENS =

[12092, 4845, 253, 1429]

USE_FFI_MATMUL = The FFN’s two matmuls go through TinyNN (ggml-CPU FFI) when this is true. Off by default to keep the toy zero-dep; flip on to use the bridge and accelerate at real-LLM scale (see tinynn/README.md).

false

GELU_C = GeLU tanh-approximation constants (the GPT-2 formula; identical to torch.nn.functional.gelu(…, approximate='tanh')). Defined here so the forward, backward, and any per-tensor variant agree byte-for-byte. gelu(x) = 0.5 * x * (1 + tanh( GELU_C * (x + GELU_K * x^3) )). GELU_C is sqrt(2/π).

0.7978845608028654

GELU_K = cubic coefficient

0.044715

GELU_DK = 3 * GELU_K (the coefficient produced by differentiating the cubic term)

0.134145
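
The three constants fit together as in the sketch below. The `GeluSketch` module and its method names are hypothetical; only the constant values and the gelu formula come from this page. The derivative follows from the chain rule, where d/dx (x + K x^3) = 1 + 3K x^2, which is where the 3 * K value comes in.

```ruby
# Hypothetical namespace so the example does not clash with the real constants.
module GeluSketch
  C  = 0.7978845608028654 # sqrt(2/pi)
  K  = 0.044715           # cubic coefficient
  DK = 0.134145           # 3 * K

  # Forward: gelu(x) = 0.5 * x * (1 + tanh(C * (x + K * x^3)))
  def self.gelu(x)
    0.5 * x * (1.0 + Math.tanh(C * (x + K * x * x * x)))
  end

  # Analytic derivative of the tanh approximation.
  def self.gelu_grad(x)
    t = Math.tanh(C * (x + K * x * x * x))
    0.5 * (1.0 + t) + 0.5 * x * (1.0 - t * t) * C * (1.0 + DK * x * x)
  end
end

puts GeluSketch.gelu(1.0)
puts GeluSketch.gelu_grad(1.0)
```

Keeping DK as a literal rather than computing 3 * K at each call is one way to make forward and backward use exactly the same bits.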

RMS_EPS_DEFAULT = RMSNorm / LayerNorm default epsilon. Matches Llama / SmolLM2 / GPT-2 conventions. Individual instances can override via their own @eps ivar; this is the fallback used by the row-level helpers.

1.0e-5
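
As a rough illustration of where the epsilon enters, here is a minimal row-level RMSNorm sketch. `rms_norm_row` is a hypothetical name and signature, not the project's helper; the formula is the standard y_i = g_i * x_i / sqrt(mean(x^2) + eps).

```ruby
# Hypothetical row-level RMSNorm; eps defaults to the RMS_EPS_DEFAULT value.
def rms_norm_row(row, gain, eps = 1.0e-5)
  sum_sq = 0.0
  i = 0
  while i < row.length
    sum_sq += row[i] * row[i]
    i += 1
  end
  inv_rms = 1.0 / Math.sqrt(sum_sq / row.length + eps)
  out = []
  i = 0
  while i < row.length
    out.push(row[i] * inv_rms * gain[i])
    i += 1
  end
  out
end

puts rms_norm_row([3.0, 4.0], [1.0, 1.0]).inspect
# An all-zero row stays finite because eps keeps the denominator positive.
puts rms_norm_row([0.0, 0.0], [1.0, 1.0]).inspect
```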

LOG_PROB_FLOOR = Numerical floor for probabilities going into log() in cross_entropy. 1e-12 is safely above the F32 / F64 underflow threshold and matches PyTorch’s “label smoothing” default clip.

1.0e-12
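
The effect of the floor can be seen in a minimal sketch. `cross_entropy_floor` is a hypothetical single-example helper, not the project's cross_entropy; it only shows how clamping the target probability keeps the loss finite.

```ruby
# Hypothetical single-example cross entropy with a probability floor.
def cross_entropy_floor(probs, target, floor = 1.0e-12)
  p = probs[target]
  p = floor if p < floor
  -Math.log(p)
end

# Without the floor, a zero probability gives an infinite loss.
puts(-Math.log(0.0))
# With the floor, the loss is capped at -log(1e-12).
puts cross_entropy_floor([0.0, 1.0], 0)
```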

NEG_INF_SCORE = Causal-mask sentinel: attention scores set to this become ~0 after softmax (Math.exp(-1e30) underflows cleanly to 0.0). Avoid -Infinity because (Float::INFINITY - Float::INFINITY) is NaN if downstream code rescales or subtracts max.

-1.0e30
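
A minimal sketch of the sentinel in a max-subtracted softmax follows. `masked_softmax` is a hypothetical helper written for this example, not the project's attention code.

```ruby
# Hypothetical numerically stable softmax over one row of attention scores.
def masked_softmax(scores)
  max = scores[0]
  scores.each { |s| max = s if s > max }
  exps = scores.map { |s| Math.exp(s - max) }
  total = 0.0
  exps.each { |e| total += e }
  exps.map { |e| e / total }
end

neg = -1.0e30 # NEG_INF_SCORE
puts masked_softmax([1.0, 2.0, neg]).inspect
# Even a fully masked row stays finite: (neg - neg) is 0.0, not NaN.
puts masked_softmax([neg, neg]).inspect
# With -Infinity the same subtraction would be NaN.
puts ((-Float::INFINITY) - (-Float::INFINITY)).nan?
```
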
# tinynn is always required so FFNFFICache is defined (it lives in
# lib/toy/ffi/tinynn.rb). The require itself doesn't run any FFI code; only
# feed_forward_ffi's USE_FFI_MATMUL-gated branch does. With
# USE_FFI_MATMUL=false the FFI methods are dead code that Spinel's
# DCE drops, but the native libraries still get linked.

MAT_SHAPES_ON = One-shot diagnostic: when MAT_SHAPES_ON=true, every matmul prints its shape triple on stdout (post-process with sort | uniq -c). Off by default; the const must exist for Spinel to resolve the read.

(ENV["MAT_SHAPES"] || "") == "1"

GGUF_PATH = GH#188 – model selection via env. Defaults to SmolLM2-135M for a cheap-to-test smoke; override for any llama-family GGUF. If MODEL_NAME isn’t set, it defaults to the basename of MODEL_PATH minus the “.gguf” suffix – close enough for the /v1/models response.

ENV["MODEL_PATH"] || "data/smollm2-135m-native.gguf"

MODEL_NAME_ENV =

ENV["MODEL_NAME"] || ""

MODEL_NAME =

MODEL_NAME_ENV.length > 0 ? MODEL_NAME_ENV : _mn_default
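
The fallback rule described under GGUF_PATH can be sketched as below. The body of `_mn_default` is not shown on this page, so `default_model_name` is a hypothetical stand-in that uses File.basename; the real helper may be written differently to suit Spinel.

```ruby
# Hypothetical stand-in for the MODEL_NAME fallback:
# prefer the explicit name, else the GGUF basename without ".gguf".
def default_model_name(model_name_env, model_path)
  return model_name_env if model_name_env.length > 0
  File.basename(model_path, ".gguf")
end

puts default_model_name("", "data/smollm2-135m-native.gguf")
puts default_model_name("my-model", "data/smollm2-135m-native.gguf")
```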

MAX_T =

(ENV["MAX_T"] || "256").to_i

STATE =

State.new

Instance Method Summary collapse

#all_digits?(s) ⇒ Boolean

all_digits? returns true iff `s` is non-empty and every char is 0..9 (explicit char scan, not exception-based Integer(s), per the Spinel landmines).
#api_gen_id(prefix) ⇒ Object
#api_generate_ids(prompt_ids, n_new) ⇒ Object

Greedy generation from a pre-tokenized prompt.
#api_now_unix ⇒ Object

Helper: returns the current Unix time in whole seconds.
#fused_lookup_a(fused_names, fused_a, fused_b, ggA, ggB, fused_name, nel_full) ⇒ Object

Look up (or read+cache) the full fused tensor for `fused_name` from ckpt A.
#parse_ids(line) ⇒ Object
#read_prompt(path) ⇒ Object
#read_sequences(path) ⇒ Object
#read_vocab(path) ⇒ Object

Corpus readers: load vocab, sequences, and prompt from data/ts_*.txt.

Instance Method Details

#all_digits?(s) ⇒ `Boolean`

all_digits? returns true iff `s` is non-empty and every char is 0..9 (explicit char scan, not exception-based Integer(s), per the Spinel landmines).

Returns:

(Boolean)

# File 'lib/toy/run/infer.rb', line 67

def all_digits?(s)
  return false if s.length == 0
  i = 0
  while i < s.length
    c = s[i]
    if c < "0" || c > "9"
      return false
    end
    i = i + 1
  end
  true
end
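
A usage sketch: validating a string before converting it, so no exception handling is needed. The definition of all_digits? is repeated from above so the snippet runs standalone; `parse_count` is a hypothetical caller invented for this example.

```ruby
def all_digits?(s)
  return false if s.length == 0
  i = 0
  while i < s.length
    c = s[i]
    if c < "0" || c > "9"
      return false
    end
    i = i + 1
  end
  true
end

# Hypothetical caller: fall back to a default when the input is not a plain
# non-negative decimal string.
def parse_count(s, fallback)
  all_digits?(s) ? s.to_i : fallback
end

puts parse_count("16", 5)
puts parse_count("1x", 5)
```

Note that a leading sign such as "-3" is rejected, since "-" is not in the 0..9 range.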

#api_gen_id(prefix) ⇒ `Object`

# File 'lib/toy/serve/openai/server.rb', line 105

def api_gen_id(prefix)
  t = Time.now
  v = (t.to_i * 1_000_003) ^ ((t.to_f - t.to_i).to_f * 1.0e9).to_i
  prefix + "-" + v.to_s
end

#api_generate_ids(prompt_ids, n_new) ⇒ `Object`

Greedy generation from a pre-tokenized prompt. KV-cache decode: prefill the prompt one step at a time, then sample greedily for `n_new` more steps. Returns Array<Int> of the new token IDs (does NOT include the prompt).

This routine re-runs the prefill from position 0 every call; the cache’s t_K / t_V tensors are persistent and get overwritten in place. (A future optimisation would be a fast prefix-cache for shared prompts.)

# File 'lib/toy/serve/openai/server.rb', line 120

def api_generate_ids(prompt_ids, n_new)
  out_ids = [0]
  out_ids.pop

  vocab = STATE.cfg.vocab
  last_logits = Mat.new(1, vocab)
  prefill_pos = 0
  while prefill_pos < prompt_ids.length
    last_logits = SmolLM2KV.decode_step(STATE.kv, prompt_ids[prefill_pos], prefill_pos)
    prefill_pos = prefill_pos + 1
  end

  # First generated token comes from the last prefill step's logits.
  best_idx = 0
  best_val = last_logits.flat[0]
  v_iter = 1
  while v_iter < vocab
    val = last_logits.flat[v_iter]
    if val > best_val; best_val = val; best_idx = v_iter; end
    v_iter = v_iter + 1
  end
  out_ids.push(best_idx)

  step = 1
  while step < n_new
    last_logits = SmolLM2KV.decode_step(STATE.kv, out_ids[out_ids.length - 1],
                                         prompt_ids.length + out_ids.length - 1)
    best_idx = 0
    best_val = last_logits.flat[0]
    v_iter = 1
    while v_iter < vocab
      val = last_logits.flat[v_iter]
      if val > best_val; best_val = val; best_idx = v_iter; end
      v_iter = v_iter + 1
    end
    out_ids.push(best_idx)
    step = step + 1
  end
  out_ids
end
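
The control flow above (prefill, argmax, decode, repeat) can be tried without a model using the sketch below. `fake_decode_step` is a hypothetical stand-in for SmolLM2KV.decode_step that returns a plain Array of logits, and `argmax` / `greedy_generate` are names invented for this example. Unlike the routine above, this sketch returns an empty array when n_new is 0.

```ruby
# Hypothetical stand-in for a decode step: always favors (token + 1) % vocab.
def fake_decode_step(token, pos, vocab)
  logits = Array.new(vocab, 0.0)
  logits[(token + 1) % vocab] = 1.0
  logits
end

# Index of the largest value (first one wins on ties).
def argmax(values)
  best_idx = 0
  best_val = values[0]
  i = 1
  while i < values.length
    if values[i] > best_val
      best_val = values[i]
      best_idx = i
    end
    i += 1
  end
  best_idx
end

def greedy_generate(prompt_ids, n_new, vocab)
  out_ids = []
  logits = Array.new(vocab, 0.0)
  # Prefill: feed the prompt one token at a time, keeping the last logits.
  pos = 0
  while pos < prompt_ids.length
    logits = fake_decode_step(prompt_ids[pos], pos, vocab)
    pos += 1
  end
  # Decode: pick the argmax, then feed it back at the next position.
  step = 0
  while step < n_new
    out_ids.push(argmax(logits))
    step += 1
    if step < n_new
      logits = fake_decode_step(out_ids[out_ids.length - 1],
                                prompt_ids.length + step - 1, vocab)
    end
  end
  out_ids
end

puts greedy_generate([3, 4], 3, 8).inspect
```

Factoring the argmax scan into one helper removes the duplicated loop; whether that refactor is safe under Spinel's type inference is not something this page establishes.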

#api_now_unix ⇒ `Object`

Helper: returns the current Unix time in whole seconds.




# File 'lib/toy/serve/openai/server.rb', line 101

def api_now_unix
  Time.now.to_i
end

#fused_lookup_a(fused_names, fused_a, fused_b, ggA, ggB, fused_name, nel_full) ⇒ `Object`

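Look up (or read+cache) the full fused tensor for `fused_name` from ckpt A. Returns the index of the entry in the parallel arrays fused_names / fused_a / fused_b. On a miss, the tensor is read once from both checkpoints A and B and appended to the cache. Returns -1 if either checkpoint lacks the tensor.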

# File 'lib/toy/run/eval_lmc.rb', line 161

def fused_lookup_a(fused_names, fused_a, fused_b, ggA, ggB, fused_name, nel_full)
  fi = 0
  while fi < fused_names.length
    if fused_names[fi] == fused_name
      return fi
    end
    fi = fi + 1
  end
  # Miss — read both A and B once.
  idx_a = TinyNN.tnn_gguf_find_index(ggA, fused_name)
  idx_b = TinyNN.tnn_gguf_find_index(ggB, fused_name)
  if idx_a < 0 || idx_b < 0
    puts "toy-eval-lmc: missing fused " + fused_name +
                " in A=" + idx_a.to_s + " B=" + idx_b.to_s
    return -1
  end
  ma = Mat.new(1, nel_full)
  mb = Mat.new(1, nel_full)
  TinyNN.tnn_gguf_read_f32_to_doubles(ggA, idx_a, ma.flat, nel_full)
  TinyNN.tnn_gguf_read_f32_to_doubles(ggB, idx_b, mb.flat, nel_full)
  fused_names.push(fused_name)
  fused_a.push(ma)
  fused_b.push(mb)
  return fused_names.length - 1
end
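
The find-or-load pattern over parallel arrays can be sketched in isolation. `lookup_or_load` and the block-based loader are hypothetical and use a Ruby block for brevity; the routine above passes the GGUF handles explicitly instead.

```ruby
# Hypothetical find-or-load cache: names[i] pairs with values[i].
# Returns the index of the entry, or -1 when the loader yields nil.
def lookup_or_load(names, values, name)
  i = 0
  while i < names.length
    return i if names[i] == name
    i += 1
  end
  loaded = yield(name)
  return -1 if loaded.nil?
  names.push(name)
  values.push(loaded)
  names.length - 1
end

names = []
values = []
loads = 0
first  = lookup_or_load(names, values, "tensor_a") { |n| loads += 1; [1.0, 2.0] }
second = lookup_or_load(names, values, "tensor_a") { |n| loads += 1; [1.0, 2.0] }
puts [first, second, loads].inspect
```

Returning an index rather than the tensor lets one lookup address the matching entries in several parallel arrays.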

#parse_ids(line) ⇒ `Object`

# File 'lib/toy/train/training.rb', line 19

def parse_ids(line)
  parts = line.split(" ")
  ids   = [parts[0].to_i]
  k = 1
  while k < parts.length
    ids.push(parts[k].to_i)
    k += 1
  end
  ids
end
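
A usage sketch showing what parse_ids returns for well-formed and malformed lines. The definition is repeated from above so the snippet runs standalone. Because String#to_i never raises and nil.to_i is 0 in Ruby, bad input degrades to zeros rather than failing.

```ruby
def parse_ids(line)
  parts = line.split(" ")
  ids   = [parts[0].to_i]
  k = 1
  while k < parts.length
    ids.push(parts[k].to_i)
    k += 1
  end
  ids
end

puts parse_ids("12 7 261").inspect
# An empty line has no parts, so parts[0] is nil and the result is [0].
puts parse_ids("").inspect
# Non-numeric tokens become 0.
puts parse_ids("a 3").inspect
```

Callers that need to distinguish a real token ID 0 from a parse failure would have to validate the line first.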

#read_prompt(path) ⇒ `Object`

# File 'lib/toy/train/training.rb', line 45

def read_prompt(path)
  raw = ["?"]
  raw.pop
  File.open(path, "r") do |f|
    f.each_line { |line| raw.push(line.chomp) }
  end
  parse_ids(raw[0])
end

#read_sequences(path) ⇒ `Object`

# File 'lib/toy/train/training.rb', line 30

def read_sequences(path)
  raw = ["?"]
  raw.pop
  File.open(path, "r") do |f|
    f.each_line { |line| raw.push(line.chomp) }
  end
  seqs = [parse_ids(raw[0])]
  i = 1
  while i < raw.length
    seqs.push(parse_ids(raw[i]))
    i += 1
  end
  seqs
end

#read_vocab(path) ⇒ `Object`

Corpus readers — load vocab, sequences, and prompt from data/ts_*.txt

# File 'lib/toy/train/training.rb', line 10

def read_vocab(path)
  vocab = ["?"]
  vocab.pop
  File.open(path, "r") do |f|
    f.each_line { |line| vocab.push(line.chomp) }
  end
  vocab
end
