Top Level Namespace

Defined Under Namespace

Modules: GGUFLoad, GPT2BPE, GPT2ConfigLoader, GPT2FFI, GPT2FFICuda, GPT2FFIMetal, GPT2KV, GPT2KVCuda, GPT2KVMetal, GgufKV, Sampler, SmolLM2ConfigLoader, SmolLM2KV, SmolLM2KVCuda, SmolLM2KVMetal, TinyNN, TinyNNCuda, TinyNNMetal, Toy, ToyChatTemplate, ToyCorpusLoader, ToyDescribeFlow, ToyDriftGrad, ToyGGUFFuser, ToyGGUFWriter, ToyImageLoader, ToyLR, ToyLogProbs, ToySample, ToyTap, ToyTokenDrift, ToyVit

Classes: Adam, AdamState, AdamStepResult, Arch, AttnCache, AttnResult, Block, BlockFFICache, BlockFFICacheCuda, BlockResult, ChatCompletionsHandler, CompletionsHandler, DataLoader, EmbeddingsHandler, FFCache, FFNFFICache, FFNFFICacheCuda, FFNFFICacheMetal, FFResult, ForwardCache, FullForwardFFICache, FullForwardFFICacheCuda, GPT2BPETables, GPT2Block, GPT2BlockFFI, GPT2BlockFFICuda, GPT2BlockFFIMetal, GPT2Config, GPT2FullForwardFFICache, GPT2FullForwardFFICacheCuda, GPT2FullForwardFFICacheMetal, GPT2KVBlockFFI, GPT2KVBlockFFICuda, GPT2KVBlockFFIMetal, GPT2KVFFICache, GPT2KVFFICacheCuda, GPT2KVFFICacheMetal, GPT2KVStepResult, GPT2KVStepResultCuda, GPT2KVStepResultMetal, GPT2LM, Gradients, HeadCache, HealthHandler, IndexHandler, LRSchedule, LayerCache, LossResult, Mat, ModelEntry, ModelIndex, ModelsHandler, Module, NormResult, SamplerConfig, SamplerContext, SmolLM2KVBlockFFI, SmolLM2KVBlockFFICuda, SmolLM2KVBlockFFIMetal, SmolLM2KVFFICache, SmolLM2KVFFICacheCuda, SmolLM2KVFFICacheMetal, SmolLM2KVStepResult, SmolLM2KVStepResultCuda, SmolLM2KVStepResultMetal, State, Tokenizer, ToyLM, ToyLMCuda, ToyLMMetal, TransformerLM, ViTTinyConfig

Constant Summary

GGUF =
ENV["GGUF"] || "data/smollm2-135m-native.gguf"
TOP_K =
(ENV["TOP_K"] || "5").to_i
PROMPT =
ENV["PROMPT"] || "Once upon a time"
N_NEW =
(ENV["N_NEW"] || "16").to_i
PROMPT_IDS =

models (parity with the CPU runner). Empty when unset.

ENV["PROMPT_IDS"] || "261"
TAO_RUN_DIR =

Events sink (toy/v1 serving telemetry; FILE only). TOP-LEVEL constants (NEVER inside a branch: Spinel does not initialize a top-level CONSTANT assigned inside a conditional arm at runtime; it reads back empty, silently skipping all event writes; landmine, train.rb:82-85). TAO_RUN_DIR is set by `toy serve` (lib/toy/core/cli/serve.rb) once it has resolved a run id and created runs/<id>/. When empty, serving is events-OFF (cheap-when-off: every emit guard short-circuits), exactly like train.

ENV["TAO_RUN_DIR"] || ""
RUN_ID =
ENV["TOY_RUN_ID"] || ""
EVENTS =

Events (FILE only when TAO_RUN_DIR set).

TAO_RUN_DIR.length > 0 ? (TAO_RUN_DIR + "/events.jsonl") : ""
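
The pattern described for TAO_RUN_DIR and EVENTS (read the environment once into unconditional top-level constants, then guard each write on the empty string) might look roughly like this in plain Ruby. The `_SKETCH` constants and `emit_event_sketch` are hypothetical names used only for illustration; the runner's real emit helper is not shown on this page.

```ruby
# Top-level, unconditional assignments: never inside an if/else arm.
RUN_DIR_SKETCH = ENV["TAO_RUN_DIR"] || ""
EVENTS_SKETCH  = RUN_DIR_SKETCH.length > 0 ? (RUN_DIR_SKETCH + "/events.jsonl") : ""

# Hypothetical emit helper. An empty path means events are OFF,
# so the guard returns before touching the filesystem.
def emit_event_sketch(path, json_line)
  return false if path.length == 0
  File.open(path, "a") { |f| f.puts(json_line) }
  true
end
```

With the path empty, every call costs one length check and nothing else, which is the "cheap-when-off" behaviour the note describes.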
SERVE_PORT =

PORT hoisted ABOVE the run_start emit so config.port is available. No side effects, so the hoist is safe (the Tep.run! call below still binds it).

(ENV["PORT"] || "4567").to_i
RECIPE =

NOTE: this CUDA runner hosts the two RANDOM-INIT recipes (from-scratch + warm-start, both Toy::LLM::Engine::LlamaSeqEngineCuda#realize_for_random_init), selected by RECIPE. The LoRA recipe lives in a SEPARATE binary (lib/toy/run/train_lora_cuda.rb -> libexec/toy-train-lora-cuda): its #realize_for_mmap path cannot share a Spinel compilation unit with the random-init path without a cfg type-merge miscompile (landmine #16, same rationale as the CPU split train_lora.rb vs train.rb).

ENV["RECIPE"] || "from-scratch"
STEPS =

ENV reads — TOP-LEVEL constants (Spinel constant-in-conditional caveat).

(ENV["STEPS"]    || "5").to_i
SEED =
(ENV["SEED"]     || "0").to_i
VOCAB =

From-scratch gate shape (literal, matches the CPU runner so the curve compares).

627
D_MODEL =
64
DONOR_D =
128
N_HEADS =
4
D_FF =
128
N_LAYERS =
2
CONTEXT =
32
LMC_A =
ENV["LMC_A"]      || ""
LMC_B =
ENV["LMC_B"]      || ""
ALPHAS_S =
ENV["LMC_ALPHAS"] || "0,0.25,0.5,0.75,1.0"
SEQ_LEN =
(ENV["CONTEXT"] || "32").to_i
RUN_DIR =
ENV["TAO_RUN_DIR"] || ""
D_HEAD =

D_HEAD = d_model / n_heads. Per-head fused-tensor slice geometry.

d_model / n_heads
IMG_DIR =
(ENV["IMG_DIR"]    || "data/vit_smoke")
TOY_RUN_ID =
(ENV["TOY_RUN_ID"]  || "vit-tiny")
IMAGE_SIZE =

Gate-fixed timm ViT-Tiny SHAPE — hardcoded (NOT env/flags). This is the shape data/vit_smoke matches (224/16/196/10); the 16x16 ENV defaults in 07_train_vit_tiny.rb’s header are the REJECTED synthetic shape.

224
PATCH_SIZE =
16
NUM_CHAN =
3
NUM_CLASSES =
10
LN_EPS =
1.0e-5
N_IMAGES =
1
LR_MAX =

LR schedule PINNED to 07’s defaults (confirmed produce the recorded baseline curve). With WARMUP=10 > STEPS=5, every step is on the linear warmup ramp.

0.003
LR_MIN =
0.0001
WARMUP =
10
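
To see why WARMUP=10 > STEPS=5 keeps every step on the ramp, here is a minimal sketch of a linear warmup. The exact ramp used by the project's LRSchedule is not shown here, so the `(step + 1) / warmup` form and the name `warmup_lr_sketch` are assumptions; the post-warmup decay toward LR_MIN is deliberately not modelled.

```ruby
# Assumed ramp: the rate rises linearly and reaches lr_max at step warmup - 1.
# Steps at or past `warmup` simply return lr_max (decay is out of scope).
def warmup_lr_sketch(step, lr_max, warmup)
  return lr_max if step >= warmup
  lr_max * (step + 1).to_f / warmup
end

# Five training steps with the pinned values from this page.
(0...5).each do |s|
  puts "step " + s.to_s + " lr=" + warmup_lr_sketch(s, 0.003, 10).to_s
end
```

Under this assumption none of the five steps reaches LR_MAX, so LR_MIN never comes into play for a 5-step run.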
LR =
(ENV["LR"]       || "0.001").to_f
RANK_LORA =
(ENV["RANK"] || "8").to_i
TARGET_ID =
99
TOKENS =
[12092, 4845, 253, 1429]
USE_FFI_MATMUL =

The FFN’s two matmuls go through TinyNN (ggml-CPU FFI) when this is true. Off by default to keep the toy zero-dep; flip on to use the bridge and accelerate at real-LLM scale (see tinynn/README.md).

false
GELU_C =

GeLU tanh-approximation constants (the GPT-2 formula; identical to torch.nn.functional.gelu(…, approximate='tanh')). Defined here so the forward, backward, and any per-tensor variant agree byte-for-byte.

gelu(x) = 0.5 * x * (1 + tanh( GELU_C * (x + GELU_K * x^3) ))

GELU_C is sqrt(2/π).

0.7978845608028654
GELU_K =

Cubic coefficient.

0.044715
GELU_DK =

Equal to 3 * GELU_K (the x^2 coefficient in the derivative of x + GELU_K * x^3).

0.134145
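
As a worked example of the formula above, the following sketch evaluates the tanh approximation and its derivative with the three constants. The `_SKETCH` names are hypothetical copies made so the snippet is self-contained; the project's actual forward and backward routines are not reproduced here.

```ruby
GELU_C_SKETCH  = 0.7978845608028654 # sqrt(2/pi)
GELU_K_SKETCH  = 0.044715           # cubic coefficient
GELU_DK_SKETCH = 0.134145           # 3 * GELU_K_SKETCH

# gelu(x) = 0.5 * x * (1 + tanh(C * (x + K * x^3)))
def gelu_sketch(x)
  u = GELU_C_SKETCH * (x + GELU_K_SKETCH * x * x * x)
  0.5 * x * (1.0 + Math.tanh(u))
end

# d/dx gelu(x) = 0.5 * (1 + t) + 0.5 * x * (1 - t^2) * C * (1 + DK * x^2),
# where t = tanh(u). DK appears because d/dx (x + K x^3) = 1 + 3K x^2.
def gelu_grad_sketch(x)
  u = GELU_C_SKETCH * (x + GELU_K_SKETCH * x * x * x)
  t = Math.tanh(u)
  du = GELU_C_SKETCH * (1.0 + GELU_DK_SKETCH * x * x)
  0.5 * (1.0 + t) + 0.5 * x * (1.0 - t * t) * du
end
```

Keeping the tripled coefficient as its own constant means the backward pass uses exactly the same literal everywhere instead of recomputing 3 * 0.044715 at each call site.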
RMS_EPS_DEFAULT =

RMSNorm / LayerNorm default epsilon. Matches Llama / SmolLM2 / GPT-2 conventions. Individual instances can override via their own @eps ivar; this is the fallback used by the row-level helpers.

1.0e-5
LOG_PROB_FLOOR =

Numerical floor for probabilities going into log() in cross_entropy. 1e-12 is safely above the F32 / F64 underflow threshold and matches PyTorch’s “label smoothing” default clip.

1.0e-12
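
A minimal sketch of how such a floor is typically applied before the logarithm. The helper name `nll_sketch` and its array-based signature are assumptions; the project's cross_entropy operates on its own Mat type, which is not shown here.

```ruby
LOG_PROB_FLOOR_SKETCH = 1.0e-12

# Negative log-likelihood of the target class, with the probability
# clamped from below so Math.log never sees 0.0.
def nll_sketch(probs, target)
  prob = probs[target]
  prob = LOG_PROB_FLOOR_SKETCH if prob < LOG_PROB_FLOOR_SKETCH
  -Math.log(prob)
end
```

With the clamp in place, a target probability of exactly 0.0 yields a large but finite loss (12 * ln 10) rather than Infinity.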
NEG_INF_SCORE =

Causal-mask sentinel: attention scores set to this become ~0 after softmax (Math.exp(-1e30) underflows cleanly to 0.0). Avoid -Infinity because (Float::INFINITY - Float::INFINITY) is NaN if downstream code rescales or subtracts max.

-1.0e30
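
The difference between the two sentinels shows up in a max-subtracting softmax. The sketch below uses plain Ruby arrays and a hypothetical helper name; it is not the project's attention code.

```ruby
NEG_INF_SCORE_SKETCH = -1.0e30

# Numerically stable softmax over one row of scores.
def softmax_row_sketch(scores)
  max  = scores.max
  exps = scores.map { |s| Math.exp(s - max) }
  sum  = exps.sum
  exps.map { |e| e / sum }
end

# One visible position, one masked position: the masked weight is exactly 0.0.
masked_row = softmax_row_sketch([1.0, NEG_INF_SCORE_SKETCH])

# A fully masked row stays finite with the large negative sentinel...
finite_row = softmax_row_sketch([NEG_INF_SCORE_SKETCH, NEG_INF_SCORE_SKETCH])

# ...but turns into NaN with -Infinity, because (-Infinity) - (-Infinity) is NaN.
nan_row = softmax_row_sketch([-Float::INFINITY, -Float::INFINITY])
```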
# tinynn is always required so FFNFFICache is defined (it lives in
# lib/toy/ffi/tinynn.rb). The require itself doesn't run any FFI code; only
# feed_forward_ffi's USE_FFI_MATMUL-gated branch does. With
# USE_FFI_MATMUL=false the FFI methods are dead code that Spinel's
# DCE drops, but the FFI libraries still get linked.
MAT_SHAPES_ON =

One-shot diagnostic: when MAT_SHAPES=1 is set in the environment (so MAT_SHAPES_ON is true), every matmul prints its shape triple on stdout (post-process with sort | uniq -c). Off by default; the const must exist for Spinel to resolve the read.

(ENV["MAT_SHAPES"] || "") == "1"
GGUF_PATH =

GH#188 – model selection via env. Defaults to SmolLM2-135M for a cheap-to-test smoke; override for any llama-family GGUF. If MODEL_NAME isn’t set, it defaults to the basename of MODEL_PATH minus the “.gguf” suffix – close enough for the /v1/models response.

ENV["MODEL_PATH"] || "data/smollm2-135m-native.gguf"
MODEL_NAME_ENV =
ENV["MODEL_NAME"] || ""
MODEL_NAME =
MODEL_NAME_ENV.length > 0 ? MODEL_NAME_ENV : _mn_default
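
The fallback described above (MODEL_NAME, else the basename of MODEL_PATH without ".gguf") can be sketched in plain Ruby as below. The body of `_mn_default` is not shown on this page, so `model_name_sketch` is a hypothetical stand-in; whether Spinel supports `File.basename` is also an assumption, and the real helper may well use a manual scan instead.

```ruby
# Prefer the explicit name; otherwise derive one from the model path.
def model_name_sketch(name_env, model_path)
  return name_env if name_env.length > 0
  File.basename(model_path, ".gguf")
end

puts model_name_sketch("", "data/smollm2-135m-native.gguf")
```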
MAX_T =
(ENV["MAX_T"] || "256").to_i
STATE =
State.new

Instance Method Summary

Instance Method Details

#all_digits?(s) ⇒ Boolean

Returns true iff `s` is non-empty and every character is in 0..9. Uses an explicit character scan rather than exception-based Integer(s), per the Spinel landmines.

Returns:

  • (Boolean)


# File 'lib/toy/run/infer.rb', line 67

def all_digits?(s)
  return false if s.length == 0
  i = 0
  while i < s.length
    c = s[i]
    if c < "0" || c > "9"
      return false
    end
    i = i + 1
  end
  true
end
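
One reason to validate before converting is that `String#to_i` silently accepts a numeric prefix. The sketch below pairs a copy of the scan (renamed `all_digits_sketch?` so the snippet is self-contained) with a hypothetical `id_or_default_sketch` wrapper; the wrapper is an illustration, not something defined in infer.rb.

```ruby
# Same explicit scan as all_digits?, duplicated so this snippet runs alone.
def all_digits_sketch?(s)
  return false if s.length == 0
  i = 0
  while i < s.length
    c = s[i]
    return false if c < "0" || c > "9"
    i = i + 1
  end
  true
end

# Hypothetical wrapper: only convert strings that are entirely digits.
def id_or_default_sketch(s, default)
  all_digits_sketch?(s) ? s.to_i : default
end

puts "12abc".to_i                      # to_i alone keeps the numeric prefix
puts id_or_default_sketch("12abc", -1) # the scan rejects the whole string
```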

#api_gen_id(prefix) ⇒ Object



# File 'lib/toy/serve/openai/server.rb', line 105

def api_gen_id(prefix)
  t = Time.now
  v = (t.to_i * 1_000_003) ^ ((t.to_f - t.to_i).to_f * 1.0e9).to_i
  prefix + "-" + v.to_s
end

#api_generate_ids(prompt_ids, n_new) ⇒ Object

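Greedy generation from a pre-tokenized prompt. KV-cache decode: prefill the prompt one step at a time, take the first new token from the last prefill step's logits, then keep picking the argmax token until `n_new` tokens have been produced. Returns Array<Int> of the new token IDs (does NOT include the prompt). At least one token is always returned, even when `n_new` is 0.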

This routine re-runs the prefill from position 0 every call; the cache’s t_K / t_V tensors are persistent and get overwritten in place. (A future optimisation would be a fast prefix-cache for shared prompts.)



# File 'lib/toy/serve/openai/server.rb', line 120

def api_generate_ids(prompt_ids, n_new)
  out_ids = [0]
  out_ids.pop

  vocab = STATE.cfg.vocab
  last_logits = Mat.new(1, vocab)
  prefill_pos = 0
  while prefill_pos < prompt_ids.length
    last_logits = SmolLM2KV.decode_step(STATE.kv, prompt_ids[prefill_pos], prefill_pos)
    prefill_pos = prefill_pos + 1
  end

  # First generated token comes from the last prefill step's logits.
  best_idx = 0
  best_val = last_logits.flat[0]
  v_iter = 1
  while v_iter < vocab
    val = last_logits.flat[v_iter]
    if val > best_val; best_val = val; best_idx = v_iter; end
    v_iter = v_iter + 1
  end
  out_ids.push(best_idx)

  step = 1
  while step < n_new
    last_logits = SmolLM2KV.decode_step(STATE.kv, out_ids[out_ids.length - 1],
                                         prompt_ids.length + out_ids.length - 1)
    best_idx = 0
    best_val = last_logits.flat[0]
    v_iter = 1
    while v_iter < vocab
      val = last_logits.flat[v_iter]
      if val > best_val; best_val = val; best_idx = v_iter; end
      v_iter = v_iter + 1
    end
    out_ids.push(best_idx)
    step = step + 1
  end
  out_ids
end
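
The prefill-then-decode loop can be illustrated without a model by passing the decode step in as a callable. In this sketch `step_fn` is a hypothetical stand-in for SmolLM2KV.decode_step (it takes a token id and a position and returns an array of logits), and the argmax is factored into its own helper. Unlike the routine above, the sketch returns an empty array when `n_new` is 0, and it assumes a non-empty prompt.

```ruby
# Index of the largest value; the first maximum wins on ties.
def argmax_sketch(values)
  best_idx = 0
  best_val = values[0]
  i = 1
  while i < values.length
    if values[i] > best_val
      best_val = values[i]
      best_idx = i
    end
    i += 1
  end
  best_idx
end

def greedy_generate_sketch(prompt_ids, n_new, step_fn)
  out = []
  logits = nil
  pos = 0
  # Prefill: feed the prompt one token at a time from position 0.
  while pos < prompt_ids.length
    logits = step_fn.call(prompt_ids[pos], pos)
    pos += 1
  end
  # Decode: each new token is fed back at the next free position.
  while out.length < n_new
    out.push(argmax_sketch(logits))
    break if out.length == n_new
    logits = step_fn.call(out[out.length - 1], prompt_ids.length + out.length - 1)
  end
  out
end

# Toy "model" with a 5-token vocabulary that always predicts (token + 1) % 5.
toy_step = lambda do |tok, _pos|
  logits = Array.new(5, 0.0)
  logits[(tok + 1) % 5] = 1.0
  logits
end

p greedy_generate_sketch([2], 3, toy_step)
```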

#api_now_unix ⇒ Object

Helpers. Returns the current Unix time in whole seconds.



# File 'lib/toy/serve/openai/server.rb', line 101

def api_now_unix
  Time.now.to_i
end

#fused_lookup_a(fused_names, fused_a, fused_b, ggA, ggB, fused_name, nel_full) ⇒ Object

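Look up (or read+cache) the full fused tensor for `fused_name`. Returns the index of the entry in the parallel caches `fused_names` / `fused_a` / `fused_b`. On a miss, the tensor is read once from both ckpt A and ckpt B and appended to the caches; returns -1 if either checkpoint lacks the tensor.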



# File 'lib/toy/run/eval_lmc.rb', line 161

def fused_lookup_a(fused_names, fused_a, fused_b, ggA, ggB, fused_name, nel_full)
  fi = 0
  while fi < fused_names.length
    if fused_names[fi] == fused_name
      return fi
    end
    fi = fi + 1
  end
  # Miss — read both A and B once.
  idx_a = TinyNN.tnn_gguf_find_index(ggA, fused_name)
  idx_b = TinyNN.tnn_gguf_find_index(ggB, fused_name)
  if idx_a < 0 || idx_b < 0
    puts "toy-eval-lmc: missing fused " + fused_name +
                " in A=" + idx_a.to_s + " B=" + idx_b.to_s
    return -1
  end
  ma = Mat.new(1, nel_full)
  mb = Mat.new(1, nel_full)
  TinyNN.tnn_gguf_read_f32_to_doubles(ggA, idx_a, ma.flat, nel_full)
  TinyNN.tnn_gguf_read_f32_to_doubles(ggB, idx_b, mb.flat, nel_full)
  fused_names.push(fused_name)
  fused_a.push(ma)
  fused_b.push(mb)
  return fused_names.length - 1
end
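
The caching scheme here is a linear scan over parallel arrays with load-on-miss. A stripped-down sketch of that pattern follows; `cache_lookup_sketch` and its block-based loader are hypothetical and stand in for the TinyNN GGUF reads, which are not reproduced.

```ruby
# Returns the cache index for `name`, loading it via the block on a miss.
# Returns -1 (and caches nothing) when the loader yields nil.
def cache_lookup_sketch(names, values, name)
  i = 0
  while i < names.length
    return i if names[i] == name
    i += 1
  end
  loaded = yield(name)
  return -1 if loaded.nil?
  names.push(name)
  values.push(loaded)
  names.length - 1
end

names  = []
values = []
idx = cache_lookup_sketch(names, values, "blk.0.attn_qkv") { |n| [n.length] }
```

A linear scan is reasonable when only a handful of fused tensors are ever cached; with many entries a Hash keyed by name would be the usual choice in plain Ruby.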

#parse_ids(line) ⇒ Object



# File 'lib/toy/train/training.rb', line 19

def parse_ids(line)
  parts = line.split(" ")
  ids   = [parts[0].to_i]
  k = 1
  while k < parts.length
    ids.push(parts[k].to_i)
    k += 1
  end
  ids
end

#read_prompt(path) ⇒ Object



# File 'lib/toy/train/training.rb', line 45

def read_prompt(path)
  raw = ["?"]
  raw.pop
  File.open(path, "r") do |f|
    f.each_line { |line| raw.push(line.chomp) }
  end
  parse_ids(raw[0])
end

#read_sequences(path) ⇒ Object



# File 'lib/toy/train/training.rb', line 30

def read_sequences(path)
  raw = ["?"]
  raw.pop
  File.open(path, "r") do |f|
    f.each_line { |line| raw.push(line.chomp) }
  end
  seqs = [parse_ids(raw[0])]
  i = 1
  while i < raw.length
    seqs.push(parse_ids(raw[i]))
    i += 1
  end
  seqs
end

#read_vocab(path) ⇒ Object


Corpus readers — load vocab, sequences, and prompt from data/ts_*.txt



# File 'lib/toy/train/training.rb', line 10

def read_vocab(path)
  vocab = ["?"]
  vocab.pop
  File.open(path, "r") do |f|
    f.each_line { |line| vocab.push(line.chomp) }
  end
  vocab
end
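
Judging from parse_ids and read_sequences, a sequence file holds one sequence per line with space-separated integer ids. Under that assumption, a compact plain-Ruby equivalent of the sequence reader might look like the sketch below. `read_id_lines_sketch` is a hypothetical name, it uses idioms (`readlines`, `map`) that the Spinel-targeted originals avoid, and it assumes the file has no blank lines.

```ruby
# Read every line of `path` as an array of integer token ids.
def read_id_lines_sketch(path)
  File.readlines(path, chomp: true).map do |line|
    line.split(" ").map { |part| part.to_i }
  end
end
```

The originals seed each array with a typed element (`["?"]`, `[parts[0].to_i]`) before filling it, presumably so the compiler can infer the element type; the sketch does not need that in standard Ruby.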