Top Level Namespace
Defined Under Namespace
Modules: GGUFLoad, GPT2BPE, GPT2ConfigLoader, GPT2FFI, GPT2FFICuda, GPT2FFIMetal, GPT2KV, GPT2KVCuda, GPT2KVMetal, GgufKV, Sampler, SmolLM2ConfigLoader, SmolLM2KV, SmolLM2KVCuda, SmolLM2KVMetal, TinyNN, TinyNNCuda, TinyNNMetal, Toy, ToyChatTemplate, ToyCorpusLoader, ToyDescribeFlow, ToyDriftGrad, ToyGGUFFuser, ToyGGUFWriter, ToyImageLoader, ToyLR, ToyLogProbs, ToySample, ToyTap, ToyTokenDrift, ToyVit
Classes: Adam, AdamState, AdamStepResult, Arch, AttnCache, AttnResult, Block, BlockFFICache, BlockFFICacheCuda, BlockResult, ChatCompletionsHandler, CompletionsHandler, DataLoader, EmbeddingsHandler, FFCache, FFNFFICache, FFNFFICacheCuda, FFNFFICacheMetal, FFResult, ForwardCache, FullForwardFFICache, FullForwardFFICacheCuda, GPT2BPETables, GPT2Block, GPT2BlockFFI, GPT2BlockFFICuda, GPT2BlockFFIMetal, GPT2Config, GPT2FullForwardFFICache, GPT2FullForwardFFICacheCuda, GPT2FullForwardFFICacheMetal, GPT2KVBlockFFI, GPT2KVBlockFFICuda, GPT2KVBlockFFIMetal, GPT2KVFFICache, GPT2KVFFICacheCuda, GPT2KVFFICacheMetal, GPT2KVStepResult, GPT2KVStepResultCuda, GPT2KVStepResultMetal, GPT2LM, Gradients, HeadCache, HealthHandler, IndexHandler, LRSchedule, LayerCache, LossResult, Mat, ModelEntry, ModelIndex, ModelsHandler, Module, NormResult, SamplerConfig, SamplerContext, SmolLM2KVBlockFFI, SmolLM2KVBlockFFICuda, SmolLM2KVBlockFFIMetal, SmolLM2KVFFICache, SmolLM2KVFFICacheCuda, SmolLM2KVFFICacheMetal, SmolLM2KVStepResult, SmolLM2KVStepResultCuda, SmolLM2KVStepResultMetal, State, Tokenizer, ToyLM, ToyLMCuda, ToyLMMetal, TransformerLM, ViTTinyConfig
Constant Summary
- GGUF =
ENV["GGUF"] || "data/smollm2-135m-native.gguf"
- TOP_K =
(ENV["TOP_K"] || "5").to_i
- PROMPT =
ENV["PROMPT"] || "Once upon a time"
- N_NEW =
(ENV["N_NEW"] || "16").to_i
- PROMPT_IDS =
models (parity with the CPU runner). Empty when unset.
ENV["PROMPT_IDS"] || "261"
- TAO_RUN_DIR =
Events sink (toy/v1 serving telemetry; FILE only). These are TOP-LEVEL constants and must NEVER be assigned inside a branch: Spinel does not initialize a top-level CONSTANT assigned inside a conditional arm at runtime, so it reads back empty and every event write is silently skipped (landmine, train.rb:82-85). TAO_RUN_DIR is set by `toy serve` (lib/toy/core/cli/serve.rb) once it has resolved a run id and created runs/<id>/. When it is empty, serving runs with events OFF (cheap when off: every emit guard short-circuits), exactly like train.
ENV["TAO_RUN_DIR"] || ""
- RUN_ID =
ENV["TOY_RUN_ID"] || ""
- EVENTS =
Events file path (FILE only when TAO_RUN_DIR is set).
TAO_RUN_DIR.length > 0 ? (TAO_RUN_DIR + "/events.jsonl") : ""
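The cheap-when-off contract can be sketched in plain CRuby. This is a minimal sketch, not the serve code: `emit_event`, `events_path_for`, and the JSON field names are hypothetical. It only illustrates the guard (an empty path returns before any I/O) and the top-level ternary used to derive the path.

```ruby
require "json"
require "tmpdir"

# Hypothetical helper: derive the events path with a ternary, mirroring EVENTS
# above (never via an if-arm that assigns a constant).
def events_path_for(run_dir)
  run_dir.length > 0 ? (run_dir + "/events.jsonl") : ""
end

# Hypothetical helper: append one JSON object per line (JSON Lines).
# An empty path means events are OFF, so return before touching the filesystem.
def emit_event(events_path, type, fields)
  return false if events_path.length == 0
  record = { "type" => type }.merge(fields)
  File.open(events_path, "a") { |f| f.puts(JSON.generate(record)) }
  true
end
```

With TAO_RUN_DIR unset, every call reduces to one length check; with it set, each call appends exactly one line.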
- SERVE_PORT =
PORT is hoisted ABOVE the run_start emit so that config.port is available to it. Reading it has no side effects, so the hoist is safe (the Tep.run! call below still binds it).
(ENV["PORT"] || "4567").to_i
- RECIPE =
NOTE: this CUDA runner hosts the two RANDOM-INIT recipes (from-scratch + warm-start, both Toy::LLM::Engine::LlamaSeqEngineCuda#realize_for_random_init), selected by RECIPE. The LoRA recipe lives in a SEPARATE binary (lib/toy/run/train_lora_cuda.rb -> libexec/toy-train-lora-cuda): its #realize_for_mmap path cannot share a Spinel compilation unit with the random-init path without a cfg type-merge miscompile (landmine #16, same rationale as the CPU split train_lora.rb vs train.rb).
ENV["RECIPE"] || "from-scratch"
- STEPS =
ENV reads — TOP-LEVEL constants (Spinel constant-in-conditional caveat).
(ENV["STEPS"] || "5").to_i
- SEED =
(ENV["SEED"] || "0").to_i
- VOCAB =
From-scratch gate shape (literal, matches the CPU runner so the curve compares).
627
- D_MODEL =
64
- DONOR_D =
128
- N_HEADS =
4
- D_FF =
128
- N_LAYERS =
2
- CONTEXT =
32
- LMC_A =
ENV["LMC_A"] || ""
- LMC_B =
ENV["LMC_B"] || ""
- ALPHAS_S =
ENV["LMC_ALPHAS"] || "0,0.25,0.5,0.75,1.0"
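LMC_ALPHAS is a comma-separated list of interpolation weights between checkpoints A and B. Under the usual linear-mode-connectivity convention (an assumption here, since the interpolation code is not part of this listing), each alpha yields weights (1 - alpha) * A + alpha * B, applied elementwise to the flat tensors. A minimal sketch with hypothetical helper names:

```ruby
# Hypothetical: parse "0,0.25,0.5" into [0.0, 0.25, 0.5] (the LMC_ALPHAS format).
def parse_alphas(s)
  s.split(",").map { |tok| tok.strip.to_f }
end

# Hypothetical: elementwise linear interpolation between two flat weight arrays.
# alpha = 0 reproduces A, alpha = 1 reproduces B.
def lerp_weights(a, b, alpha)
  raise ArgumentError, "length mismatch" unless a.length == b.length
  out = Array.new(a.length)
  i = 0
  while i < a.length
    out[i] = (1.0 - alpha) * a[i] + alpha * b[i]
    i += 1
  end
  out
end

alphas = parse_alphas("0,0.25,0.5,0.75,1.0")
a = [0.0, 2.0]
b = [4.0, 6.0]
curve = alphas.map { |al| lerp_weights(a, b, al) }
```

Evaluating the loss at each point of `curve` gives the interpolation path; the endpoints are exactly A and B.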
- SEQ_LEN =
(ENV["CONTEXT"] || "32").to_i
- RUN_DIR =
ENV["TAO_RUN_DIR"] || ""
- D_HEAD =
D_HEAD = d_model / n_heads. Per-head fused-tensor slice geometry.
d_model / n_heads
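With the from-scratch gate shape above (D_MODEL = 64, N_HEADS = 4), D_HEAD is 16. The sketch below assumes heads are laid out contiguously along the feature axis of a row-major [rows x d_model] buffer, so head h owns columns h * D_HEAD through (h + 1) * D_HEAD - 1 of every row. The real fused-tensor layout is not shown in this listing, and `head_slice` is a hypothetical helper.

```ruby
# Hypothetical: copy head h's columns out of a row-major [rows x d_model] flat array,
# assuming head h occupies columns h*d_head ... (h+1)*d_head - 1 of each row.
def head_slice(flat, rows, d_model, n_heads, h)
  raise ArgumentError, "n_heads must divide d_model" unless d_model % n_heads == 0
  d_head = d_model / n_heads
  out = []
  r = 0
  while r < rows
    base = r * d_model + h * d_head
    c = 0
    while c < d_head
      out.push(flat[base + c])
      c += 1
    end
    r += 1
  end
  out
end

flat = (0...(2 * 64)).to_a          # 2 rows, d_model = 64; each value is its flat index
s = head_slice(flat, 2, 64, 4, 1)   # head 1: columns 16..31 of each row
```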
- IMG_DIR =
(ENV["IMG_DIR"] || "data/vit_smoke")
- TOY_RUN_ID =
(ENV["TOY_RUN_ID"] || "vit-tiny")
- IMAGE_SIZE =
Gate-fixed timm ViT-Tiny SHAPE — hardcoded (NOT env/flags). This is the shape data/vit_smoke matches (224/16/196/10); the 16x16 ENV defaults in 07_train_vit_tiny.rb’s header are the REJECTED synthetic shape.
224
- PATCH_SIZE =
16
- NUM_CHAN =
3
- NUM_CLASSES =
10
- LN_EPS =
1.0e-5
- N_IMAGES =
1
- LR_MAX =
LR schedule PINNED to 07’s defaults (confirmed to produce the recorded baseline curve). With WARMUP=10 > STEPS=5, every step is on the linear warmup ramp.
0.003
- LR_MIN =
0.0001
- WARMUP =
10
- LR =
(ENV["LR"] || "0.001").to_f
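The ViT-Tiny gate numbers can be checked by hand: a 224x224 image cut into 16x16 patches gives (224 / 16)^2 = 14^2 = 196 patches, the 196 in 224/16/196/10. For the warmup note, this sketch assumes a plain linear ramp, lr = LR_MAX * (step + 1) / WARMUP with a 0-based step; the exact ramp and the decay toward LR_MIN live in ToyLR and are not shown here, so `warmup_lr` is hypothetical.

```ruby
# Number of non-overlapping patches for a square image (ViT patchify).
def num_patches(image_size, patch_size)
  raise ArgumentError, "patch must tile the image" unless image_size % patch_size == 0
  per_side = image_size / patch_size
  per_side * per_side
end

# Hypothetical linear warmup (0-based step): reaches lr_max at step = warmup - 1.
# What happens at or after `warmup` (the decay phase) is not modelled here.
def warmup_lr(step, lr_max, warmup)
  lr_max * (step + 1).to_f / warmup
end

# STEPS = 5 with WARMUP = 10: every step stays on the ramp, below LR_MAX.
lrs = (0...5).map { |s| warmup_lr(s, 0.003, 10) }
```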
- RANK_LORA =
(ENV["RANK"] || "8").to_i
- TARGET_ID =
99
- TOKENS =
[12092, 4845, 253, 1429]
- USE_FFI_MATMUL =
The FFN’s two matmuls go through TinyNN (ggml-CPU FFI) when this is true. Off by default to keep the toy zero-dep; flip on to use the bridge and accelerate at real-LLM scale (see tinynn/README.md).
false
- GELU_C =
GeLU tanh-approximation constants (the GPT-2 formula; identical to torch.nn.functional.gelu(…, approximate='tanh')). Defined here so the forward, backward, and any per-tensor variant agree byte-for-byte:
gelu(x) = 0.5 * x * (1 + tanh(GELU_C * (x + GELU_K * x^3)))
GELU_C is sqrt(2/π).
0.7978845608028654
- GELU_K =
Cubic coefficient.
0.044715
- GELU_DK =
3 * GELU_K: the x^2 coefficient in the derivative of the tanh argument, d/dx (x + GELU_K * x^3) = 1 + GELU_DK * x^2.
0.134145
- RMS_EPS_DEFAULT =
RMSNorm / LayerNorm default epsilon. Matches Llama / SmolLM2 / GPT-2 conventions. Individual instances can override via their own @eps ivar; this is the fallback used by the row-level helpers.
1.0e-5
- LOG_PROB_FLOOR =
Numerical floor for probabilities going into log() in cross_entropy. 1e-12 is safely above the F32 / F64 underflow threshold and matches PyTorch’s “label smoothing” default clip.
1.0e-12
- NEG_INF_SCORE =
Causal-mask sentinel: attention scores set to this become ~0 after softmax (Math.exp(-1e30) underflows cleanly to 0.0). Avoid -Infinity because (Float::INFINITY - Float::INFINITY) is NaN if downstream code rescales or subtracts max.
-1.0e30
tinynn is always required so FFNFFICache is defined (it lives in lib/toy/ffi/tinynn.rb). The require itself doesn't run any FFI code; only feed_forward_ffi's USE_FFI_MATMUL-gated branch does. With USE_FFI_MATMUL=false the FFI methods are dead code that Spinel's DCE drops, but the libraries still get linked.
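The numeric constants above fit together as shown in this plain-Ruby sketch. It is not the project's kernel code: the method names are hypothetical and the constant values are copied from the summary. It shows the tanh-GELU and its derivative (where GELU_DK appears), a causal softmax row using NEG_INF_SCORE, and a cross-entropy term clamped at LOG_PROB_FLOOR.

```ruby
GELU_C  = 0.7978845608028654   # sqrt(2/pi)
GELU_K  = 0.044715             # cubic coefficient
GELU_DK = 0.134145             # 3 * GELU_K
LOG_PROB_FLOOR = 1.0e-12
NEG_INF_SCORE  = -1.0e30

def gelu(x)
  0.5 * x * (1.0 + Math.tanh(GELU_C * (x + GELU_K * x**3)))
end

# d/dx gelu(x): product rule plus d/dx tanh(u) = (1 - tanh(u)^2) * du/dx,
# with du/dx = GELU_C * (1 + GELU_DK * x^2).
def gelu_grad(x)
  t = Math.tanh(GELU_C * (x + GELU_K * x**3))
  0.5 * (1.0 + t) + 0.5 * x * (1.0 - t * t) * GELU_C * (1.0 + GELU_DK * x * x)
end

# Causal softmax for the query at q_pos: later keys get NEG_INF_SCORE, whose
# exp underflows to exactly 0.0 (no -Infinity, so max-subtraction never yields NaN).
def causal_softmax_row(scores, q_pos)
  masked = scores.each_with_index.map { |s, j| j > q_pos ? NEG_INF_SCORE : s }
  m = masked.max
  exps = masked.map { |s| Math.exp(s - m) }
  z = exps.sum
  exps.map { |e| e / z }
end

# Cross-entropy for one target, clamping the probability before log().
def cross_entropy(probs, target)
  -Math.log([probs[target], LOG_PROB_FLOOR].max)
end
```

A zero probability on the target yields -log(1e-12), about 27.63, instead of Infinity.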
- MAT_SHAPES_ON =
One-shot diagnostic: when MAT_SHAPES_ON=true, every matmul prints its shape triple on stdout (post-process with sort | uniq -c). Off by default; the const must exist for Spinel to resolve the read.
(ENV["MAT_SHAPES"] || "") == "1"
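The shape diagnostic amounts to one print per matmul call. A minimal sketch of how a flat row-major matmul might honour such a flag; `matmul_flat` and the "m k n" line format are assumptions, since the real print format is not shown here.

```ruby
MAT_SHAPES_ON = (ENV["MAT_SHAPES"] || "") == "1"

# Hypothetical: C[m x n] = A[m x k] * B[k x n], all row-major flat arrays.
# When MAT_SHAPES=1, prints one "m k n" line per call before computing.
def matmul_flat(a, b, m, k, n)
  puts "#{m} #{k} #{n}" if MAT_SHAPES_ON
  c = Array.new(m * n, 0.0)
  (0...m).each do |i|
    (0...n).each do |j|
      acc = 0.0
      (0...k).each { |t| acc += a[i * k + t] * b[t * n + j] }
      c[i * n + j] = acc
    end
  end
  c
end

c = matmul_flat([1, 2, 3, 4], [5, 6, 7, 8], 2, 2, 2)
```

Running a program built on this with `MAT_SHAPES=1` and piping stdout through `sort | uniq -c` then counts how often each distinct shape triple occurs.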
- GGUF_PATH =
GH#188 – model selection via env. Defaults to SmolLM2-135M for a cheap-to-test smoke; override for any llama-family GGUF. If MODEL_NAME isn’t set, it defaults to the basename of MODEL_PATH minus the “.gguf” suffix – close enough for the /v1/models response.
ENV["MODEL_PATH"] || "data/smollm2-135m-native.gguf"
- MODEL_NAME_ENV =
ENV["MODEL_NAME"] || ""
- MODEL_NAME =
MODEL_NAME_ENV.length > 0 ? MODEL_NAME_ENV : _mn_default
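The fallback described for GGUF_PATH (basename of MODEL_PATH minus the ".gguf" suffix) maps directly onto `File.basename` with a suffix argument. The `_mn_default` computation itself is not shown in this listing; `default_model_name` and `resolve_model_name` are hypothetical stand-ins.

```ruby
# Hypothetical: basename of the GGUF path with a trailing ".gguf" removed.
# Other suffixes are left untouched.
def default_model_name(model_path)
  File.basename(model_path, ".gguf")
end

# Hypothetical: MODEL_NAME wins when non-empty, else fall back to the basename.
def resolve_model_name(env_name, model_path)
  env_name.length > 0 ? env_name : default_model_name(model_path)
end
```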
- MAX_T =
(ENV["MAX_T"] || "256").to_i
- STATE =
State.new
Instance Method Summary
- #all_digits?(s) ⇒ Boolean
Returns true iff `s` is non-empty and every char is 0..9 (explicit char scan, not exception-based Integer(s), per the Spinel landmines).
- #api_gen_id(prefix) ⇒ Object
- #api_generate_ids(prompt_ids, n_new) ⇒ Object
Greedy generation from a pre-tokenized prompt.
- #api_now_unix ⇒ Object
Helpers.
- #fused_lookup_a(fused_names, fused_a, fused_b, ggA, ggB, fused_name, nel_full) ⇒ Object
Look up (or read and cache) the full fused tensor for `fused_name` from ckpt A.
- #parse_ids(line) ⇒ Object
- #read_prompt(path) ⇒ Object
- #read_sequences(path) ⇒ Object
- #read_vocab(path) ⇒ Object
Corpus readers: load vocab, sequences, and prompt from data/ts_*.txt.
Instance Method Details
#all_digits?(s) ⇒ Boolean
Returns true iff `s` is non-empty and every char is 0..9 (explicit char scan, not exception-based Integer(s), per the Spinel landmines).
```ruby
# File 'lib/toy/run/infer.rb', line 67

def all_digits?(s)
  return false if s.length == 0
  i = 0
  while i < s.length
    c = s[i]
    if c < "0" || c > "9"
      return false
    end
    i = i + 1
  end
  true
end
```
#api_gen_id(prefix) ⇒ Object
```ruby
# File 'lib/toy/serve/openai/server.rb', line 105

def api_gen_id(prefix)
  t = Time.now
  v = (t.to_i * 1_000_003) ^ ((t.to_f - t.to_i).to_f * 1.0e9).to_i
  prefix + "-" + v.to_s
end
```
#api_generate_ids(prompt_ids, n_new) ⇒ Object
Greedy generation from a pre-tokenized prompt. KV-cache decode: prefill the prompt one position at a time, then decode greedily (argmax over the logits) for `n_new` more steps. Returns Array<Int> of the new token IDs (does NOT include the prompt).
This routine re-runs the prefill from position 0 every call; the cache’s t_K / t_V tensors are persistent and get overwritten in place. (A future optimisation would be a fast prefix-cache for shared prompts.)
```ruby
# File 'lib/toy/serve/openai/server.rb', line 120

def api_generate_ids(prompt_ids, n_new)
  # Empty Array<Int>: seed with an Int, then pop (presumably so Spinel infers the element type).
  out_ids = [0]
  out_ids.pop
  # Guard: without it, n_new = 0 would still emit one token below.
  return out_ids if n_new <= 0
  vocab = STATE.cfg.vocab
  last_logits = Mat.new(1, vocab)
  # Prefill: feed prompt tokens at positions 0..P-1.
  prefill_pos = 0
  while prefill_pos < prompt_ids.length
    last_logits = SmolLM2KV.decode_step(STATE.kv, prompt_ids[prefill_pos], prefill_pos)
    prefill_pos = prefill_pos + 1
  end
  # First generated token comes from the last prefill step's logits.
  best_idx = 0
  best_val = last_logits.flat[0]
  v_iter = 1
  while v_iter < vocab
    val = last_logits.flat[v_iter]
    if val > best_val; best_val = val; best_idx = v_iter; end
    v_iter = v_iter + 1
  end
  out_ids.push(best_idx)
  # Each later step feeds the previous token at position P + (tokens generated so far) - 1.
  step = 1
  while step < n_new
    last_logits = SmolLM2KV.decode_step(STATE.kv, out_ids[out_ids.length - 1], prompt_ids.length + out_ids.length - 1)
    best_idx = 0
    best_val = last_logits.flat[0]
    v_iter = 1
    while v_iter < vocab
      val = last_logits.flat[v_iter]
      if val > best_val; best_val = val; best_idx = v_iter; end
      v_iter = v_iter + 1
    end
    out_ids.push(best_idx)
    step = step + 1
  end
  out_ids
end
```
#api_now_unix ⇒ Object
Helpers.
```ruby
# File 'lib/toy/serve/openai/server.rb', line 101

def api_now_unix
  Time.now.to_i
end
```
#fused_lookup_a(fused_names, fused_a, fused_b, ggA, ggB, fused_name, nel_full) ⇒ Object
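Look up (or read and cache) the full fused tensor `fused_name` from checkpoints A and B. Returns its index into the parallel fused_names / fused_a / fused_b arrays, or -1 (after logging) if either checkpoint lacks the tensor.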
```ruby
# File 'lib/toy/run/eval_lmc.rb', line 161

def fused_lookup_a(fused_names, fused_a, fused_b, ggA, ggB, fused_name, nel_full)
  fi = 0
  while fi < fused_names.length
    if fused_names[fi] == fused_name
      return fi
    end
    fi = fi + 1
  end
  # Miss: read both A and B once.
  idx_a = TinyNN.tnn_gguf_find_index(ggA, fused_name)
  idx_b = TinyNN.tnn_gguf_find_index(ggB, fused_name)
  if idx_a < 0 || idx_b < 0
    puts "toy-eval-lmc: missing fused " + fused_name + " in A=" + idx_a.to_s + " B=" + idx_b.to_s
    return -1
  end
  ma = Mat.new(1, nel_full)
  mb = Mat.new(1, nel_full)
  TinyNN.tnn_gguf_read_f32_to_doubles(ggA, idx_a, ma.flat, nel_full)
  TinyNN.tnn_gguf_read_f32_to_doubles(ggB, idx_b, mb.flat, nel_full)
  fused_names.push(fused_name)
  fused_a.push(ma)
  fused_b.push(mb)
  return fused_names.length - 1
end
```
#parse_ids(line) ⇒ Object
```ruby
# File 'lib/toy/train/training.rb', line 19

def parse_ids(line)
  parts = line.split(" ")
  ids = [parts[0].to_i]
  k = 1
  while k < parts.length
    ids.push(parts[k].to_i)
    k += 1
  end
  ids
end
```
#read_prompt(path) ⇒ Object
```ruby
# File 'lib/toy/train/training.rb', line 45

def read_prompt(path)
  raw = ["?"]
  raw.pop
  File.open(path, "r") do |f|
    f.each_line { |line| raw.push(line.chomp) }
  end
  parse_ids(raw[0])
end
```
#read_sequences(path) ⇒ Object
```ruby
# File 'lib/toy/train/training.rb', line 30

def read_sequences(path)
  raw = ["?"]
  raw.pop
  File.open(path, "r") do |f|
    f.each_line { |line| raw.push(line.chomp) }
  end
  seqs = [parse_ids(raw[0])]
  i = 1
  while i < raw.length
    seqs.push(parse_ids(raw[i]))
    i += 1
  end
  seqs
end
```
#read_vocab(path) ⇒ Object
Corpus readers — load vocab, sequences, and prompt from data/ts_*.txt
```ruby
# File 'lib/toy/train/training.rb', line 10

def read_vocab(path)
  vocab = ["?"]
  vocab.pop
  File.open(path, "r") do |f|
    f.each_line { |line| vocab.push(line.chomp) }
  end
  vocab
end
```