Class: Toy::Core::CLI::New

Inherits:
Object
Defined in:
lib/toy/core/cli/new.rb

Constant Summary

TOY_YML =
<<~YAML
  # toy.yml — minimal project config. An empty file is valid; all
  # defaults then apply.

  # Template for runs/<run_id>/ directory names. Recognised brace
  # tokens: {arch} {date}(YYYYMMDD) {time}(HHMMSS) {seq}(daily counter).
  run_id_template: "{arch}-{date}-{seq}"

  # Relative dir the framework discovers your algos in (L1-L4).
  # A single algos/my_llama.rb works too; the subdirs are optional.
  algos_path: "algos"
YAML
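As an illustration of the brace tokens listed in the template comment, the sketch below expands a `run_id_template` in plain Ruby. `expand_run_id` is a hypothetical helper, not part of toy's API, and rendering `{seq}` without zero-padding is an assumption; the framework's own expansion code is not shown on this page.

```ruby
# Hypothetical helper (not toy's API): expands the brace tokens
# documented in toy.yml against an explicit time and counter.
def expand_run_id(template, arch, now, seq)
  values = {
    "{arch}" => arch,
    "{date}" => now.strftime("%Y%m%d"), # YYYYMMDD
    "{time}" => now.strftime("%H%M%S"), # HHMMSS
    "{seq}"  => seq.to_s                # daily counter; no padding (assumption)
  }
  # Only the four recognised tokens are substituted; other text is kept.
  template.gsub(/\{(?:arch|date|time|seq)\}/) { |token| values[token] }
end

now = Time.new(2024, 1, 31, 9, 5, 7)
puts expand_run_id("{arch}-{date}-{seq}", "llama", now, 3)
# prints llama-20240131-3
```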
BINSTUB =

Project-local binstub. Resolves the toy CLI library in this order:

1. ENV["TOY_HOME"]/lib  — explicit override (dev-checkout flow)
2. gem-installed `toy`  — `require "toy/core/cli"` via rubygems

Fails loud with a short message (NEVER a stacktrace, the same rule the rest of the CLI follows) and exits 1 if neither resolves. See #29.

<<~RUBY
  #!/usr/bin/env ruby
  # Project-local binstub generated by `toy new`. Resolves the toy CLI
  # library via TOY_HOME first (dev-checkout flow), then the gem.
  home = ENV["TOY_HOME"]
  if home && !home.empty? && Dir.exist?(File.join(home, "lib"))
    $LOAD_PATH.unshift(File.join(home, "lib"))
  end
  begin
    require "toy/core/cli"
  rescue LoadError
    $stderr.puts "toy: cannot load the toy CLI library."
    $stderr.puts "  Set TOY_HOME to a toy source checkout, or `gem install toy`."
    exit 1
  end
  exit Toy::Core::CLI.run(ARGV)
RUBY
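The first resolution step can be isolated as a pure function, which makes the override rule easy to check without touching the real environment. This is a sketch only: `toy_lib_override` is a hypothetical name, and the generated binstub inlines the same condition rather than calling a helper.

```ruby
# Hypothetical helper for illustration. Returns the lib directory to
# prepend to $LOAD_PATH, or nil when the TOY_HOME override does not apply
# (in which case the binstub falls through to the gem-installed toy).
def toy_lib_override(env, dir_exists)
  home = env["TOY_HOME"]
  return nil if home.nil? || home.empty?

  lib = File.join(home, "lib")
  dir_exists.call(lib) ? lib : nil
end

# Against the real environment:
#   toy_lib_override(ENV, Dir.method(:exist?))
fake_fs = ->(path) { path == "/opt/toy/lib" }
puts toy_lib_override({ "TOY_HOME" => "/opt/toy" }, fake_fs)
# prints /opt/toy/lib
```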
HELLO_RECIPE =

A RUNNABLE L4 from-scratch starter, scaffolded into algos/recipes/hello.rb. Mirrors examples/legacy/train_from_scratch.rb: the blessed value-object path (SmolLM2Config.mha + Toy::Labels + Toy::AdamW + the FromScratch recipe).

It is a COMPUTE file (loads the ffi-bearing TinyNN stack), so, like every toy example, it is compiled by Spinel, NOT run under MRI (ffi_lib crashes under MRI). Spinel resolves require_relative at COMPILE time from LITERAL relative paths, so the scaffold drops a `toy_lib` symlink → $TOY_HOME/lib next to it, and hello.rb's require_relative calls go THROUGH that symlink. Build + run (the toy checkout carries the ggml archives the link step needs):

spinel algos/recipes/hello.rb -o hello && ./hello

Spinel-clean: literal require_relative, no #{} interpolation, no Struct.new, no kwargs / default args.

<<~'RUBY'
  # algos/recipes/hello.rb — a runnable from-scratch starter.
  #
  # The blessed value-object path: a tiny Llama-shape model
  # (RMSNorm + GQA + RoPE + SwiGLU) trained through the L4
  # FromScratch recipe. Mirrors examples/legacy/train_from_scratch.rb.
  #
  # Compile with Spinel (it loads the ffi-bearing TinyNN stack, so
  # it cannot run under MRI). The `toy_lib` symlink (→ a toy source
  # checkout's lib/, created by `toy new`) is how Spinel resolves
  # the framework at compile time. TinyNN's ffi_cflags carry
  # RELATIVE ggml archive paths (-Ltinynn -Lvendor/ggml/build/src),
  # so build with the toy checkout reachable as the link CWD:
  #
  #   cd "$TOY_HOME" && \
  #     spinel /abs/path/to/algos/recipes/hello.rb -o /abs/path/to/hello
  #   /abs/path/to/hello   # → step 1: loss=…

  require_relative "toy_lib/toy"
  require_relative "toy_lib/toy/models/toy_smollm2"
  require_relative "toy_lib/toy/llm/engine/llama_seq_engine"
  require_relative "toy_lib/toy/llm/adamw"
  require_relative "toy_lib/toy/llm/labels"
  require_relative "toy_lib/toy/llm/recipes/from_scratch"

  # Every hyper-parameter reads from ENV at RUNTIME with the
  # scaffold defaults below — ONE compile, many runs:
  #   D_MODEL=128 STEPS=10 ./hello
  # (explicit ENV["X"] || "default" — no default args, Spinel
  # landmine #4). D_FF derives as 2*D_MODEL when unset.
  STEPS    = (ENV["STEPS"]    || "3").to_i
  SEED     = (ENV["SEED"]     || "0").to_i
  VOCAB    = (ENV["VOCAB"]    || "64").to_i
  D_MODEL  = (ENV["D_MODEL"]  || "32").to_i
  N_HEADS  = (ENV["N_HEADS"]  || "2").to_i
  N_LAYERS = (ENV["N_LAYERS"] || "1").to_i
  CONTEXT  = (ENV["CONTEXT"]  || "8").to_i
  D_FF     = (ENV["D_FF"]     || (2 * D_MODEL).to_s).to_i

  # Model shape via the named factory (n_kv == n_heads = MHA).
  cfg = Toy::SmolLM2Config.mha(VOCAB, D_MODEL, N_HEADS,
                               D_FF, N_LAYERS, CONTEXT, 10000.0, 1.0e-5)

  # Named realize-time options (Toy::LLM::RecipeOptions): only
  # the non-default knobs need setting.
  opts = Toy::LLM::RecipeOptions.new
  opts.t_seq  = CONTEXT
  opts.untied = true
  opts.seed   = SEED

  recipe = Toy::LLM::Recipes::FromScratch.new
  recipe.realize!(cfg, opts)

  # A trivial fixed sequence + positions (CONTEXT-length; ids
  # cycle inside the vocab — 1..8 at the defaults).
  seq_ids   = [0]; seq_ids.pop
  i = 0; while i < CONTEXT; seq_ids.push((i + 1) % VOCAB); i = i + 1; end
  positions = [0]; positions.pop
  p = 0; while p < CONTEXT; positions.push(p); p = p + 1; end

  # Shift-by-one one-hot labels + named AdamW hyper-params.
  m_labels = Toy::Labels.next_token(seq_ids, VOCAB, CONTEXT, 1)
  m_hp     = Toy::AdamW.for_from_scratch.hp(0)

  step = 0
  while step < STEPS
    loss = recipe.step!(seq_ids, positions, m_labels, m_hp, step == 0)
    puts "step " + (step + 1).to_s + ": loss=" + loss.to_s
    step = step + 1
  end
RUBY
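The runtime-knob pattern used by the recipe (`ENV["X"] || "default"`, with `D_FF` derived from `D_MODEL` when unset) can be exercised on its own. The sketch below is ordinary MRI Ruby meant for reading, not Spinel-clean code; `int_knob` and `hello_knobs` are hypothetical names, and a plain Hash stands in for `ENV`.

```ruby
# Hypothetical helper: read one integer knob, falling back to a string
# default exactly as hello.rb does inline.
def int_knob(env, name, default)
  (env[name] || default).to_i
end

# Resolve a subset of hello.rb's knobs. D_FF falls back to 2 * D_MODEL,
# where D_MODEL is the value actually resolved (override or default).
def hello_knobs(env)
  d_model = int_knob(env, "D_MODEL", "32")
  {
    steps:   int_knob(env, "STEPS", "3"),
    d_model: d_model,
    d_ff:    int_knob(env, "D_FF", (2 * d_model).to_s)
  }
end

defaults = hello_knobs({})
wider    = hello_knobs({ "D_MODEL" => "128", "STEPS" => "10" })
puts defaults[:d_ff] # prints 64
puts wider[:d_ff]    # prints 256
```

Because the fallback for `D_FF` is computed after `D_MODEL` is resolved, widening the model also widens the feed-forward layer unless `D_FF` is set explicitly.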
APP_README =

App-scaffold README: documents the tree + the ENV-driven hello recipe (toy#64 item 7 — one compile, many runs).

<<~'MARKDOWN'
  # toy project

  Scaffolded by `toy new`. The tree:

  - `algos/` — your code, same 4-layer shape as the framework
    (`primitives/` `blocks/` `archs/` `recipes/`).
  - `algos/recipes/hello.rb` — a RUNNABLE from-scratch starter.
  - `data/` — GGUFs + corpora. `runs/` — event streams + checkpoints.
  - `bin/toy` — project-local CLI binstub.

  ## hello.rb: one compile, many runs

  Compile once with Spinel (from a toy checkout, so the ggml link
  paths resolve; `toy_lib` is the symlink `toy new` dropped):

  ```sh
  cd "$TOY_HOME" && spinel /path/to/algos/recipes/hello.rb -o /path/to/hello
  ```

  Every hyper-parameter is read from ENV **at runtime** — sweep
  without recompiling:

  ```sh
  ./hello                          # defaults (V=64 D=32 H=2 L=1 ctx=8, 3 steps)
  D_MODEL=128 STEPS=10 ./hello     # wider model, longer run
  SEED=1 ./hello                   # different init
  ```

  Knobs: `STEPS` `SEED` `VOCAB` `D_MODEL` `N_HEADS` `N_LAYERS`
  `CONTEXT` `D_FF` (defaults to `2*D_MODEL` when unset).

  ## Training through the CLI

  ```sh
  toy train from-scratch --steps 5
  ```

  Runs land in `runs/<id>/` (events.jsonl + weights/). Inspect them
  from plain Ruby with `Toy::RunLog`:

  ```ruby
  require "toy/core/run_log"
  best = Toy::RunLog.scan("runs").first
  puts best.run_id, best.final_loss
  ```
MARKDOWN
ALGO_SUBDIRS =
%w[primitives blocks archs recipes].freeze
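The scaffold method that consumes this list is not shown on this page. As a rough sketch of how the four layer directories might be laid out under `algos/`, assuming a hypothetical `make_algo_dirs` helper and a throwaway temp directory:

```ruby
require "fileutils"
require "tmpdir"

# Copied from the constant above so the sketch is self-contained.
ALGO_SUBDIRS = %w[primitives blocks archs recipes].freeze

# Hypothetical helper (not toy's code): create algos/<layer>/ for each
# layer under root and return the project-relative paths it made.
def make_algo_dirs(root)
  ALGO_SUBDIRS.map do |name|
    FileUtils.mkdir_p(File.join(root, "algos", name))
    File.join("algos", name)
  end
end

Dir.mktmpdir do |root|
  puts make_algo_dirs(root)
end
```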
LIB_GEMFILE =

`toy new --lib`: a library-COMPOSITION project. Unlike the app scaffold (algos/ + toy.yml + the toy CLI), the --lib scaffold is a plain Spinel program that CONSUMES toy as a gem: a Gemfile pulls `toy` (+ `spinel_kit`), `spinel-compat vendor` builds it into vendor/, and experiment.rb requires the one-shot compute surface (toy#42). No toy.yml, no binstub: this is "compose models against toy's engines", the tao_transfer shape.

<<~RUBY
  source "https://rubygems.org"
  ruby "3.2.3", engine: "spinel", engine_version: "0.0.0"

  # toy ships its compute surface + native (ggml) build inputs; spinel_kit
  # is toy's stdlib-surface dep (JSON/Git shims). `spinel-compat vendor`
  # copies both into vendor/spinel/ and builds toy's archive there.
  gem "toy"
  gem "spinel_kit", "~> 0.1"
RUBY
LIB_EXPERIMENT =

The starter program — a DEVICE-AGNOSTIC experiment body. It deliberately has NO compute require of its own: the per-device entry shims (main_cpu.rb / main_cuda.rb / main_metal.rb) pick the compute entry AT COMPILE TIME and then require this body (Spinel cannot switch a require on ENV — a conditional require_relative silently compiles to 0). Everything constructs through Toy::Device, so the same body compiles against every entry. Spinel-clean: no #{} interpolation, no kwargs, while-loops.

<<~'RUBY'
  # experiment.rb — DEVICE-AGNOSTIC experiment body.
  #
  # Compiled via the per-device entries (never directly):
  #   ./build.sh             # cpu
  #   ./build.sh cpu cuda    # one binary per device
  # Hyper-parameters read from ENV at RUNTIME with the defaults
  # below — one compile, many runs (STEPS=50 ./experiment_cpu).

  VOCAB   = (ENV["VOCAB"]   || "627").to_i
  D_MODEL = (ENV["D_MODEL"] || "64").to_i
  HEADS   = (ENV["HEADS"]   || "4").to_i
  LAYERS  = (ENV["LAYERS"]  || "2").to_i
  CONTEXT = (ENV["CONTEXT"] || "16").to_i
  STEPS   = (ENV["STEPS"]   || "20").to_i
  SEED    = (ENV["SEED"]    || "0").to_i
  D_FF    = (ENV["D_FF"]    || (2 * D_MODEL).to_s).to_i

  cfg = Toy::SmolLM2Config.mha(VOCAB, D_MODEL, HEADS, D_FF,
                               LAYERS, CONTEXT, 10000.0, 1.0e-5)

  # Named realize-time options; the L4 recipe comes from the
  # device seam (Toy::Device — cpu/cuda/metal picked by the
  # entry that required this body).
  opts = Toy::LLM::RecipeOptions.new
  opts.t_seq = CONTEXT
  opts.seed  = SEED

  recipe = Toy::Device.from_scratch_recipe
  recipe.realize!(cfg, opts)

  seq_ids = [0]
  seq_ids.pop
  i = 0
  while i < CONTEXT
    seq_ids.push(i % VOCAB)
    i = i + 1
  end

  # Validating per-step quartet + named AdamW (from-scratch mode).
  batch = Toy::LLM::TrainingBatch.new(VOCAB, CONTEXT, 1)
  batch.fill!(seq_ids)
  batch.hp = Toy::AdamW.for_from_scratch.hp(0)

  step = 0
  while step < STEPS
    loss = recipe.step!(batch.seq_ids, batch.positions,
                        batch.labels, batch.hp, step == 0)
    puts "step " + (step + 1).to_s + ": loss=" + loss.to_s
    step = step + 1
  end
  puts "experiment: ok (device=" + Toy::Device.name + ")"
RUBY
LIB_MAIN_CPU =

Per-device entry shims — device chosen at COMPILE time by which compute entry each requires. 2 lines each; build.sh picks the shim per requested device.

<<~'RUBY'
  # main_cpu.rb — CPU entry: compile with `./build.sh cpu`.
  require_relative "vendor/spinel/toy/lib/toy/compute"
  require_relative "experiment"
RUBY
LIB_MAIN_CUDA =
<<~'RUBY'
  # main_cuda.rb — CUDA entry: compile with `./build.sh cuda`.
  require_relative "vendor/spinel/toy/lib/toy/compute_cuda"
  require_relative "experiment"
RUBY
LIB_MAIN_METAL =
<<~'RUBY'
  # main_metal.rb — Metal entry (macOS): `./build.sh metal`.
  require_relative "vendor/spinel/toy/lib/toy/compute_metal"
  require_relative "experiment"
RUBY
LIB_BUILD_SH =

Multi-arch build script: loops the requested devices over the per-device entries, producing experiment_<dev> binaries. The --cc force-link/framework flags match toy's own Makefile cuda / metal targets.

<<~'SH'
  #!/bin/sh
  # build.sh — one device-agnostic experiment.rb, one binary per
  # device. Usage: ./build.sh [cpu] [cuda] [metal]   (default: cpu)
  set -e
  devs="${*:-cpu}"
  # RBS type roots (toy#69 / spinelgems#13): `spinel-compat vendor`
  # aggregates every vendored gem's sig/ (toy's included) under
  # vendor/spinel/sig and advertises it in vendor/spinel/deps.rb.
  # Seeding the analyzer with it keeps UNCALLED gem methods at
  # their declared types (advisory; safe to omit).
  rbs=""
  [ -d vendor/spinel/sig ] && rbs="--rbs vendor/spinel/sig"
  for dev in $devs; do
    entry="main_$dev.rb"
    out="experiment_$dev"
    # GPU units are OPT-IN at vendor time (spinelgems#20): a plain
    # `spinel-compat vendor` skips them. Fail loud here instead of
    # at the linker (toy#70).
    if [ "$dev" != cpu ] && [ ! -f "vendor/spinel/toy/tinynn/libtinynn_ggml_$dev.a" ]; then
      echo "build.sh: no vendored $dev archive — re-vendor with" >&2
      echo "  spinel-compat vendor --with-ext $dev --with-ext $dev-shim" >&2
      exit 2
    fi
    case "$dev" in
      cpu)
        spinel $rbs "$entry" -o "$out" ;;
      cuda)
        spinel $rbs --cc='cc -Wl,-u,tnn_cuda_force_link' "$entry" -o "$out" ;;
      metal)
        spinel $rbs --cc='cc -Wl,-u,_tnn_metal_force_link -framework Foundation -framework Metal -framework MetalKit' "$entry" -o "$out" ;;
      *)
        echo "build.sh: unknown device '$dev' (want cpu|cuda|metal)" >&2
        exit 2 ;;
    esac
    echo "built $out"
  done
SH
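The per-device `case` in build.sh amounts to a small lookup from device name to compiler flags. The Ruby sketch below restates that mapping so it can be inspected without invoking Spinel; `CC_FLAGS` and `spinel_argv` are hypothetical names, and the flag strings are copied from the script above.

```ruby
# Device => value passed to --cc (nil means no --cc flag at all).
CC_FLAGS = {
  "cpu"   => nil,
  "cuda"  => "cc -Wl,-u,tnn_cuda_force_link",
  "metal" => "cc -Wl,-u,_tnn_metal_force_link -framework Foundation " \
             "-framework Metal -framework MetalKit"
}.freeze

# Hypothetical helper: build the spinel argument list build.sh would run
# for one device. have_rbs mirrors the `[ -d vendor/spinel/sig ]` check.
def spinel_argv(dev, have_rbs)
  unless CC_FLAGS.key?(dev)
    raise ArgumentError, "unknown device '#{dev}' (want cpu|cuda|metal)"
  end

  argv = ["spinel"]
  argv += ["--rbs", "vendor/spinel/sig"] if have_rbs
  argv << "--cc=#{CC_FLAGS[dev]}" if CC_FLAGS[dev]
  argv + ["main_#{dev}.rb", "-o", "experiment_#{dev}"]
end

puts spinel_argv("cpu", false).join(" ")
# prints spinel main_cpu.rb -o experiment_cpu
```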
LIB_README =
<<~MARKDOWN
  # toy library-composition project

  A Spinel program that composes a model against toy's engines —
  with the DEVICE chosen at COMPILE time.

  ## Build & run
  ```sh
  bundle lock              # resolve toy + spinel_kit
  spinel-compat vendor     # copy + build toy into vendor/spinel/ (CPU)
  ./build.sh               # cpu binary: ./experiment_cpu
  STEPS=50 ./experiment_cpu   # hyper-params are runtime ENV knobs
  ```

  GPU backends are OPT-IN build-units (default-disabled,
  spinelgems#20) — enable them at VENDOR time, then build:
  ```sh
  spinel-compat vendor --with-ext cuda --with-ext cuda-shim
  ./build.sh cpu cuda      # one binary per device
  # macOS: --with-ext metal --with-ext metal-shim, ./build.sh metal
  ```

  > Until `toy` is published to RubyGems, point the Gemfile at a
  > checkout: `gem "toy", path: "../toy"` (or `git:`), then `bundle lock`.

  ## Shape

  - `experiment.rb` — your DEVICE-AGNOSTIC experiment body: it
    constructs through `Toy::Device` (`.llama_engine`,
    `.from_scratch_recipe`, …) and never names a backend class.
  - `main_cpu.rb` / `main_cuda.rb` / `main_metal.rb` — 2-line
    entry shims requiring toy's per-device compute entry
    (`toy/compute`, `toy/compute_cuda`, `toy/compute_metal`)
    then the body. Spinel resolves requires at compile time and
    cannot switch them on ENV — the shim IS the device choice.
  - `build.sh` — loops requested devices over the shims.

  See toy's `docs/consuming-toy.md`.
MARKDOWN
LIB_GITIGNORE =
<<~GITIGNORE
  /vendor/
  /experiment
  /experiment_cpu
  /experiment_cuda
  /experiment_metal
  *.o
  *.a
GITIGNORE

Instance Method Summary

Constructor Details

#initialize(argv) ⇒ New

Returns a new instance of New.



# File 'lib/toy/core/cli/new.rb', line 403

def initialize(argv)
  @argv = argv
  @json = false
  @force = false
  @lib = false
  @path = nil
end

Instance Method Details

#run ⇒ Object



# File 'lib/toy/core/cli/new.rb', line 411

def run
  return EXIT_BAD_INPUT unless parse_args
  target = File.expand_path(@path)

  if File.exist?(target) && !Dir.empty?(target) && !@force
    return fail_out("target #{target.inspect} exists and is not empty (use --force)")
  end

  created = @lib ? scaffold_lib(target) : scaffold(target)

  if @json
    puts JSON.pretty_generate(
      "format" => "toy/new-v1",
      "kind" => @lib ? "lib" : "app",
      "path" => target,
      "created" => created
    )
  elsif @lib
    puts "Created toy library-composition project at #{target}"
    created.each { |rel| puts "  #{rel}" }
    puts ""
    puts "Next: cd #{@path}"
    puts "      bundle lock && spinel-compat vendor   # fetch + build toy into vendor/"
    puts "      ./build.sh            # cpu (also: ./build.sh cpu cuda)"
    puts "      ./experiment_cpu"
  else
    puts "Created toy project at #{target}"
    created.each { |rel| puts "  #{rel}" }
    puts ""
    puts "Next: cd #{@path} && toy train from-scratch --steps 5"
    puts "      (algos/recipes/hello.rb is a runnable from-scratch starter:"
    puts "       spinel algos/recipes/hello.rb -o hello && ./hello)"
  end
  EXIT_OK
rescue SystemCallError => e
  fail_out("could not create project: #{e.message}")
end
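When `@json` is set, `#run` prints a single JSON object with the keys `format`, `kind`, `path`, and `created`. A caller could consume it along the lines of the sketch below. `summarize_new_payload` is a hypothetical helper, the sample payload is made up for illustration, and the command-line flag that enables JSON output is handled by `parse_args`, which is not shown on this page.

```ruby
require "json"

# Hypothetical consumer of the "toy/new-v1" payload printed by #run.
def summarize_new_payload(text)
  data = JSON.parse(text)
  unless data["format"] == "toy/new-v1"
    raise ArgumentError, "unexpected format: #{data["format"].inspect}"
  end

  "#{data["kind"]} project at #{data["path"]} (#{data["created"].length} entries)"
end

# Illustrative payload only; real "created" entries come from the scaffold.
sample = JSON.generate(
  "format"  => "toy/new-v1",
  "kind"    => "app",
  "path"    => "/tmp/demo",
  "created" => ["toy.yml", "bin/toy"]
)
puts summarize_new_payload(sample)
# prints app project at /tmp/demo (2 entries)
```

Checking the `format` string first lets a script fail early if a later version of the CLI changes the payload shape.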