Class: Toy::LLM::Recipes::WarmStartCuda

Inherits: Object

Defined in: lib/toy/llm/recipes/warm_start_cuda.rb

Overview

The warm-start training recipe. realize_scratch! builds the random-init forward+CE+backward+AdamW graph on a Toy::LLM::Engine::LlamaSeqEngineCuda (random-init realize self-enables full_finetune + train_embeddings, so no extra enable_* call is needed) and OPENS the warm window; realize_warm! (optional) uploads an already-read donor embedding into the realize’d embed table BEFORE the graph is baked; build! CLOSES the window by baking forward+CE+backward+opt_step_adamw into the ggml graph. step! then drives one training step. The caller (fixture) owns the experiment config, the donor/PCA GGUF read, the corpus stream, the LR schedule, and the per-step input Mats.
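
The call order above can be sketched with a stand-in class. WarmWindowSketch is hypothetical and models only the window rules (open on realize_scratch!, optional upload, close on build!); it touches no GPU state. It raises on out-of-order calls purely to make the rules visible. The real source listed on this page performs no such state check, so treat that enforcement as an assumption of the sketch.

```ruby
# Hypothetical stand-in: models ONLY the call-order rules, not the real recipe.
class WarmWindowSketch
  attr_reader :state, :embed

  def initialize
    @state = :new
    @embed = nil
  end

  # Opens the warm window: the embed table now exists (placeholder init).
  def realize_scratch!(vocab, width)
    @embed = Array.new(vocab * width) { 0.0 }
    @state = :window_open
    nil
  end

  # Optional: only legal while the window is open.
  def realize_warm!(donor_flat)
    raise "realize_warm! called outside the warm window" unless @state == :window_open
    raise "donor size mismatch" unless donor_flat.length == @embed.length
    @embed = donor_flat.dup
    nil
  end

  # Closes the window; later uploads are rejected by this sketch.
  def build!
    raise "build! before realize_scratch!" unless @state == :window_open
    @state = :built
    nil
  end

  # One training step; returns a placeholder loss.
  def step!
    raise "step! before build!" unless @state == :built
    0.0
  end
end

recipe = WarmWindowSketch.new
recipe.realize_scratch!(2, 3)
recipe.realize_warm!([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]) # skip for INIT=scratch flows
recipe.build!
loss = recipe.step!
```

Skipping the realize_warm! line gives the scratch flow: the table keeps its initial values and build! bakes the graph over them.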

Instance Attribute Summary

#ws_cache ⇒ Object

Returns the value of attribute ws_cache.
#ws_step_index ⇒ Object

Returns the value of attribute ws_step_index.
#ws_t_hp ⇒ Object

Returns the value of attribute ws_t_hp.
#ws_t_labels ⇒ Object

Returns the value of attribute ws_t_labels.
#ws_t_loss ⇒ Object

Returns the value of attribute ws_t_loss.

Class Method Summary

.donor_embed_width(donor_gguf_path) ⇒ Object

Read the donor’s embedding width (llama.embedding_length) from a GGUF path — the value the caller must put in cfg.donor_d_in BEFORE realize_scratch! (the projection lens is sized donor_d_in x d_model at realize time, so the recipe cannot learn it later).

Instance Method Summary

#build! ⇒ Object

CLOSE the warm window: bake forward + CE + backward + opt_step_adamw into the ggml graph (no Ruby Trainer/optimizer — same rationale as FromScratch).
#initialize ⇒ WarmStartCuda constructor

A new instance of WarmStartCuda.
#realize_scratch!(cfg, opts) ⇒ Object

Realize the random-init graph and OPEN the warm window.
#realize_warm!(donor_gguf_path, cfg) ⇒ Object

OPTIONAL: warm the realize’d embed table from a donor GGUF.
#step!(seq_ids, positions, m_labels, m_hp, is_first) ⇒ Object

ONE training step.
#upload_donor!(donor_buf_flat, n_floats) ⇒ Object

The raw upload MECHANISM realize_warm! rides (and the seam for already-read buffers — e.g. the legacy PCA-lens flow): one tnn_upload_from_float_array into the realize’d token_embed table (mirrors 09 L180).

Constructor Details

#initialize ⇒ `WarmStartCuda`

Returns a new instance of WarmStartCuda.

# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 75

def initialize
  @ws_cache      = Toy::LLM::Engine::LlamaSeqEngineCuda.new
  @ws_t_loss     = nil
  @ws_t_labels   = nil
  @ws_t_hp       = nil
  @ws_step_index = 0
end

Instance Attribute Details

#ws_cache ⇒ `Object`

Returns the value of attribute ws_cache.



# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 73

def ws_cache
  @ws_cache
end

#ws_step_index ⇒ `Object`

Returns the value of attribute ws_step_index.



# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 73

def ws_step_index
  @ws_step_index
end

#ws_t_hp ⇒ `Object`

Returns the value of attribute ws_t_hp.



# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 73

def ws_t_hp
  @ws_t_hp
end

#ws_t_labels ⇒ `Object`

Returns the value of attribute ws_t_labels.



# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 73

def ws_t_labels
  @ws_t_labels
end

#ws_t_loss ⇒ `Object`

Returns the value of attribute ws_t_loss.



# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 73

def ws_t_loss
  @ws_t_loss
end

Class Method Details

.donor_embed_width(donor_gguf_path) ⇒ `Object`

Read the donor’s embedding width (llama.embedding_length) from a GGUF path — the value the caller must put in cfg.donor_d_in BEFORE realize_scratch! (the projection lens is sized donor_d_in x d_model at realize time, so the recipe cannot learn it later). FAILS LOUD on a missing/corrupt donor or a non-llama-family GGUF. (toy#73 item 4 — the read half of the donor plumbing realize_warm! owns.)

# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 107

def self.donor_embed_width(donor_gguf_path)
  if !File.exist?(donor_gguf_path)
    raise "WarmStartCuda.donor_embed_width: donor GGUF not found: " +
          donor_gguf_path
  end
  ggh = TinyNNCuda.tnn_gguf_load(donor_gguf_path)
  if ggh == nil || ggh == TinyNNCuda.tnn_null_ptr
    raise "WarmStartCuda.donor_embed_width: failed to open " +
          donor_gguf_path + " (not a GGUF?)"
  end
  donor_d = TinyNNCuda.tnn_gguf_get_u32(ggh, "llama.embedding_length")
  TinyNNCuda.tnn_gguf_free(ggh)
  if donor_d <= 0
    raise "WarmStartCuda.donor_embed_width: donor has no " +
          "llama.embedding_length key — not llama-family? (" +
          donor_gguf_path + ")"
  end
  donor_d
end

Instance Method Details

#build! ⇒ `Object`

CLOSE the warm window: bake forward + CE + backward + opt_step_adamw into the ggml graph (no Ruby Trainer/optimizer; same rationale as FromScratch). Delegates VERBATIM to build_training_step and stashes the returned [t_loss, t_labels, t_hp] triple. Returns nil.

# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 203

def build!
  result       = @ws_cache.build_training_step
  @ws_t_loss   = result[0]
  @ws_t_labels = result[1]
  @ws_t_hp     = result[2]
  nil
end

#realize_scratch!(cfg, opts) ⇒ `Object`

Realize the random-init graph and OPEN the warm window. Delegates VERBATIM to the cache: realize_for_random_init (which self-enables full_finetune + train_embeddings). `opts` is a Toy::LLM::RecipeOptions (toy#64 item 1) carrying the former 7 trailing positional args, unpacked here in the engine's exact positional order (identical to FromScratch#realize! / 09 L138), so the realize is byte-identical. Does NOT bake the graph; that is build!'s job, leaving the window open for an optional realize_warm! upload in between. Returns nil.

# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 92

def realize_scratch!(cfg, opts)
  @ws_cache.realize_for_random_init(cfg, opts.t_seq, opts.t_batch,
                                    opts.weight_dtype, opts.untied,
                                    opts.qkv_bias, opts.seed,
                                    opts.init_scale)
  nil
end
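
The unpacking above can be illustrated with a plain Struct. RecipeOptionsSketch and positional_args are hypothetical names; only the seven field names and their order come from the listing. The values, including the use of a symbol for weight_dtype, are illustrative assumptions rather than the real option types.

```ruby
# Hypothetical stand-in for Toy::LLM::RecipeOptions; field names taken from the listing.
RecipeOptionsSketch = Struct.new(:t_seq, :t_batch, :weight_dtype, :untied,
                                 :qkv_bias, :seed, :init_scale)

# Unpack in the same positional order realize_scratch! passes to the engine.
def positional_args(opts)
  [opts.t_seq, opts.t_batch, opts.weight_dtype, opts.untied,
   opts.qkv_bias, opts.seed, opts.init_scale]
end

# Illustrative values only (assumptions, not recommended settings).
opts = RecipeOptionsSketch.new(128, 4, :f32, false, false, 42, 0.02)
args = positional_args(opts)
```

Keeping the order in one place is what lets the options object replace the seven trailing arguments without changing what the engine receives.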

#realize_warm!(donor_gguf_path, cfg) ⇒ `Object`

OPTIONAL: warm the realize’d embed table from a donor GGUF. Owns the WHOLE donor read (toy#73 item 4 — was ~25 lines of bare GGUF plumbing in every consumer): open, re-read llama.embedding_length and DIM-CHECK it against cfg.donor_d_in (the width the lens was realized at — a mismatch would silently upload garbage through the wrong stride), find token_embd.weight, read the first cfg.vocab rows, upload through upload_donor!, free. Every failure raises NAMED + LOUD (which tensor, expected vs got, which path). Must be called AFTER realize_scratch! (the tensor exists) and BEFORE build! (else we train through the random init). INIT=scratch flows skip this method entirely. Returns nil.

# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 138

def realize_warm!(donor_gguf_path, cfg)
  if !File.exist?(donor_gguf_path)
    raise "WarmStartCuda#realize_warm!: donor GGUF not found: " +
          donor_gguf_path
  end
  ggh = TinyNNCuda.tnn_gguf_load(donor_gguf_path)
  if ggh == nil || ggh == TinyNNCuda.tnn_null_ptr
    raise "WarmStartCuda#realize_warm!: failed to open " +
          donor_gguf_path + " (not a GGUF?)"
  end
  donor_d = TinyNNCuda.tnn_gguf_get_u32(ggh, "llama.embedding_length")
  if donor_d <= 0
    TinyNNCuda.tnn_gguf_free(ggh)
    raise "WarmStartCuda#realize_warm!: donor has no " +
          "llama.embedding_length key — not llama-family? (" +
          donor_gguf_path + ")"
  end
  if donor_d != cfg.donor_d_in
    TinyNNCuda.tnn_gguf_free(ggh)
    raise "WarmStartCuda#realize_warm!: token_embd.weight width " +
          "mismatch: expected donor_d_in=" + cfg.donor_d_in.to_s +
          " (the width realize_scratch! sized the lens at) but " +
          "donor llama.embedding_length=" + donor_d.to_s + " (" +
          donor_gguf_path + ")"
  end
  te_idx = TinyNNCuda.tnn_gguf_find_index(ggh, "token_embd.weight")
  if te_idx < 0
    TinyNNCuda.tnn_gguf_free(ggh)
    raise "WarmStartCuda#realize_warm!: donor has no " +
          "token_embd.weight tensor (" + donor_gguf_path + ")"
  end
  n_floats = cfg.vocab * donor_d
  te_buf = Mat.new(1, n_floats)
  rc = TinyNNCuda.tnn_gguf_read_f32_to_doubles(ggh, te_idx,
                                           te_buf.flat, n_floats)
  if rc != 0
    TinyNNCuda.tnn_gguf_free(ggh)
    raise "WarmStartCuda#realize_warm!: token_embd.weight read failed " +
          "rc=" + rc.to_s + " — wanted " + n_floats.to_s +
          " floats (vocab " + cfg.vocab.to_s + " x donor_d " +
          donor_d.to_s + ") from " + donor_gguf_path
  end
  upload_donor!(te_buf.flat, n_floats)
  TinyNNCuda.tnn_gguf_free(ggh)
  nil
end
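
The reason for the dim-check can be shown with plain arrays. This is a minimal sketch assuming a row-major flat buffer, as the vocab x donor_d sizing above suggests; row_at and the toy sizes are hypothetical and use no TinyNNCuda calls.

```ruby
# Read row `r` of a row-major flat buffer, given the assumed row width.
def row_at(flat, r, width)
  flat[r * width, width]
end

donor_width = 4 # plays the role of llama.embedding_length
vocab       = 3
n_floats    = vocab * donor_width

# Row r holds the value r in every column, so misalignment is easy to see.
flat = Array.new(n_floats) { |i| i / donor_width }

right = row_at(flat, 1, donor_width) # stride matches the data
wrong = row_at(flat, 1, 3)           # stride taken from a mismatched donor_d_in
```

With the matching width, row 1 contains only values from row 1. With the wrong width the slice starts inside row 0 and straddles the boundary, and nothing in the buffer itself signals the error, which is why the check compares widths before any upload.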

#step!(seq_ids, positions, m_labels, m_hp, is_first) ⇒ `Object`

ONE training step. Op order is COPIED VERBATIM from FromScratch#step! (from_scratch.rb:83-97) and LITERALLY IDENTICAL to LoRA#step!: graph_reset on the first step else reset_grads_only; the four uploads in order (token_ids/positions/labels/hp); compute_backward; download_row_major(t_loss, 1, 1). is_first selects the reset; the @ws_step_index accessor is carried for callers that want it but is NOT used for the reset decision, so the caller stays in full control of the step==0 branch. The per-step LR enters ONLY via the caller mutating m_hp.flat before this call — there is deliberately NO lr param here (matches the siblings; keeps schedule logic in the fixture). Returns the loss Float. Per-step input Mats are built by the caller.

# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 223

def step!(seq_ids, positions, m_labels, m_hp, is_first)
  s = @ws_cache.sess
  if is_first
    TinyNNCuda.tnn_graph_reset(s)
  else
    TinyNNCuda.tnn_graph_reset_grads_only(s)
  end
  TinyNNCuda.upload_int_array(s, @ws_cache.t_seq_token_ids, seq_ids)
  TinyNNCuda.upload_int_array(s, @ws_cache.t_seq_positions, positions)
  TinyNNCuda.upload_row_major(s, @ws_t_labels, m_labels)
  TinyNNCuda.upload_row_major(s, @ws_t_hp,     m_hp)
  TinyNNCuda.tnn_compute_backward(s)
  loss_mat = TinyNNCuda.download_row_major(s, @ws_t_loss, 1, 1)
  loss_mat.flat[0]
end
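
Because step! takes no lr parameter, the schedule lives in the caller's loop. The sketch below shows one way that loop might look. MatSketch mimics only the .flat accessor used on this page, lr_at is a hypothetical warmup schedule, and LR_SLOT = 0 is an assumption: the page does not say which slot of m_hp holds the learning rate.

```ruby
# Hypothetical stand-in for the Mat type: only `.flat` is modelled.
MatSketch = Struct.new(:flat)

# ASSUMPTION: the learning rate sits in slot 0 of m_hp; the page does not confirm this.
LR_SLOT = 0

# Linear warmup followed by a constant rate.
def lr_at(step, base_lr, warmup_steps)
  return base_lr if step >= warmup_steps
  base_lr * (step + 1) / warmup_steps
end

m_hp = MatSketch.new([0.0])
seen = []
4.times do |step|
  # Mutate the hyperparameter buffer BEFORE the step, as the description requires.
  m_hp.flat[LR_SLOT] = lr_at(step, 1.0e-3, 2)
  is_first = (step == 0) # the caller, not the recipe, owns this decision
  # recipe.step!(seq_ids, positions, m_labels, m_hp, is_first) would be called here.
  seen << [m_hp.flat[LR_SLOT], is_first]
end
```

Only the first iteration passes is_first = true, which corresponds to the full graph reset; later iterations take the grads-only reset.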

#upload_donor!(donor_buf_flat, n_floats) ⇒ `Object`

The raw upload MECHANISM realize_warm! rides (and the seam for already-read buffers — e.g. the legacy PCA-lens flow): one tnn_upload_from_float_array into the realize’d token_embed table (mirrors 09 L180). Same window rules as realize_warm!. The PCA lens W_proj upload (09 L188-229) stays caller-side through

# File 'lib/toy/llm/recipes/warm_start_cuda.rb', line 191

def upload_donor!(donor_buf_flat, n_floats)
  TinyNNCuda.tnn_upload_from_float_array(@ws_cache.sess,
                                     @ws_cache.t_seq_token_embed,
                                     donor_buf_flat, n_floats)
  nil
end
