Class: Toy::LLM::Recipes::LoRACuda

Inherits:
Object
Defined in:
lib/toy/llm/recipes/lora_cuda.rb

Overview

The LoRACuda fine-tune recipe. realize! builds the training graph (frozen base + LoRACuda-Q-adapter forward, cross-entropy (CE) loss, backward, AdamW) on a Toy::LLM::Engine::LlamaSeqEngineCuda: base weights are mmap’d from the GGUF, and only the rank-r adapters + Adam moments are trainable. step! then drives one training step. The caller (fixture) owns the loaded GGUF handle, the experiment config, and the per-step input Mats.
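The lifecycle described above (realize! once, then step! per iteration, with the caller choosing which step counts as first) can be sketched as follows. FakeRecipe is a hypothetical stand-in that only mimics the documented method arities; the symbols passed in are placeholders for the caller-owned GGUF handle, config, options, and Mats, and the dummy loss value is an assumption made for illustration.

```ruby
# Hypothetical stand-in: mimics only the call shape of the recipe
# (realize! once, then step! per iteration). It is not the real class.
class FakeRecipe
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Same arity as the documented realize!; records the call and returns nil.
  def realize!(_gguf_handle, _cfg, rank, _opts)
    @calls << [:realize!, rank]
    nil
  end

  # Same arity as the documented step!; returns a dummy loss Float.
  def step!(_seq_ids, _positions, _m_labels, _m_hp, is_first)
    @calls << [:step!, is_first]
    0.0
  end
end

recipe = FakeRecipe.new
# Placeholders stand in for the caller-owned handle, config, and options.
recipe.realize!(:gguf_handle, :cfg, 8, :opts)

losses = Array.new(3) do |step|
  # The caller decides which step is "first"; here it is step 0.
  recipe.step!([], [], :labels_mat, :hp_mat, step.zero?)
end
```

In this sketch the recipe never infers the first step on its own; the boolean comes entirely from the driving loop.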

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize ⇒ LoRACuda

Returns a new instance of LoRACuda.



# File 'lib/toy/llm/recipes/lora_cuda.rb', line 54

def initialize
  @lora_cache      = Toy::LLM::Engine::LlamaSeqEngineCuda.new
  @lora_t_loss     = nil
  @lora_t_labels   = nil
  @lora_t_hp       = nil
  @lora_step_index = 0
end

Instance Attribute Details

#lora_cache ⇒ Object

Returns the value of attribute lora_cache.



# File 'lib/toy/llm/recipes/lora_cuda.rb', line 52

def lora_cache
  @lora_cache
end

#lora_step_index ⇒ Object

Returns the value of attribute lora_step_index.



# File 'lib/toy/llm/recipes/lora_cuda.rb', line 52

def lora_step_index
  @lora_step_index
end

#lora_t_hp ⇒ Object

Returns the value of attribute lora_t_hp.



# File 'lib/toy/llm/recipes/lora_cuda.rb', line 52

def lora_t_hp
  @lora_t_hp
end

#lora_t_labels ⇒ Object

Returns the value of attribute lora_t_labels.



# File 'lib/toy/llm/recipes/lora_cuda.rb', line 52

def lora_t_labels
  @lora_t_labels
end

#lora_t_loss ⇒ Object

Returns the value of attribute lora_t_loss.



# File 'lib/toy/llm/recipes/lora_cuda.rb', line 52

def lora_t_loss
  @lora_t_loss
end

Instance Method Details

#realize!(gguf_handle, cfg, rank, opts) ⇒ Object

Realize the LoRACuda graph. Delegates VERBATIM to the cache in the reference’s order (03_finetune_lora.rb:67-76): enable_lora_q!(rank) + enable_lora_q_adamw! (set the two flags BEFORE realize), then realize_for_mmap (mmap the frozen base in place), then the seeded upload_lora_q_init!(seed, init_scale) (deterministic adapter init), then build_training_step (forward + CE + backward + opt_step_adamw on the adapters, baked into the ggml graph). Stashes the returned t_loss, t_labels, t_hp triple.

`opts` is a Toy::LLM::RecipeOptions (toy#64 item 1); the lora mmap path consumes its t_seq / untied / qkv_bias / seed / init_scale (there is NO t_batch / weight_dtype knob on realize_for_mmap). `rank` is lora-specific, so it stays a leading positional. The options are unpacked in the engine’s exact positional order, so the realize is byte-identical. Returns nil.



# File 'lib/toy/llm/recipes/lora_cuda.rb', line 75

def realize!(gguf_handle, cfg, rank, opts)
  @lora_cache.enable_lora_q!(rank)
  @lora_cache.enable_lora_q_adamw!
  @lora_cache.realize_for_mmap(gguf_handle, cfg, opts.t_seq,
                               opts.untied, opts.qkv_bias)
  @lora_cache.upload_lora_q_init!(opts.seed, opts.init_scale)
  result         = @lora_cache.build_training_step
  @lora_t_loss   = result[0]
  @lora_t_labels = result[1]
  @lora_t_hp     = result[2]
  nil
end
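To make the delegation order concrete, here is a minimal sketch that replays the body of realize! against a recording stub instead of a GPU engine. FakeOptions and RecordingEngine are hypothetical stand-ins: only the five reader names and the five engine method names come from the listing above, and the option values and returned symbols are placeholders chosen for illustration.

```ruby
# Hypothetical stand-in for Toy::LLM::RecipeOptions. Only the five reader
# names come from the documentation; the Struct itself is an assumption.
FakeOptions = Struct.new(:t_seq, :untied, :qkv_bias, :seed, :init_scale,
                         keyword_init: true)

# Hypothetical engine that records each call instead of touching a GPU.
class RecordingEngine
  attr_reader :log

  def initialize
    @log = []
  end

  def enable_lora_q!(rank)
    @log << [:enable_lora_q!, rank]
  end

  def enable_lora_q_adamw!
    @log << [:enable_lora_q_adamw!]
  end

  def realize_for_mmap(_gguf_handle, _cfg, t_seq, untied, qkv_bias)
    @log << [:realize_for_mmap, t_seq, untied, qkv_bias]
  end

  def upload_lora_q_init!(seed, init_scale)
    @log << [:upload_lora_q_init!, seed, init_scale]
  end

  def build_training_step
    @log << [:build_training_step]
    %i[t_loss t_labels t_hp] # placeholder tensor handles
  end
end

# Mirrors the body of realize! against the recording engine.
def realize_in_order(engine, gguf_handle, cfg, rank, opts)
  engine.enable_lora_q!(rank)
  engine.enable_lora_q_adamw!
  engine.realize_for_mmap(gguf_handle, cfg, opts.t_seq, opts.untied, opts.qkv_bias)
  engine.upload_lora_q_init!(opts.seed, opts.init_scale)
  engine.build_training_step
end

opts   = FakeOptions.new(t_seq: 128, untied: false, qkv_bias: false,
                         seed: 42, init_scale: 0.01)
engine = RecordingEngine.new
t_loss, t_labels, t_hp = realize_in_order(engine, :gguf_handle, :cfg, 8, opts)
call_order = engine.log.map(&:first)
```

A stub like this gives a cheap way to check, without CUDA, that both flags are set before realize_for_mmap and that the adapter init precedes build_training_step.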

#step!(seq_ids, positions, m_labels, m_hp, is_first) ⇒ Object

ONE training step. Op order is VERBATIM from 03_finetune_lora.rb:179-191 (and LITERALLY IDENTICAL to FromScratch#step!): graph_reset on the first step, otherwise reset_grads_only; the four uploads in order (token_ids, positions, labels, hp); compute_backward; download_row_major(t_loss, 1, 1). is_first alone selects the reset; lora_step_index is NOT used for the reset decision, so the caller stays in full control of the step==first branch. Returns the loss Float. Per-step input Mats (including the bias-corrected hp row) are built by the caller.



# File 'lib/toy/llm/recipes/lora_cuda.rb', line 98

def step!(seq_ids, positions, m_labels, m_hp, is_first)
  s = @lora_cache.sess
  if is_first
    TinyNNCuda.tnn_graph_reset(s)
  else
    TinyNNCuda.tnn_graph_reset_grads_only(s)
  end
  TinyNNCuda.upload_int_array(s, @lora_cache.t_seq_token_ids, seq_ids)
  TinyNNCuda.upload_int_array(s, @lora_cache.t_seq_positions, positions)
  TinyNNCuda.upload_row_major(s, @lora_t_labels, m_labels)
  TinyNNCuda.upload_row_major(s, @lora_t_hp,     m_hp)
  TinyNNCuda.tnn_compute_backward(s)
  loss_mat = TinyNNCuda.download_row_major(s, @lora_t_loss, 1, 1)
  loss_mat.flat[0]
end