Module: Toy::Labels
- Defined in: lib/toy/llm/labels.rb
Class Method Summary
- .fixed_target(vocab, context, target_id) ⇒ Object
  FIXED-TARGET one-hot (toy#73 item 3): every one of the `context` positions targets the SAME id. This is the lora-smoke objective (examples/03_lora.rb: push every position of a fixed prompt toward one token).
- .next_token(seq_ids, vocab, context, batch) ⇒ Object
  UNGUARDED shift-by-one one-hot.
- .next_token_guarded(seq_ids, vocab, context, batch) ⇒ Object
  IN-VOCAB-GUARDED shift-by-one one-hot.
- .one_hot_class(num_classes, label) ⇒ Object
  SINGLE-ROW CLASS one-hot (toy#73 item 3): Mat(1, num_classes) with 1.0 at `label`. This is the ViT classification objective (examples/07_vit_tiny.rb’s hand loop, byte-identical).
Class Method Details
.fixed_target(vocab, context, target_id) ⇒ Object
FIXED-TARGET one-hot (toy#73 item 3): every one of the `context` positions targets the SAME id. This is the lora-smoke objective (examples/03_lora.rb: push every position of a fixed prompt toward one token). Reproduces the example’s hand fill VERBATIM (zero the Mat, then scatter 1.0 at row*vocab + target_id). FAILS LOUD on an out-of-vocab target.
# File 'lib/toy/llm/labels.rb', line 103
def self.fixed_target(vocab, context, target_id)
  if target_id < 0 || target_id >= vocab
    raise "Toy::Labels.fixed_target: target_id " + target_id.to_s +
          " out of vocab 0..." + vocab.to_s
  end
  m = Mat.new(context, vocab)
  j = 0
  while j < context * vocab
    m.flat[j] = 0.0
    j = j + 1
  end
  k = 0
  while k < context
    m.flat[k * vocab + target_id] = 1.0
    k = k + 1
  end
  m
end
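The row-major scatter described for fixed_target can be illustrated with a minimal sketch. It assumes a plain Ruby Array in place of `Mat#flat` (the Mat class is not shown in this section), and `fixed_target_flat` is a hypothetical helper name, not part of Toy::Labels.

```ruby
# Sketch only: a plain Array stands in for Mat#flat (assumption).
# fixed_target_flat is a hypothetical name used for illustration.
def fixed_target_flat(vocab, context, target_id)
  if target_id < 0 || target_id >= vocab
    raise "target_id " + target_id.to_s + " out of vocab 0..." + vocab.to_s
  end
  flat = Array.new(context * vocab, 0.0) # zero-filled, context rows of vocab columns
  k = 0
  while k < context
    flat[k * vocab + target_id] = 1.0    # row k, column target_id
    k = k + 1
  end
  flat
end

# vocab = 4, context = 3, target_id = 2: every row has its 1.0 in column 2.
rows = fixed_target_flat(4, 3, 2).each_slice(4).to_a
# rows => [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
```

Under these assumptions, an out-of-vocab call such as `fixed_target_flat(4, 3, 4)` raises instead of writing into the next row.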
.next_token(seq_ids, vocab, context, batch) ⇒ Object
UNGUARDED shift-by-one one-hot. Reproduces the from-scratch hand loop (train.rb:297-304, smoke_recipe_from_scratch.rb:71-78) VERBATIM. The target at position k is seq_ids[k + 1] (the next token), or seq_ids[k] itself at the last position.
`batch` is INCLUDED per the requested signature but is NOT multiplied into the row count: all current callers are single-sequence context×vocab (batch implicitly 1). Multiplying it in would change the Mat shape and break the byte gate. It is a forward-looking param for a future batched caller. Because a batch != 1 would otherwise be SILENTLY IGNORED (training on a wrongly-shaped one-hot), it now FAILS LOUD (toy#64 item 5).
# File 'lib/toy/llm/labels.rb', line 41
def self.next_token(seq_ids, vocab, context, batch)
  if batch != 1
    raise "Toy::Labels.next_token: batch " + batch.to_s +
          " unsupported — batched training deferred (the one-hot " +
          "is context x vocab; batch is not multiplied into the " +
          "row count)"
  end
  m = Mat.new(context, vocab)
  j = 0
  while j < context * vocab
    m.flat[j] = 0.0
    j = j + 1
  end
  k = 0
  while k < context
    target = (k + 1 < context) ? seq_ids[k + 1] : seq_ids[k]
    m.flat[k * vocab + target] = 1.0
    k = k + 1
  end
  m
end
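The shift-by-one target rule can be checked in isolation with a small sketch. `shift_by_one_targets` is a hypothetical helper that returns only the target ids, and the example values (`[5, 2, 7]`, vocab 8) are made up for illustration.

```ruby
# Sketch only: computes the per-position target ids used by the scatter,
# without building a Mat. shift_by_one_targets is a hypothetical name.
def shift_by_one_targets(seq_ids, context)
  targets = []
  k = 0
  while k < context
    # Next token, or the token itself at the last position.
    targets << ((k + 1 < context) ? seq_ids[k + 1] : seq_ids[k])
    k = k + 1
  end
  targets
end

vocab = 8
targets = shift_by_one_targets([5, 2, 7], 3)
# targets => [2, 7, 7]

# Flat (row-major) indices that would receive 1.0: k * vocab + target.
flat_indices = targets.each_with_index.map { |t, k| k * vocab + t }
# flat_indices => [2, 15, 23]
```

Because this variant is unguarded, the ids in `seq_ids` are assumed to already lie in 0...vocab.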
.next_token_guarded(seq_ids, vocab, context, batch) ⇒ Object
IN-VOCAB-GUARDED shift-by-one one-hot. Reproduces the warm-start hand loop (train.rb:202-214, smoke_recipe_warm_start.rb:111-124) VERBATIM. Identical to next_token EXCEPT the scatter is guarded by `target >= 0 && target < vocab` (with `&&` exactly): warm-start streams an arbitrary corpus, so it guards against out-of-vocab ids. A row whose target is out of vocab is left all zero. Rebuilt every step (seq_ids streams from the corpus).
`batch` is INCLUDED per the requested signature and is NOT multiplied into the row count (same rationale as next_token). It FAILS LOUD on batch != 1 (toy#64 item 5, same trap).
# File 'lib/toy/llm/labels.rb', line 73
def self.next_token_guarded(seq_ids, vocab, context, batch)
  if batch != 1
    raise "Toy::Labels.next_token_guarded: batch " + batch.to_s +
          " unsupported — batched training deferred (the one-hot " +
          "is context x vocab; batch is not multiplied into the " +
          "row count)"
  end
  m = Mat.new(context, vocab)
  j = 0
  while j < context * vocab
    m.flat[j] = 0.0
    j = j + 1
  end
  k = 0
  while k < context
    target = (k + 1 < context) ? seq_ids[k + 1] : seq_ids[k]
    if target >= 0 && target < vocab
      m.flat[k * vocab + target] = 1.0
    end
    k = k + 1
  end
  m
end
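The effect of the in-vocab guard is easiest to see on a sequence that contains an out-of-vocab id. The sketch below assumes a plain Array in place of `Mat#flat`; `guarded_rows` is a hypothetical helper and the input values are invented for illustration.

```ruby
# Sketch only: a plain Array stands in for Mat#flat (assumption).
# guarded_rows is a hypothetical name; it returns the matrix as nested rows.
def guarded_rows(seq_ids, vocab, context)
  flat = Array.new(context * vocab, 0.0)
  k = 0
  while k < context
    target = (k + 1 < context) ? seq_ids[k + 1] : seq_ids[k]
    if target >= 0 && target < vocab
      flat[k * vocab + target] = 1.0 # scatter only for in-vocab targets
    end
    k = k + 1
  end
  flat.each_slice(vocab).to_a
end

rows = guarded_rows([1, 9, 2], 4, 3)
# rows[0] => [0.0, 0.0, 0.0, 0.0]   target 9 is out of vocab, row stays zero
# rows[1] => [0.0, 0.0, 1.0, 0.0]   target 2
# rows[2] => [0.0, 0.0, 1.0, 0.0]   last position targets itself (2)
```

Without the guard, the same input would compute flat index 0 * 4 + 9 = 9, which in this Array-based sketch falls inside row 2 rather than row 0.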
.one_hot_class(num_classes, label) ⇒ Object
SINGLE-ROW CLASS one-hot (toy#73 item 3): Mat(1, num_classes) with 1.0 at `label`. This is the ViT classification objective (examples/07_vit_tiny.rb’s hand loop, byte-identical). FAILS LOUD on an out-of-range label (ToyImageLoader.read_label returns -1 on a short read, so a torn labels.bin fails here, not as a silent all-zero label row).
# File 'lib/toy/llm/labels.rb', line 128
def self.one_hot_class(num_classes, label)
  if label < 0 || label >= num_classes
    raise "Toy::Labels.one_hot_class: label " + label.to_s +
          " out of range 0..." + num_classes.to_s
  end
  m = Mat.new(1, num_classes)
  j = 0
  while j < num_classes
    m.flat[j] = j == label ? 1.0 : 0.0
    j = j + 1
  end
  m
end
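The fail-loud behaviour on a short-read label of -1 can be sketched as follows. A plain Array stands in for the single Mat row (an assumption), and `one_hot_class_flat` is a hypothetical helper name.

```ruby
# Sketch only: a plain Array stands in for the 1 x num_classes Mat (assumption).
# one_hot_class_flat is a hypothetical name used for illustration.
def one_hot_class_flat(num_classes, label)
  if label < 0 || label >= num_classes
    raise "label " + label.to_s + " out of range 0..." + num_classes.to_s
  end
  flat = []
  j = 0
  while j < num_classes
    flat << (j == label ? 1.0 : 0.0)
    j = j + 1
  end
  flat
end

row = one_hot_class_flat(5, 3)
# row => [0.0, 0.0, 0.0, 1.0, 0.0]

# A label of -1 (what a short read is described as returning) raises
# instead of producing an all-zero row.
short_read_error =
  begin
    one_hot_class_flat(5, -1)
    nil
  rescue RuntimeError => e
    e.message
  end
# short_read_error => "label -1 out of range 0...5"
```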