Class: Toy::RunBundle

Inherits:
Object
Defined in:
lib/toy/io/run_bundle.rb

Constructor Details

#initialize(runs_root, run_id) ⇒ RunBundle

Creates runs_root/ and runs_root/run_id/, then truncate-opens the events.jsonl sink. On open failure (rc != 0) it prints a loud warning with the rc and the path, then continues with events disabled (active == false); compute must not die because a bundle dir is unwritable.



# File 'lib/toy/io/run_bundle.rb', line 86

def initialize(runs_root, run_id)
  @rb_root       = runs_root
  @rb_run_id     = run_id
  @rb_dir        = runs_root + "/" + run_id
  @rb_active     = false
  @rb_git_sha    = ""   # empty = git{} omitted from run_start
  @rb_git_branch = ""
  TinyNN.tnn_filesystem_mkdir(runs_root)
  TinyNN.tnn_filesystem_mkdir(@rb_dir)
  rc = TinyNN.tnn_events_open_trunc(@rb_dir + "/events.jsonl")
  if rc == 0
    @rb_active = true
  else
    puts "warn: Toy::RunBundle could not open " + @rb_dir +
         "/events.jsonl (rc=" + rc.to_s + ") — run bundle disabled"
  end
end
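
The warn-and-continue behavior described above can be tried outside the Toy codebase with a small stand-in. This is only a sketch: DemoSink, its emit/close methods and the "blocker" file are hypothetical names, and it uses plain File I/O where the real constructor calls TinyNN.

```ruby
require "fileutils"
require "tmpdir"

# DemoSink is a hypothetical stand-in, not Toy::RunBundle: it uses plain
# File I/O where the real class calls into TinyNN.
class DemoSink
  attr_reader :active, :dir

  def initialize(runs_root, run_id)
    @dir    = File.join(runs_root, run_id)
    @active = false
    @io     = nil
    begin
      FileUtils.mkdir_p(@dir)
      # Mode "w" truncates an existing events.jsonl, like the truncate-open.
      @io     = File.open(File.join(@dir, "events.jsonl"), "w")
      @active = true
    rescue SystemCallError => e
      warn "warn: DemoSink could not open #{@dir}/events.jsonl " \
           "(#{e.class}), sink disabled"
    end
  end

  # Emitting is a no-op once the sink is disabled, so callers never branch.
  def emit(line)
    @io.puts(line) if @active
    nil
  end

  def close
    @io.close if @active
    @active = false
    nil
  end
end

Dir.mktmpdir do |tmp|
  ok = DemoSink.new(File.join(tmp, "runs"), "demo")
  ok.emit("{\"kind\":\"step\"}")
  ok.close

  # A regular file where the runs root should be makes the mkdir fail.
  blocker = File.join(tmp, "blocker")
  File.write(blocker, "")
  bad = DemoSink.new(File.join(blocker, "runs"), "demo")
  bad.emit("{\"kind\":\"step\"}") # silently dropped
  puts bad.active                  # false
end
```

The point of the design is that the training loop can call the emit methods unconditionally; an unwritable bundle costs one warning line and nothing else.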

Instance Attribute Details

#rb_active ⇒ Object

Returns the value of attribute rb_active.



# File 'lib/toy/io/run_bundle.rb', line 79

def rb_active
  @rb_active
end

#rb_dir ⇒ Object

Returns the value of attribute rb_dir.



# File 'lib/toy/io/run_bundle.rb', line 79

def rb_dir
  @rb_dir
end

#rb_git_branch ⇒ Object

Returns the value of attribute rb_git_branch.



# File 'lib/toy/io/run_bundle.rb', line 79

def rb_git_branch
  @rb_git_branch
end

#rb_git_sha ⇒ Object

Returns the value of attribute rb_git_sha.



# File 'lib/toy/io/run_bundle.rb', line 79

def rb_git_sha
  @rb_git_sha
end

#rb_root ⇒ Object

Returns the value of attribute rb_root.



# File 'lib/toy/io/run_bundle.rb', line 79

def rb_root
  @rb_root
end

#rb_run_id ⇒ Object

Returns the value of attribute rb_run_id.



# File 'lib/toy/io/run_bundle.rb', line 79

def rb_run_id
  @rb_run_id
end

Class Method Details

.json_escape(s) ⇒ Object

Minimal JSON string-body escaper (same table as SpinelKit::Json::Builder.escape, minus the \u00XX arm; run ids and arch names are ASCII-clean by construction).



# File 'lib/toy/io/run_bundle.rb', line 256

def self.json_escape(s)
  out = ""
  i = 0
  n = s.length
  while i < n
    c = s[i]
    if c == "\""
      out = out + "\\\""
    elsif c == "\\"
      out = out + "\\\\"
    elsif c == "\n"
      out = out + "\\n"
    elsif c == "\r"
      out = out + "\\r"
    elsif c == "\t"
      out = out + "\\t"
    else
      out = out + c
    end
    i = i + 1
  end
  out
end
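
As a quick check of the escape table, the sketch below copies it into a standalone method and round-trips one string through Ruby's stdlib JSON parser. The input string is made up for illustration. Because the \u00XX arm is absent, control characters other than \n, \r and \t would pass through unescaped, which is why the method relies on its inputs being ASCII-clean.

```ruby
require "json"

# Standalone copy of the escape table shown above, so it can be run
# outside the Toy codebase.
def json_escape(s)
  out = String.new
  s.each_char do |c|
    out << case c
           when "\"" then "\\\""
           when "\\" then "\\\\"
           when "\n" then "\\n"
           when "\r" then "\\r"
           when "\t" then "\\t"
           else c
           end
  end
  out
end

raw    = "say \"hi\"\\path\nnext" # hypothetical input
line   = "{\"run_id\":\"" + json_escape(raw) + "\"}"
parsed = JSON.parse(line)
puts parsed["run_id"] == raw      # true
```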

Instance Method Details

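#active ⇒ Object

Returns @rb_active: true while the events sink is open, false if the open failed or after run_end! has closed it.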



# File 'lib/toy/io/run_bundle.rb', line 104

def active
  @rb_active
end

#events_path ⇒ Object

Returns the path of the bundle's event log, rb_dir + "/events.jsonl".



# File 'lib/toy/io/run_bundle.rb', line 108

def events_path
  @rb_dir + "/events.jsonl"
end

#git!(sha, branch) ⇒ Object

Opt-in git provenance (see the file header of lib/toy/io/run_bundle.rb; it cannot be automatic). Call it before run_start! with the two strings from SpinelKit::Git (or any sha/branch pair); run_start! then includes git{sha,branch} after backend{}. An empty sha leaves git{} out of run_start. Returns nil.



# File 'lib/toy/io/run_bundle.rb', line 125

def git!(sha, branch)
  @rb_git_sha    = sha
  @rb_git_branch = branch
  nil
end

#run_end!(final_step, final_loss) ⇒ Object

Emits the toy/v1 run_end (exactly one, last) and closes the sink, with reason:"completed" and exit_code:0. RunBundle is for runs that ran to completion; a crashed run just leaves the bundle without a run_end (the documented torn-bundle case consumers handle). Does nothing when the bundle is inactive. Returns nil.



# File 'lib/toy/io/run_bundle.rb', line 238

def run_end!(final_step, final_loss)
  if @rb_active
    TinyNN.tnn_events_emit("{\"kind\":\"run_end\"" +
      ",\"t\":" + TinyNN.tnn_events_now_seconds.to_s +
      ",\"ended_at\":\"" + TinyNN.tnn_events_iso8601_now + "\"" +
      ",\"reason\":\"completed\"" +
      ",\"final_step\":" + final_step.to_s +
      ",\"final_loss\":" + final_loss.to_s +
      ",\"exit_code\":0}")
    TinyNN.tnn_events_close
    @rb_active = false
  end
  nil
end
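
On the consumer side, the torn-bundle case can be detected by checking whether the last event is a run_end. The following is a minimal sketch, not the project's own consumer: bundle_status is a hypothetical helper, and the sample events are made up and shortened.

```ruby
require "json"

# Classify a bundle from the text of its events.jsonl.
# Returns :complete when the last event is a run_end, :torn otherwise.
def bundle_status(jsonl_text)
  lines = jsonl_text.each_line.map(&:strip).reject(&:empty?)
  return :torn if lines.empty?

  last = JSON.parse(lines.last)
  last["kind"] == "run_end" ? :complete : :torn
rescue JSON::ParserError
  :torn # a half-written final line also counts as torn
end

# Hypothetical sample data, shortened for readability.
complete = <<~JSONL
  {"kind":"run_start","schema":"toy/v1","run_id":"demo"}
  {"kind":"step","phase":"train","t":1.5,"step":1,"loss":2.31}
  {"kind":"run_end","reason":"completed","final_step":1,"final_loss":2.31,"exit_code":0}
JSONL
crashed = complete.lines.first(2).join

puts bundle_status(complete) # complete
puts bundle_status(crashed)  # torn
```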

#run_start!(arch, vocab, d_model, n_layers, context, steps, lr, seed) ⇒ Object

Emits the toy/v1 run_start (exactly one per bundle, first): schema + t + started_at + run_id + phase:"train" + host{} + backend{} + [git{} if injected] + model{arch,vocab,d_model,n_layers} + config{context,steps,lr,seed}. arch is e.g. "llama" / "gpt2" / "vit"; for llama/vit configs the per-arch variants run_start_llama! and run_start_vit! read these fields off the cfg for you. Returns nil.



# File 'lib/toy/io/run_bundle.rb', line 206

def run_start!(arch, vocab, d_model, n_layers, context, steps, lr, seed)
  if @rb_active
    TinyNN.tnn_events_emit(run_start_prefix +
      ",\"model\":{\"arch\":\"" + RunBundle.json_escape(arch) + "\"" +
      ",\"vocab\":" + vocab.to_s +
      ",\"d_model\":" + d_model.to_s +
      ",\"n_layers\":" + n_layers.to_s + "}" +
      ",\"config\":{\"context\":" + context.to_s +
      ",\"steps\":" + steps.to_s +
      ",\"lr\":" + lr.to_s +
      ",\"seed\":" + seed.to_s + "}}")
  end
  nil
end
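
To see the shape of the emitted line, the sketch below rebuilds the model{} + config{} tail with the same string concatenation and parses the result. It makes two assumptions: the prefix is a shortened, hand-written stand-in for run_start_prefix (which needs the TinyNN runtime), and arch is passed as a plain string without the json_escape call. All values are made up.

```ruby
require "json"

# Standalone copy of the model{} + config{} tail that run_start! appends.
# `prefix` stands in for run_start_prefix; `arch` is assumed to need no escaping.
def run_start_line(prefix, arch, vocab, d_model, n_layers, context, steps, lr, seed)
  prefix +
    ",\"model\":{\"arch\":\"" + arch + "\"" +
    ",\"vocab\":" + vocab.to_s +
    ",\"d_model\":" + d_model.to_s +
    ",\"n_layers\":" + n_layers.to_s + "}" +
    ",\"config\":{\"context\":" + context.to_s +
    ",\"steps\":" + steps.to_s +
    ",\"lr\":" + lr.to_s +
    ",\"seed\":" + seed.to_s + "}}"
end

# Hypothetical, shortened prefix; the real one also carries t, host{}, backend{}.
prefix = "{\"kind\":\"run_start\",\"schema\":\"toy/v1\",\"run_id\":\"demo\""
line   = run_start_line(prefix, "gpt2", 256, 64, 2, 128, 1000, 0.001, 42)
event  = JSON.parse(line)

puts event["model"]["arch"]       # gpt2
puts event["config"].keys.inspect # ["context", "steps", "lr", "seed"]
```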

#run_start_llama!(lcfg, steps, lr, seed) ⇒ Object

run_start for a llama-shaped run: model{} and the context key of config{} come from the Toy::SmolLM2Config; steps, lr and seed are passed in. Byte-identical to run_start!("llama", lcfg.vocab, lcfg.d_model, lcfg.n_layers, lcfg.ctx, steps, lr, seed). Returns nil.



# File 'lib/toy/io/run_bundle.rb', line 164

def run_start_llama!(lcfg, steps, lr, seed)
  if @rb_active
    TinyNN.tnn_events_emit(run_start_prefix +
      ",\"model\":{\"arch\":\"llama\"" +
      ",\"vocab\":" + lcfg.vocab.to_s +
      ",\"d_model\":" + lcfg.d_model.to_s +
      ",\"n_layers\":" + lcfg.n_layers.to_s + "}" +
      ",\"config\":{\"context\":" + lcfg.ctx.to_s +
      ",\"steps\":" + steps.to_s +
      ",\"lr\":" + lr.to_s +
      ",\"seed\":" + seed.to_s + "}}")
  end
  nil
end

#run_start_prefix ⇒ Object

The shared run_start prefix: kind/schema/t/started_at/run_id/phase:"train" + host{name,os,arch} + backend{kind: Toy::Device.name} + git{} when injected via git!, in the same key order as the train runners' Toy::Events.add_provenance. The returned string is left unclosed; callers append model{} + config{} and emit.



# File 'lib/toy/io/run_bundle.rb', line 136

def run_start_prefix
  s = "{\"kind\":\"run_start\",\"schema\":\"toy/v1\"" +
      ",\"t\":" + TinyNN.tnn_events_now_seconds.to_s +
      ",\"started_at\":\"" + TinyNN.tnn_events_iso8601_now + "\"" +
      ",\"run_id\":\"" + RunBundle.json_escape(@rb_run_id) + "\"" +
      ",\"phase\":\"train\"" +
      ",\"host\":{\"name\":\"" +
        RunBundle.json_escape(TinyNN.tnn_provenance_host_name) +
      "\",\"os\":\"" + TinyNN.tnn_provenance_host_os +
      "\",\"arch\":\"" + TinyNN.tnn_provenance_host_arch + "\"}" +
      ",\"backend\":{\"kind\":\"" + Toy::Device.name + "\"}"
  if @rb_git_sha.length > 0
    s = s + ",\"git\":{\"sha\":\"" + RunBundle.json_escape(@rb_git_sha) +
        "\",\"branch\":\"" + RunBundle.json_escape(@rb_git_branch) + "\"}"
  end
  s
end

#run_start_vit!(vcfg, steps, lr, seed) ⇒ Object

run_start for a ViT run: model{} carries the image shape (image_size/patch_size/d_model/n_layers/num_classes from the ViTTinyConfig) and config{} drops the token-context key (a ViT has none — the patch count derives from the model block). Returns nil.



# File 'lib/toy/io/run_bundle.rb', line 184

def run_start_vit!(vcfg, steps, lr, seed)
  if @rb_active
    TinyNN.tnn_events_emit(run_start_prefix +
      ",\"model\":{\"arch\":\"vit\"" +
      ",\"image_size\":" + vcfg.image_size.to_s +
      ",\"patch_size\":" + vcfg.patch_size.to_s +
      ",\"d_model\":" + vcfg.d_model.to_s +
      ",\"n_layers\":" + vcfg.n_layers.to_s +
      ",\"num_classes\":" + vcfg.num_classes.to_s + "}" +
      ",\"config\":{\"steps\":" + steps.to_s +
      ",\"lr\":" + lr.to_s +
      ",\"seed\":" + seed.to_s + "}}")
  end
  nil
end
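
A consumer can recover the patch count from the model block alone. The sketch below assumes the usual ViT setup of a square image cut into non-overlapping square patches, with patch_size dividing image_size and no class token counted; the source does not spell out the formula, so treat it as an assumption. vit_patch_count is a hypothetical helper and the event values are made up.

```ruby
require "json"

# Patch count for a square image split into non-overlapping square patches.
# Assumes patch_size divides image_size evenly.
def vit_patch_count(model)
  side = model.fetch("image_size") / model.fetch("patch_size")
  side * side
end

# Hypothetical, shortened run_start line for a ViT run.
line = '{"kind":"run_start","model":{"arch":"vit","image_size":32,' \
       '"patch_size":4,"d_model":64,"n_layers":2,"num_classes":10},' \
       '"config":{"steps":100,"lr":0.001,"seed":1}}'
event = JSON.parse(line)

puts vit_patch_count(event["model"]) # 64
puts event["config"].key?("context") # false
```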

#step!(step, loss) ⇒ Object

Emits one toy/v1 step event. `step` is the 1-indexed step number (matches the runners' "step N: loss=" stdout line). Returns nil.



# File 'lib/toy/io/run_bundle.rb', line 223

def step!(step, loss)
  if @rb_active
    TinyNN.tnn_events_emit("{\"kind\":\"step\",\"phase\":\"train\"" +
      ",\"t\":" + TinyNN.tnn_events_now_seconds.to_s +
      ",\"step\":" + step.to_s +
      ",\"loss\":" + loss.to_s + "}")
  end
  nil
end
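
Reading the step events back is a matter of filtering the JSONL by kind. This is a minimal sketch with made-up sample data; loss_curve is a hypothetical helper, not part of Toy.

```ruby
require "json"

# Collect [step, loss] pairs from the text of an events.jsonl,
# ignoring blank lines and every event that is not a step.
def loss_curve(jsonl_text)
  jsonl_text.each_line.filter_map do |raw|
    line = raw.strip
    next if line.empty?

    ev = JSON.parse(line)
    [ev["step"], ev["loss"]] if ev["kind"] == "step"
  end
end

# Hypothetical sample data, shortened for readability.
events = <<~JSONL
  {"kind":"run_start","schema":"toy/v1","run_id":"demo"}
  {"kind":"step","phase":"train","t":1.5,"step":1,"loss":2.5}
  {"kind":"step","phase":"train","t":2.0,"step":2,"loss":2.0}
JSONL

puts loss_curve(events).inspect # [[1, 2.5], [2, 2.0]]
```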

#weights_dir ⇒ Object

The checkpoint-dir convention: runs/<id>/weights/, created on first ask. Pass the returned path to ToyGGUFWriter.write_step (or any GGUF writer) so the bundle layout matches docs/events.md.



# File 'lib/toy/io/run_bundle.rb', line 115

def weights_dir
  d = @rb_dir + "/weights"
  TinyNN.tnn_filesystem_mkdir(d)
  d
end