Module: ToyLR
Defined in: lib/toy/train/toy_lr_schedule.rb
Overview
E2.4 / GH#14 — learning-rate schedules.
Cosine decay with optional linear warmup. AdamW's learning-rate hyperparameter (lr) is the only knob the training graph needs; callers compute the schedule value for the current step and upload it before each compute_backward call.
Reference: Loshchilov & Hutter 2017 (SGDR / cosine); used as-is in Touvron et al. (Llama 2, 3), Qwen, SmolLM2 training recipes.
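The per-step flow described above (compute the schedule value, upload it, then run the backward pass) might look roughly like the sketch below. `FakeTrainer` and its `upload_lr` method are hypothetical stand-ins, since this page does not show how the learning rate is uploaded or what `compute_backward` takes; the `ToyLR.cosine` body is copied from the method listing so the sketch runs on its own, and the numeric values are illustrative.

```ruby
# Standalone copy of the documented ToyLR.cosine so this sketch is runnable.
module ToyLR
  def self.cosine(step, n_steps, lr_max, lr_min, warmup_steps)
    return lr_max * (step + 1).to_f / warmup_steps.to_f if step < warmup_steps

    denom = (n_steps - warmup_steps).to_f
    return lr_min if denom <= 0.0

    progress = (step - warmup_steps).to_f / denom
    progress = 1.0 if progress > 1.0
    lr_min + 0.5 * (lr_max - lr_min) * (1.0 + Math.cos(Math::PI * progress))
  end
end

# Hypothetical stand-in for the training graph. It only records which
# learning rate was in effect for each backward pass.
class FakeTrainer
  attr_reader :lr_history

  def initialize
    @lr = nil
    @lr_history = []
  end

  # Hypothetical name: the real upload call is not shown in this page.
  def upload_lr(lr)
    @lr = lr
  end

  # Stand-in for the real compute_backward: logs the current lr.
  def compute_backward
    @lr_history << @lr
  end
end

n_steps = 10 # illustrative run length
trainer = FakeTrainer.new

n_steps.times do |step|
  # Schedule value for this step: 2 warmup steps, then cosine decay.
  lr = ToyLR.cosine(step, n_steps, 3.0e-4, 3.0e-5, 2)
  trainer.upload_lr(lr)    # upload before the backward pass
  trainer.compute_backward
end

trainer.lr_history.each_with_index do |lr, step|
  puts format("step %2d  lr %.3e", step, lr)
end
```

Because the schedule is a pure function of the step, nothing needs to be stored between iterations; resuming a run only requires knowing the step counter.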
Class Method Summary
- .constant(step, lr) ⇒ Object
  Constant learning rate, for sanity/baseline runs.
- .cosine(step, n_steps, lr_max, lr_min, warmup_steps) ⇒ Object
  Cosine schedule from lr_max → lr_min over n_steps, with an optional linear warmup over the first `warmup_steps` steps.
Class Method Details
.constant(step, lr) ⇒ Object
Constant learning rate, for sanity/baseline runs. Returns lr unchanged regardless of step.
# File 'lib/toy/train/toy_lr_schedule.rb', line 35

def self.constant(step, lr)
  step
  lr
end
.cosine(step, n_steps, lr_max, lr_min, warmup_steps) ⇒ Object
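Cosine schedule from lr_max → lr_min over n_steps, with an optional linear warmup over the first `warmup_steps` steps. The warmup ramps linearly from lr_max/warmup_steps at step 0 up to lr_max at step `warmup_steps - 1`, then cosine-decays to lr_min by step `n_steps`. Steps beyond `n_steps` stay at lr_min, and if `n_steps <= warmup_steps` every post-warmup step returns lr_min. Passing `warmup_steps = 0` disables the warmup.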
step is 0-indexed.
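Written as a formula, the schedule implemented here appears to be the piecewise function below. The symbols t (step), N (n_steps) and W (warmup_steps) are shorthand introduced for this note, not names from the source; the block assumes amsmath.

```latex
% Shorthand (not in the source): t = step, N = n_steps, W = warmup_steps.
\[
\mathrm{lr}(t) =
\begin{cases}
\mathrm{lr}_{\max}\,\dfrac{t + 1}{W}, & t < W, \\[6pt]
\mathrm{lr}_{\min}, & t \ge W \text{ and } N \le W, \\[6pt]
\mathrm{lr}_{\min} + \tfrac{1}{2}\,(\mathrm{lr}_{\max} - \mathrm{lr}_{\min})\,
  \bigl(1 + \cos(\pi\,p(t))\bigr), & \text{otherwise,}
\end{cases}
\qquad
p(t) = \min\!\Bigl(1,\ \frac{t - W}{N - W}\Bigr).
\]
```

At t = W the progress p is 0, so the decay starts exactly at lr_max; at t = N it is 1, so the schedule lands on lr_min.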
# File 'lib/toy/train/toy_lr_schedule.rb', line 17

def self.cosine(step, n_steps, lr_max, lr_min, warmup_steps)
  if step < warmup_steps
    # Linear warmup: lr_max / warmup_steps → lr_max over warmup_steps.
    return lr_max * (step + 1).to_f / warmup_steps.to_f
  end
  # Cosine decay over (step - warmup_steps) / (n_steps - warmup_steps).
  denom = (n_steps - warmup_steps).to_f
  if denom <= 0.0
    # No decay window left after warmup: hold at lr_min.
    return lr_min
  end
  progress = (step - warmup_steps).to_f / denom
  if progress > 1.0
    # Clamp so steps past n_steps stay at lr_min.
    progress = 1.0
  end
  lr_min + 0.5 * (lr_max - lr_min) * (1.0 + Math.cos(Math::PI * progress))
end
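To see how each branch of the method behaves, the sketch below evaluates the schedule at a handful of boundary steps. The hyperparameter values are illustrative and not taken from the source, and the method body is repeated only so the example runs without the rest of the library.

```ruby
# Standalone copy of the documented method so this sketch runs on its own.
module ToyLR
  def self.cosine(step, n_steps, lr_max, lr_min, warmup_steps)
    if step < warmup_steps
      return lr_max * (step + 1).to_f / warmup_steps.to_f
    end
    denom = (n_steps - warmup_steps).to_f
    return lr_min if denom <= 0.0
    progress = (step - warmup_steps).to_f / denom
    progress = 1.0 if progress > 1.0
    lr_min + 0.5 * (lr_max - lr_min) * (1.0 + Math.cos(Math::PI * progress))
  end
end

# Illustrative settings (assumptions for this example only).
n_steps = 100
warmup  = 10
lr_max  = 1.0e-3
lr_min  = 1.0e-4

# Steps chosen to hit each branch:
#   0   first warmup step      (lr_max / warmup)
#   9   last warmup step       (lr_max)
#   10  first decay step       (progress 0, still lr_max)
#   55  halfway through decay  (midpoint of lr_max and lr_min)
#   100 end of schedule        (lr_min)
#   150 past the end           (clamped to lr_min)
probe_steps = [0, 9, 10, 55, 100, 150]
schedule = probe_steps.to_h do |s|
  [s, ToyLR.cosine(s, n_steps, lr_max, lr_min, warmup)]
end

schedule.each { |s, lr| puts format("step %3d  lr %.4e", s, lr) }

# Edge cases: no warmup at all, and a run with no room left for decay.
no_warmup  = ToyLR.cosine(0, n_steps, lr_max, lr_min, 0)
degenerate = ToyLR.cosine(5, 5, lr_max, lr_min, 5)
puts format("no warmup, step 0: %.4e", no_warmup)
puts format("n_steps == warmup_steps, step 5: %.4e", degenerate)
```

Note that the last in-range step (`n_steps - 1` when counting from 0) sits slightly above lr_min; the floor is reached only at step `n_steps` itself.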