Class: Toy::RopeScaling
- Inherits: Object
- Defined in: lib/toy/models/toy_smollm2.rb
Overview
RoPE-scaling parameters extracted from a model’s GGUF metadata. The FFI rope_ext callsites read every field and dispatch on kind:
- :none passes freq_scale = 1.0 and no freq_factors. Identical to the pre-B1 behavior.
- :linear passes freq_scale = 1/factor and no freq_factors.
- :yarn uses every scalar (factor, ext_factor, attn_factor, beta_fast, beta_slow, orig_max_pos) and no freq_factors.
- :llama3 uses a per-dim freq_factors tensor built once at session setup via TinyNN.tnn_rope_freq_factors_llama3.
The constructor takes every field positionally to keep Spinel’s type analyzer happy — no default args (which would widen RbVal across the compiled program). Use .none / .linear / .llama3 builders to construct the common cases.
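The per-kind dispatch above can be sketched as follows. This is a minimal illustration, not the library's FFI code: `Scaling` is a stand-in struct carrying only three of the fields, `rope_args_for` is a hypothetical helper name, and the assumption that :yarn and :llama3 forward the stored freq_scale unchanged is ours.

```ruby
# Stand-in for Toy::RopeScaling with only the fields this sketch reads.
Scaling = Struct.new(:kind, :freq_scale, :factor)

# Hypothetical helper: returns [freq_scale, needs_freq_factors] for a
# rope_ext-style call, following the per-kind rules in the overview.
def rope_args_for(scaling)
  case scaling.kind
  when :none   then [1.0, false]                  # no scaling at all
  when :linear then [1.0 / scaling.factor, false] # single scalar, no tensor
  when :yarn   then [scaling.freq_scale, false]   # assumed: stored scalar passed as-is
  when :llama3 then [scaling.freq_scale, true]    # per-dim tensor does the work
  else raise ArgumentError, "unknown rope scaling kind: #{scaling.kind.inspect}"
  end
end

p rope_args_for(Scaling.new(:linear, 0.25, 4.0))
p rope_args_for(Scaling.new(:llama3, 1.0, 8.0))
```

Only :llama3 asks for the freq_factors tensor; every other kind is expressed through scalars alone.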
Instance Attribute Summary
- #attn_factor ⇒ Object
  Returns the value of attribute attn_factor.
- #beta_fast ⇒ Object
  Returns the value of attribute beta_fast.
- #beta_slow ⇒ Object
  Returns the value of attribute beta_slow.
- #ext_factor ⇒ Object
  Returns the value of attribute ext_factor.
- #factor ⇒ Object
  Returns the value of attribute factor.
- #freq_scale ⇒ Object
  Returns the value of attribute freq_scale.
- #high_freq_factor ⇒ Object
  Returns the value of attribute high_freq_factor.
- #kind ⇒ Object
  Returns the value of attribute kind.
- #low_freq_factor ⇒ Object
  Returns the value of attribute low_freq_factor.
- #orig_max_pos ⇒ Object
  Returns the value of attribute orig_max_pos.
Class Method Summary
- .compute_llama3_freq_factors(d_head, freq_base, orig_max_pos, factor, low_freq, high_freq) ⇒ Object
  Compute the (d_head/2)-element per-dim freq_factors array for llama3-style scaling.
- .linear(factor) ⇒ Object
  Linear / NTK / dynamic scaling with a single factor.
- .llama3(factor, low_freq, high_freq, orig_max_pos) ⇒ Object
  Llama-3 style.
- .none ⇒ Object
  No scaling. Used by SmolLM2, GPT-2, Qwen2-short-context, and any GGUF without rope_scaling.* metadata.
Instance Method Summary
- #initialize(kind, freq_scale, orig_max_pos, factor, low_freq_factor, high_freq_factor, ext_factor, attn_factor, beta_fast, beta_slow) ⇒ RopeScaling (constructor)
  A new instance of RopeScaling.
Constructor Details
#initialize(kind, freq_scale, orig_max_pos, factor, low_freq_factor, high_freq_factor, ext_factor, attn_factor, beta_fast, beta_slow) ⇒ RopeScaling
Returns a new instance of RopeScaling.
    # File 'lib/toy/models/toy_smollm2.rb', line 40
    def initialize(kind, freq_scale, orig_max_pos, factor,
                   low_freq_factor, high_freq_factor,
                   ext_factor, attn_factor, beta_fast, beta_slow)
      @kind = kind
      @freq_scale = freq_scale
      @orig_max_pos = orig_max_pos
      @factor = factor
      @low_freq_factor = low_freq_factor
      @high_freq_factor = high_freq_factor
      @ext_factor = ext_factor
      @attn_factor = attn_factor
      @beta_fast = beta_fast
      @beta_slow = beta_slow
    end
Instance Attribute Details
#attn_factor ⇒ Object
Returns the value of attribute attn_factor.
    # File 'lib/toy/models/toy_smollm2.rb', line 35
    def attn_factor
      @attn_factor
    end
#beta_fast ⇒ Object
Returns the value of attribute beta_fast.
    # File 'lib/toy/models/toy_smollm2.rb', line 35
    def beta_fast
      @beta_fast
    end
#beta_slow ⇒ Object
Returns the value of attribute beta_slow.
    # File 'lib/toy/models/toy_smollm2.rb', line 35
    def beta_slow
      @beta_slow
    end
#ext_factor ⇒ Object
Returns the value of attribute ext_factor.
    # File 'lib/toy/models/toy_smollm2.rb', line 35
    def ext_factor
      @ext_factor
    end
#factor ⇒ Object
Returns the value of attribute factor.
    # File 'lib/toy/models/toy_smollm2.rb', line 35
    def factor
      @factor
    end
#freq_scale ⇒ Object
Returns the value of attribute freq_scale.
    # File 'lib/toy/models/toy_smollm2.rb', line 35
    def freq_scale
      @freq_scale
    end
#high_freq_factor ⇒ Object
Returns the value of attribute high_freq_factor.
    # File 'lib/toy/models/toy_smollm2.rb', line 35
    def high_freq_factor
      @high_freq_factor
    end
#kind ⇒ Object
Returns the value of attribute kind.
    # File 'lib/toy/models/toy_smollm2.rb', line 35
    def kind
      @kind
    end
#low_freq_factor ⇒ Object
Returns the value of attribute low_freq_factor.
    # File 'lib/toy/models/toy_smollm2.rb', line 35
    def low_freq_factor
      @low_freq_factor
    end
#orig_max_pos ⇒ Object
Returns the value of attribute orig_max_pos.
    # File 'lib/toy/models/toy_smollm2.rb', line 35
    def orig_max_pos
      @orig_max_pos
    end
Class Method Details
.compute_llama3_freq_factors(d_head, freq_base, orig_max_pos, factor, low_freq, high_freq) ⇒ Object
Compute the (d_head/2)-element per-dim freq_factors array for llama3-style scaling. Mirrors llama.cpp’s llm_build_inp_rope_factors_llama3:
wavelen_i = 2π * freq_base^(2i / d_head)
if wavelen_i < orig_max_pos / high_freq: f_i = 1.0
elif wavelen_i > orig_max_pos / low_freq: f_i = factor
else: smooth = (orig_max_pos / wavelen_i - low_freq) / (high_freq - low_freq)
      f_i = (1 - smooth) * factor + smooth
Returns an Array of Floats of length d_head/2. The caller uploads it into a persistent tensor via tnn_upload_from_float_array.
    # File 'lib/toy/models/toy_smollm2.rb', line 88
    def self.compute_llama3_freq_factors(d_head, freq_base, orig_max_pos, factor, low_freq, high_freq)
      n = d_head / 2
      omp_f = orig_max_pos.to_f
      low_wavelen = omp_f / low_freq
      high_wavelen = omp_f / high_freq
      out = [0.0]; out.pop  # type-pin Array[Float]
      i = 0
      while i < n
        freq = 1.0 / (freq_base ** ((2.0 * i.to_f) / d_head.to_f))
        wavelen = 2.0 * Math::PI / freq
        if wavelen < high_wavelen
          out.push(1.0)
        elsif wavelen > low_wavelen
          out.push(factor)
        else
          smooth = (omp_f / wavelen - low_freq) / (high_freq - low_freq)
          out.push((1.0 - smooth) * factor + smooth * 1.0)
        end
        i = i + 1
      end
      out
    end
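As a worked example of the three bands, the sketch below re-implements the same formula as a standalone function so it runs without the Toy library. The function name `llama3_freq_factors` and the parameter values (d_head = 64, freq_base = 500000, factor = 8, low_freq = 1, high_freq = 4, orig_max_pos = 8192) are illustrative assumptions, not values read from any particular GGUF.

```ruby
# Standalone copy of the formula documented above (hypothetical name).
def llama3_freq_factors(d_head, freq_base, orig_max_pos, factor, low_freq, high_freq)
  omp = orig_max_pos.to_f
  low_wavelen  = omp / low_freq   # longer wavelengths get the full factor
  high_wavelen = omp / high_freq  # shorter wavelengths stay unscaled
  (0...(d_head / 2)).map do |i|
    wavelen = 2.0 * Math::PI * freq_base**((2.0 * i) / d_head)
    if wavelen < high_wavelen
      1.0
    elsif wavelen > low_wavelen
      factor
    else
      # Linear blend between factor and 1.0 across the middle band.
      smooth = (omp / wavelen - low_freq) / (high_freq - low_freq)
      (1.0 - smooth) * factor + smooth
    end
  end
end

factors = llama3_freq_factors(64, 500_000.0, 8192, 8.0, 1.0, 4.0)
puts factors.length
puts factors.first
puts factors.last
```

With these inputs the array has 32 entries. The first (shortest wavelength, 2π) is 1.0 and the last (longest wavelength) is the full factor of 8.0; the few dimensions whose wavelength falls between 2048 and 8192 receive blended values.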
.linear(factor) ⇒ Object
Linear / NTK / dynamic scaling — single factor. freq_scale = 1/factor (ggml’s convention).
    # File 'lib/toy/models/toy_smollm2.rb', line 64
    def self.linear(factor)
      Toy::RopeScaling.new(:linear, 1.0 / factor, 0, factor,
                           1.0, 4.0, 0.0, 1.0, 32.0, 1.0)
    end
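To illustrate what freq_scale = 1/factor does, the sketch below computes a RoPE rotation angle directly. It assumes that freq_scale multiplies the position-dependent angle, which is our reading of ggml's convention; `rope_angle` and the numeric values are hypothetical and not part of the Toy API.

```ruby
# Rotation angle for dimension pair i at position pos.
# Assumption: freq_scale multiplies the unscaled angle pos * base^(-2i/d_head).
def rope_angle(pos, i, d_head, freq_base, freq_scale)
  freq_scale * pos * freq_base**(-2.0 * i / d_head)
end

factor = 4.0
freq_scale = 1.0 / factor

scaled   = rope_angle(4096, 3, 64, 10_000.0, freq_scale)
unscaled = rope_angle(1024, 3, 64, 10_000.0, 1.0)

# Position 4096 under factor 4 lands on the angle of unscaled position 1024.
puts((scaled - unscaled).abs < 1e-9)
```

Under this assumption, linear scaling compresses positions by the factor, so a model trained on short contexts sees extended positions mapped back into its trained range.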
.llama3(factor, low_freq, high_freq, orig_max_pos) ⇒ Object
Llama-3 style. orig_max_pos = original_max_position_embeddings (e.g. 8192 for Llama 3.2). The model passes freq_base and d_head to compute_llama3_freq_factors at session setup; this object only carries the formula’s remaining inputs.
    # File 'lib/toy/models/toy_smollm2.rb', line 73
    def self.llama3(factor, low_freq, high_freq, orig_max_pos)
      Toy::RopeScaling.new(:llama3, 1.0, orig_max_pos, factor,
                           low_freq, high_freq,
                           0.0, 1.0, 32.0, 1.0)
    end
.none ⇒ Object
No-scaling — used by SmolLM2, GPT-2, Qwen2-short-context, and any GGUF without rope_scaling.* metadata.
    # File 'lib/toy/models/toy_smollm2.rb', line 58
    def self.none
      Toy::RopeScaling.new(:none, 1.0, 0, 1.0, 1.0, 4.0, 0.0, 1.0, 32.0, 1.0)
    end