Class: Toy::RopeScaling

Inherits:
Object
Defined in:
lib/toy/models/toy_smollm2.rb

Overview

RoPE-scaling parameters extracted from a model’s GGUF metadata. Every FFI rope_ext call site reads all of the fields; how they are used depends on kind:

:none     — pass freq_scale=1.0, no freq_factors. Identical to
            the pre-B1 behavior.
:linear   — freq_scale = 1/factor; no freq_factors.
:yarn     — uses every scalar (factor, ext_factor, attn_factor,
            beta_fast, beta_slow, orig_max_pos); no freq_factors.
:llama3   — per-dim freq_factors tensor built once at session
            setup via TinyNN.tnn_rope_freq_factors_llama3.
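
The per-kind dispatch above can be sketched as a small lookup. This is only an illustration: ScalingSketch and rope_call_args are hypothetical names, not part of Toy, and the real call sites pass more arguments to rope_ext than the two values returned here.

```ruby
# Hypothetical stand-in for the fields a call site reads; not the real class.
ScalingSketch = Struct.new(:kind, :freq_scale, :factor)

# Returns [freq_scale, needs_freq_factors] for a given scaling kind.
def rope_call_args(scaling)
  case scaling.kind
  when :none   then [1.0, false]                   # no scaling at all
  when :linear then [1.0 / scaling.factor, false]  # single scalar
  when :yarn   then [scaling.freq_scale, false]    # scalars only
  when :llama3 then [scaling.freq_scale, true]     # needs the per-dim tensor
  else raise ArgumentError, "unknown RoPE scaling kind: #{scaling.kind.inspect}"
  end
end

p rope_call_args(ScalingSketch.new(:linear, 0.25, 4.0))  # [0.25, false]
p rope_call_args(ScalingSketch.new(:llama3, 1.0, 8.0))   # [1.0, true]
```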

The constructor takes every field positionally, with no default arguments, to keep Spinel’s type analyzer happy: default args would widen RbVal across the compiled program. Use the .none, .linear, and .llama3 builders to construct the common cases.

Constructor Details

#initialize(kind, freq_scale, orig_max_pos, factor, low_freq_factor, high_freq_factor, ext_factor, attn_factor, beta_fast, beta_slow) ⇒ RopeScaling

Returns a new instance of RopeScaling.



# File 'lib/toy/models/toy_smollm2.rb', line 40

def initialize(kind,
               freq_scale, orig_max_pos,
               factor, low_freq_factor, high_freq_factor,
               ext_factor, attn_factor, beta_fast, beta_slow)
  @kind             = kind
  @freq_scale       = freq_scale
  @orig_max_pos     = orig_max_pos
  @factor           = factor
  @low_freq_factor  = low_freq_factor
  @high_freq_factor = high_freq_factor
  @ext_factor       = ext_factor
  @attn_factor      = attn_factor
  @beta_fast        = beta_fast
  @beta_slow        = beta_slow
end

Instance Attribute Details

#attn_factor ⇒ Object

Returns the value of attribute attn_factor.



# File 'lib/toy/models/toy_smollm2.rb', line 35

def attn_factor
  @attn_factor
end

#beta_fast ⇒ Object

Returns the value of attribute beta_fast.



# File 'lib/toy/models/toy_smollm2.rb', line 35

def beta_fast
  @beta_fast
end

#beta_slow ⇒ Object

Returns the value of attribute beta_slow.



# File 'lib/toy/models/toy_smollm2.rb', line 35

def beta_slow
  @beta_slow
end

#ext_factor ⇒ Object

Returns the value of attribute ext_factor.



# File 'lib/toy/models/toy_smollm2.rb', line 35

def ext_factor
  @ext_factor
end

#factor ⇒ Object

Returns the value of attribute factor.



# File 'lib/toy/models/toy_smollm2.rb', line 35

def factor
  @factor
end

#freq_scale ⇒ Object

Returns the value of attribute freq_scale.



# File 'lib/toy/models/toy_smollm2.rb', line 35

def freq_scale
  @freq_scale
end

#high_freq_factor ⇒ Object

Returns the value of attribute high_freq_factor.



# File 'lib/toy/models/toy_smollm2.rb', line 35

def high_freq_factor
  @high_freq_factor
end

#kind ⇒ Object

Returns the value of attribute kind.



# File 'lib/toy/models/toy_smollm2.rb', line 35

def kind
  @kind
end

#low_freq_factor ⇒ Object

Returns the value of attribute low_freq_factor.



# File 'lib/toy/models/toy_smollm2.rb', line 35

def low_freq_factor
  @low_freq_factor
end

#orig_max_pos ⇒ Object

Returns the value of attribute orig_max_pos.



# File 'lib/toy/models/toy_smollm2.rb', line 35

def orig_max_pos
  @orig_max_pos
end

Class Method Details

.compute_llama3_freq_factors(d_head, freq_base, orig_max_pos, factor, low_freq, high_freq) ⇒ Object

Compute the (d_head/2)-element per-dim freq_factors array for llama3-style scaling. Mirrors llama.cpp’s llm_build_inp_rope_factors_llama3:

wavelen_i = 2π * freq_base^(2i / d_head)
if wavelen_i < orig_max / high_freq:  f = 1.0
elif wavelen_i > orig_max / low_freq: f = factor
else:
  s = (orig_max / wavelen_i - low_freq) / (high_freq - low_freq)
  f = (1 - s) * factor + s

Returns an Array of Floats of length d_head/2. The caller uploads it into a persistent tensor via tnn_upload_from_float_array.



# File 'lib/toy/models/toy_smollm2.rb', line 88

def self.compute_llama3_freq_factors(d_head, freq_base,
                                     orig_max_pos, factor,
                                     low_freq, high_freq)
  n = d_head / 2
  omp_f         = orig_max_pos.to_f
  low_wavelen   = omp_f / low_freq
  high_wavelen  = omp_f / high_freq
  out = [0.0]; out.pop  # type-pin Array[Float]
  i = 0
  while i < n
    freq    = 1.0 / (freq_base ** ((2.0 * i.to_f) / d_head.to_f))
    wavelen = 2.0 * Math::PI / freq
    if wavelen < high_wavelen
      out.push(1.0)
    elsif wavelen > low_wavelen
      out.push(factor)
    else
      smooth = (omp_f / wavelen - low_freq) / (high_freq - low_freq)
      out.push((1.0 - smooth) * factor + smooth * 1.0)
    end
    i = i + 1
  end
  out
end
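
As a rough worked example, the same computation can be restated as a standalone function and run on sample inputs. The name llama3_freq_factors and the input values (d_head 64, freq_base 500000, factor 8, low_freq 1, high_freq 4, orig_max_pos 8192) are assumptions chosen for illustration; they are not read from any particular model file.

```ruby
# Standalone restatement of the listing above (hypothetical name), so the
# example runs without the Toy library loaded.
def llama3_freq_factors(d_head, freq_base, orig_max_pos, factor, low_freq, high_freq)
  low_wavelen  = orig_max_pos.to_f / low_freq
  high_wavelen = orig_max_pos.to_f / high_freq
  (0...(d_head / 2)).map do |i|
    wavelen = 2.0 * Math::PI * (freq_base**((2.0 * i) / d_head))
    if wavelen < high_wavelen
      1.0        # short wavelengths are left untouched
    elsif wavelen > low_wavelen
      factor     # long wavelengths get the full factor
    else
      # Transition band: blend between factor and 1.0, as in the listing.
      smooth = (orig_max_pos.to_f / wavelen - low_freq) / (high_freq - low_freq)
      (1.0 - smooth) * factor + smooth
    end
  end
end

# Assumed inputs, for illustration only.
factors = llama3_freq_factors(64, 500_000.0, 8192, 8.0, 1.0, 4.0)
puts factors.length  # 32
puts factors.first   # 1.0
puts factors.last    # 8.0
```

The first dimension pairs keep a factor of 1.0, the last ones receive the full factor, and the pairs in between are blended, so the array never decreases from front to back.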

.linear(factor) ⇒ Object

Linear / NTK / dynamic scaling — single factor. freq_scale = 1/factor (ggml’s convention).



# File 'lib/toy/models/toy_smollm2.rb', line 64

def self.linear(factor)
  Toy::RopeScaling.new(:linear, 1.0 / factor, 0,
                       factor, 1.0, 4.0, 0.0, 1.0, 32.0, 1.0)
end
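
To see what freq_scale = 1/factor does, here is a minimal sketch assuming the usual RoPE angle theta = freq_scale * pos * freq_base^(-2i / d_head). The helper rope_angle and the sample numbers are hypothetical; the actual rotation happens inside the FFI rope_ext call.

```ruby
# Rotation angle for one dimension pair under the assumed RoPE form.
def rope_angle(pos, pair_index, d_head, freq_base, freq_scale)
  freq_scale * pos * (freq_base**(-(2.0 * pair_index) / d_head))
end

factor     = 4.0
freq_scale = 1.0 / factor  # the value .linear(4.0) stores

# With factor 4, position 4096 is rotated like position 1024 without scaling.
scaled   = rope_angle(4096, 3, 64, 10_000.0, freq_scale)
unscaled = rope_angle(1024, 3, 64, 10_000.0, 1.0)
puts((scaled - unscaled).abs < 1e-9)  # true
```

In other words, linear scaling compresses positions by the factor, which is why a single scalar is enough and no freq_factors tensor is needed.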

.llama3(factor, low_freq, high_freq, orig_max_pos) ⇒ Object

Llama-3 style. orig_max_pos is the model’s original_max_position_embeddings (e.g. 8192 for Llama 3.2). The model passes freq_base and d_head to compute_llama3_freq_factors at session setup; this object only carries the formula’s remaining inputs.



# File 'lib/toy/models/toy_smollm2.rb', line 73

def self.llama3(factor, low_freq, high_freq, orig_max_pos)
  Toy::RopeScaling.new(:llama3, 1.0, orig_max_pos,
                       factor, low_freq, high_freq,
                       0.0, 1.0, 32.0, 1.0)
end

.none ⇒ Object

No-scaling — used by SmolLM2, GPT-2, Qwen2-short-context, and any GGUF without rope_scaling.* metadata.



# File 'lib/toy/models/toy_smollm2.rb', line 58

def self.none
  Toy::RopeScaling.new(:none, 1.0, 0, 1.0, 1.0, 4.0, 0.0, 1.0, 32.0, 1.0)
end