Class: Toy::RopeScaling

Inherits: Object

Defined in: lib/toy/models/toy_smollm2.rb

Overview

RoPE-scaling parameters extracted from a model’s GGUF metadata. The FFI rope_ext call sites read every field and dispatch on kind:

:none     — pass freq_scale=1.0, no freq_factors. Identical to
            the pre-B1 behavior.
:linear   — freq_scale = 1/factor; no freq_factors.
:yarn     — uses every scalar (factor, ext_factor, attn_factor,
            beta_fast, beta_slow, orig_max_pos); no freq_factors.
:llama3   — per-dim freq_factors tensor built once at session
            setup via TinyNN.tnn_rope_freq_factors_llama3.

The constructor takes every field positionally, with no default arguments, to keep Spinel’s type analyzer happy (default arguments would widen RbVal across the compiled program). Use the .none, .linear, and .llama3 builders to construct the common cases.
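The per-kind dispatch listed above can be sketched as a small lookup. This is an illustrative sketch only: `Scaling` is a hypothetical stand-in for Toy::RopeScaling reduced to three fields, and `rope_call_plan` is a made-up helper name, not part of the library. It returns the freq_scale to pass and whether a freq_factors tensor is needed.

```ruby
# Hypothetical stand-in for Toy::RopeScaling, reduced to the fields this sketch reads.
Scaling = Struct.new(:kind, :freq_scale, :factor)

# Hypothetical helper: returns [freq_scale, needs_freq_factors] for a scaling kind,
# following the dispatch list in the overview.
def rope_call_plan(scaling)
  case scaling.kind
  when :none   then [1.0, false]                  # no scaling, no freq_factors
  when :linear then [1.0 / scaling.factor, false] # freq_scale = 1/factor
  when :yarn   then [scaling.freq_scale, false]   # scalars only, no freq_factors
  when :llama3 then [scaling.freq_scale, true]    # per-dim freq_factors tensor
  else raise ArgumentError, "unknown RoPE scaling kind: #{scaling.kind.inspect}"
  end
end

p rope_call_plan(Scaling.new(:linear, 0.25, 4.0))
p rope_call_plan(Scaling.new(:llama3, 1.0, 32.0))
```

Raising on an unknown kind is a choice made for this sketch; the source does not say how the real call sites handle one.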

Instance Attribute Summary

#attn_factor ⇒ Object

Returns the value of attribute attn_factor.

#beta_fast ⇒ Object

Returns the value of attribute beta_fast.

#beta_slow ⇒ Object

Returns the value of attribute beta_slow.

#ext_factor ⇒ Object

Returns the value of attribute ext_factor.

#factor ⇒ Object

Returns the value of attribute factor.

#freq_scale ⇒ Object

Returns the value of attribute freq_scale.

#high_freq_factor ⇒ Object

Returns the value of attribute high_freq_factor.

#kind ⇒ Object

Returns the value of attribute kind.

#low_freq_factor ⇒ Object

Returns the value of attribute low_freq_factor.

#orig_max_pos ⇒ Object

Returns the value of attribute orig_max_pos.

Class Method Summary

.compute_llama3_freq_factors(d_head, freq_base, orig_max_pos, factor, low_freq, high_freq) ⇒ Object

Compute the (d_head/2)-element per-dim freq_factors array for llama3-style scaling.

.linear(factor) ⇒ Object

Linear / NTK / dynamic scaling with a single factor.

.llama3(factor, low_freq, high_freq, orig_max_pos) ⇒ Object

Llama-3 style.

.none ⇒ Object

No scaling. Used by SmolLM2, GPT-2, Qwen2-short-context, and any GGUF without rope_scaling.* metadata.

Instance Method Summary

#initialize(kind, freq_scale, orig_max_pos, factor, low_freq_factor, high_freq_factor, ext_factor, attn_factor, beta_fast, beta_slow) ⇒ RopeScaling

Constructor. Returns a new instance of RopeScaling.

Constructor Details

#initialize(kind, freq_scale, orig_max_pos, factor, low_freq_factor, high_freq_factor, ext_factor, attn_factor, beta_fast, beta_slow) ⇒ `RopeScaling`

Returns a new instance of RopeScaling.

# File 'lib/toy/models/toy_smollm2.rb', line 40

def initialize(kind,
               freq_scale, orig_max_pos,
               factor, low_freq_factor, high_freq_factor,
               ext_factor, attn_factor, beta_fast, beta_slow)
  @kind             = kind
  @freq_scale       = freq_scale
  @orig_max_pos     = orig_max_pos
  @factor           = factor
  @low_freq_factor  = low_freq_factor
  @high_freq_factor = high_freq_factor
  @ext_factor       = ext_factor
  @attn_factor      = attn_factor
  @beta_fast        = beta_fast
  @beta_slow        = beta_slow
end
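Because all ten arguments are positional, it helps to see each position labelled. The sketch below uses `RopeScalingSketch`, a hypothetical stand-in with the same signature as the constructor above, so that it runs without the Toy module. The values are the ones the .linear builder would pass for a factor of 4.0.

```ruby
# Stand-in with the same positional signature as the constructor shown above.
# It is not the real Toy::RopeScaling; it exists only so this sketch runs alone.
class RopeScalingSketch
  attr_reader :kind, :freq_scale, :orig_max_pos, :factor,
              :low_freq_factor, :high_freq_factor,
              :ext_factor, :attn_factor, :beta_fast, :beta_slow

  def initialize(kind, freq_scale, orig_max_pos, factor,
                 low_freq_factor, high_freq_factor,
                 ext_factor, attn_factor, beta_fast, beta_slow)
    @kind             = kind
    @freq_scale       = freq_scale
    @orig_max_pos     = orig_max_pos
    @factor           = factor
    @low_freq_factor  = low_freq_factor
    @high_freq_factor = high_freq_factor
    @ext_factor       = ext_factor
    @attn_factor      = attn_factor
    @beta_fast        = beta_fast
    @beta_slow        = beta_slow
  end
end

scaling = RopeScalingSketch.new(
  :linear, # kind
  0.25,    # freq_scale (1 / factor)
  0,       # orig_max_pos (the .linear builder passes 0)
  4.0,     # factor
  1.0,     # low_freq_factor
  4.0,     # high_freq_factor
  0.0,     # ext_factor
  1.0,     # attn_factor
  32.0,    # beta_fast
  1.0      # beta_slow
)
puts scaling.kind
puts scaling.freq_scale
```

In practice the builders are the safer entry point, since a swapped pair of Float arguments would not be caught by the constructor.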

Instance Attribute Details

#attn_factor ⇒ `Object`

Returns the value of attribute attn_factor.




# File 'lib/toy/models/toy_smollm2.rb', line 35

def attn_factor
  @attn_factor
end

#beta_fast ⇒ `Object`

Returns the value of attribute beta_fast.




# File 'lib/toy/models/toy_smollm2.rb', line 35

def beta_fast
  @beta_fast
end

#beta_slow ⇒ `Object`

Returns the value of attribute beta_slow.




# File 'lib/toy/models/toy_smollm2.rb', line 35

def beta_slow
  @beta_slow
end

#ext_factor ⇒ `Object`

Returns the value of attribute ext_factor.




# File 'lib/toy/models/toy_smollm2.rb', line 35

def ext_factor
  @ext_factor
end

#factor ⇒ `Object`

Returns the value of attribute factor.




# File 'lib/toy/models/toy_smollm2.rb', line 35

def factor
  @factor
end

#freq_scale ⇒ `Object`

Returns the value of attribute freq_scale.




# File 'lib/toy/models/toy_smollm2.rb', line 35

def freq_scale
  @freq_scale
end

#high_freq_factor ⇒ `Object`

Returns the value of attribute high_freq_factor.




# File 'lib/toy/models/toy_smollm2.rb', line 35

def high_freq_factor
  @high_freq_factor
end

#kind ⇒ `Object`

Returns the value of attribute kind.




# File 'lib/toy/models/toy_smollm2.rb', line 35

def kind
  @kind
end

#low_freq_factor ⇒ `Object`

Returns the value of attribute low_freq_factor.




# File 'lib/toy/models/toy_smollm2.rb', line 35

def low_freq_factor
  @low_freq_factor
end

#orig_max_pos ⇒ `Object`

Returns the value of attribute orig_max_pos.




# File 'lib/toy/models/toy_smollm2.rb', line 35

def orig_max_pos
  @orig_max_pos
end

Class Method Details

.compute_llama3_freq_factors(d_head, freq_base, orig_max_pos, factor, low_freq, high_freq) ⇒ `Object`

Compute the (d_head/2)-element per-dim freq_factors array for llama3-style scaling. Mirrors llama.cpp’s llm_build_inp_rope_factors_llama3:

wavelen_i = 2π * freq_base^(2i / d_head)
if wavelen_i < orig_max_pos / high_freq:   f = 1.0
elif wavelen_i > orig_max_pos / low_freq:  f = factor
else:
  smooth = (orig_max_pos / wavelen_i - low_freq) / (high_freq - low_freq)
  f = (1 - smooth) * factor + smooth

Returns an Array of Floats of length d_head/2. The caller uploads it into a persistent tensor via tnn_upload_from_float_array.

# File 'lib/toy/models/toy_smollm2.rb', line 88

def self.compute_llama3_freq_factors(d_head, freq_base,
                                     orig_max_pos, factor,
                                     low_freq, high_freq)
  n = d_head / 2
  omp_f         = orig_max_pos.to_f
  low_wavelen   = omp_f / low_freq
  high_wavelen  = omp_f / high_freq
  out = [0.0]; out.pop  # type-pin Array[Float]
  i = 0
  while i < n
    freq    = 1.0 / (freq_base ** ((2.0 * i.to_f) / d_head.to_f))
    wavelen = 2.0 * Math::PI / freq
    if wavelen < high_wavelen
      out.push(1.0)
    elsif wavelen > low_wavelen
      out.push(factor)
    else
      smooth = (omp_f / wavelen - low_freq) / (high_freq - low_freq)
      out.push((1.0 - smooth) * factor + smooth * 1.0)
    end
    i = i + 1
  end
  out
end
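To see the three regimes in practice, the sketch below restates the method above as a standalone function (`llama3_freq_factors`, a name chosen here so the example runs without the Toy module) and evaluates it. The inputs are assumed, illustrative values in the style of a Llama 3.2 configuration: d_head 64, freq_base 500000, orig_max_pos 8192, factor 32, low_freq 1, high_freq 4.

```ruby
# Standalone restatement of compute_llama3_freq_factors for illustration.
# Same arithmetic as the method above, written with Array.new instead of a while loop.
def llama3_freq_factors(d_head, freq_base, orig_max_pos, factor, low_freq, high_freq)
  low_wavelen  = orig_max_pos.to_f / low_freq
  high_wavelen = orig_max_pos.to_f / high_freq
  Array.new(d_head / 2) do |i|
    # wavelen_i = 2π * freq_base^(2i / d_head)
    wavelen = 2.0 * Math::PI * (freq_base**((2.0 * i) / d_head))
    if wavelen < high_wavelen
      1.0                      # short wavelengths: left unscaled
    elsif wavelen > low_wavelen
      factor                   # long wavelengths: scaled by the full factor
    else
      smooth = (orig_max_pos.to_f / wavelen - low_freq) / (high_freq - low_freq)
      (1.0 - smooth) * factor + smooth
    end
  end
end

# Assumed example inputs, not read from any GGUF file.
factors = llama3_freq_factors(64, 500_000.0, 8192, 32.0, 1.0, 4.0)
puts factors.length # 32
puts factors.first  # 1.0
puts factors.last   # 32.0
```

With these inputs the first pair (wavelength 2π) stays at 1.0, the last pair receives the full factor, and a few pairs in between take interpolated values.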

.linear(factor) ⇒ `Object`

Linear / NTK / dynamic scaling with a single factor. freq_scale = 1/factor (ggml’s convention).

# File 'lib/toy/models/toy_smollm2.rb', line 64

def self.linear(factor)
  Toy::RopeScaling.new(:linear, 1.0 / factor, 0,
                       factor, 1.0, 4.0, 0.0, 1.0, 32.0, 1.0)
end
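As a worked example of the 1/factor convention: with factor 4.0, freq_scale is 0.25, so position 8192 gets the same rotation angle that position 2048 would get unscaled. The sketch below assumes the usual RoPE angle formula, pos * freq_scale * freq_base^(-2i / d_head); `scaled_angle` is a hypothetical helper, not a Toy or ggml function.

```ruby
# Hypothetical helper: rotation angle for one dimension pair under linear scaling.
# Assumes angle = pos * freq_scale * freq_base^(-2i / d_head).
def scaled_angle(pos, pair_index, d_head, freq_base, factor)
  freq_scale = 1.0 / factor
  base_freq  = 1.0 / (freq_base**((2.0 * pair_index) / d_head))
  pos * freq_scale * base_freq
end

# Position 8192 with factor 4.0 lands on the angle of position 2048 without scaling.
long_ctx = scaled_angle(8192, 0, 64, 10_000.0, 4.0)
short_ctx = scaled_angle(2048, 0, 64, 10_000.0, 1.0)
puts long_ctx  # 2048.0
puts short_ctx # 2048.0
```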

.llama3(factor, low_freq, high_freq, orig_max_pos) ⇒ `Object`

Llama-3 style. orig_max_pos is the model’s original_max_position_embeddings (e.g. 8192 for Llama 3.2). The model passes freq_base and d_head to compute_llama3_freq_factors at session setup; this object only carries the formula’s inputs.

# File 'lib/toy/models/toy_smollm2.rb', line 73

def self.llama3(factor, low_freq, high_freq, orig_max_pos)
  Toy::RopeScaling.new(:llama3, 1.0, orig_max_pos,
                       factor, low_freq, high_freq,
                       0.0, 1.0, 32.0, 1.0)
end

.none ⇒ `Object`

No scaling. Used by SmolLM2, GPT-2, Qwen2-short-context, and any GGUF without rope_scaling.* metadata.




# File 'lib/toy/models/toy_smollm2.rb', line 58

def self.none
  Toy::RopeScaling.new(:none, 1.0, 0, 1.0, 1.0, 4.0, 0.0, 1.0, 32.0, 1.0)
end
