Module: SmolLM2ConfigLoader

Defined in: lib/toy/io/loaders/toy_smollm2_loader.rb

Class Method Summary

.read(path) ⇒ Object
.read_rope_scaling(handle) ⇒ Object

Read rope_scaling.* metadata from the GGUF and pick a RopeScaling variant.

Class Method Details

.read(path) ⇒ `Object`

# File 'lib/toy/io/loaders/toy_smollm2_loader.rb', line 703

def self.read(path)
  handle = TinyNN.tnn_gguf_load(path)
  if handle == nil
    puts "SmolLM2ConfigLoader: failed to open " + path
    return Toy::SmolLM2Config.new(0, 0, 0, 0, 0, 0, 0, 10000.0, 1.0e-5)
  end
  # M2.3 + I-Gemma: arch-prefix probe (llama.* / olmoe.* / gemma2.* / …).
  # embedding_length is in every arch; vocab_size isn't (OLMoE omits it).
  ap = "llama"
  if TinyNN.tnn_gguf_get_u32(handle, "llama.embedding_length") < 0
    if TinyNN.tnn_gguf_get_u32(handle, "olmoe.embedding_length") >= 0
      ap = "olmoe"
    elsif TinyNN.tnn_gguf_get_u32(handle, "gemma2.embedding_length") >= 0
      ap = "gemma2"
    end
  end
  vocab     = TinyNN.tnn_gguf_get_u32(handle, ap + ".vocab_size")
  if vocab < 0
    vocab = TinyNN.tnn_gguf_arr_n(handle, "tokenizer.ggml.tokens")
  end
  d_model   = TinyNN.tnn_gguf_get_u32(handle, ap + ".embedding_length")
  d_ff      = TinyNN.tnn_gguf_get_u32(handle, ap + ".feed_forward_length")
  n_head    = TinyNN.tnn_gguf_get_u32(handle, ap + ".attention.head_count")
  n_kv      = TinyNN.tnn_gguf_get_u32(handle, ap + ".attention.head_count_kv")
  n_layer   = TinyNN.tnn_gguf_get_u32(handle, ap + ".block_count")
  ctx       = TinyNN.tnn_gguf_get_u32(handle, ap + ".context_length")
  rope_base = TinyNN.tnn_gguf_get_f32(handle, ap + ".rope.freq_base")
  if rope_base <= 0.0
    # Gemma 2 doesn't emit rope.freq_base — it uses the HF default
    # of 10000. Same fallback works for any arch that omits the
    # key. Models that genuinely need a custom base (Llama-3.x,
    # Qwen3 — 100000 or 1000000) always emit it.
    rope_base = 10000.0
  end
  rms_eps   = TinyNN.tnn_gguf_get_f32(handle, ap + ".attention.layer_norm_rms_epsilon")
  # M1.1: prefer explicit head_dim (<arch>.attention.key_length) when
  # present. Qwen3 sets head_dim=128 explicitly even though
  # hidden_size/num_heads = 64. Returns -1 when key absent; we treat
  # that as "use the default" (d_model / n_heads, computed in
  # SmolLM2Config.initialize).
  head_dim = TinyNN.tnn_gguf_get_u32(handle, ap + ".attention.key_length")
  scaling   = read_rope_scaling(handle)
  TinyNN.tnn_gguf_free(handle)
  cfg = Toy::SmolLM2Config.new(vocab, d_model, n_head, n_kv, d_ff, n_layer,
                               ctx, rope_base, rms_eps)
  cfg.rope_scaling = scaling
  if head_dim > 0
    cfg.head_dim = head_dim
  end
  cfg
end
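
The probe order and sentinel fallbacks in .read can be exercised without a model file. The following is a minimal sketch, not the loader itself: FakeGguf and ArchProbe are hypothetical names, and the Hash-backed accessors only imitate the behaviour the source comments describe (-1 for an absent u32 key). The default head_dim of d_model / n_head is taken from the comment above; the real computation lives in SmolLM2Config.initialize.

```ruby
# Hypothetical stand-in for the TinyNN GGUF accessors: metadata lives in a
# Hash, and an absent key yields -1, as the loader's `< 0` checks assume.
class FakeGguf
  def initialize(meta)
    @meta = meta
  end

  def get_u32(key)
    @meta.fetch(key, -1)
  end

  def arr_n(key)
    arr = @meta[key]
    arr.nil? ? -1 : arr.length
  end
end

# Hypothetical helpers that mirror individual steps of .read.
module ArchProbe
  module_function

  # Same probe order as .read: llama, then olmoe, then gemma2.
  # Falls through to "llama" when no prefix has embedding_length.
  def detect_prefix(gguf)
    ["llama", "olmoe", "gemma2"].each do |prefix|
      return prefix if gguf.get_u32(prefix + ".embedding_length") >= 0
    end
    "llama"
  end

  # vocab_size, falling back to the tokenizer array length (OLMoE case).
  def vocab_size(gguf, prefix)
    vocab = gguf.get_u32(prefix + ".vocab_size")
    vocab = gguf.arr_n("tokenizer.ggml.tokens") if vocab < 0
    vocab
  end

  # Explicit key_length when present, else the d_model / n_head default.
  def head_dim(gguf, prefix)
    explicit = gguf.get_u32(prefix + ".attention.key_length")
    return explicit if explicit > 0
    gguf.get_u32(prefix + ".embedding_length") /
      gguf.get_u32(prefix + ".attention.head_count")
  end
end

# Made-up metadata for illustration only; not taken from a real model.
olmoe = FakeGguf.new(
  "olmoe.embedding_length"     => 2048,
  "olmoe.attention.head_count" => 16,
  "tokenizer.ggml.tokens"      => ["<s>", "</s>", "a", "b"]
)
prefix = ArchProbe.detect_prefix(olmoe)
puts prefix                              # olmoe
puts ArchProbe.vocab_size(olmoe, prefix) # 4
puts ArchProbe.head_dim(olmoe, prefix)   # 128
```

Probing embedding_length rather than vocab_size matters here: the sample metadata has no olmoe.vocab_size key, yet the prefix is still detected and the vocabulary size is recovered from the tokenizer array.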

.read_rope_scaling(handle) ⇒ `Object`

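Read rope_scaling.* metadata from the GGUF and pick a RopeScaling variant. The keys are always looked up under the llama. prefix, independent of the arch prefix that .read detects. GGUF keys (when present):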

llama.rope.scaling.type                     (str: "linear", "yarn", "llama3", "longrope")
llama.rope.scaling.factor                   (f32)
llama.rope.scaling.original_context_length  (u32)
llama.rope.scaling.low_freq_factor          (f32, llama3)
llama.rope.scaling.high_freq_factor         (f32, llama3)
llama.rope.scaling.attn_factor              (f32, yarn)
llama.rope.scaling.beta_fast                (f32, yarn)
llama.rope.scaling.beta_slow                (f32, yarn)

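Returns Toy::RopeScaling.none when the type key is missing (the code checks the string accessor for nil; the numeric GGUF accessors return -1/0 sentinels for absent keys). LongRoPE is not yet supported: it, like any unrecognized type, falls back to .none with a warning.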

# File 'lib/toy/io/loaders/toy_smollm2_loader.rb', line 662

def self.read_rope_scaling(handle)
  kind_str = TinyNN.tnn_gguf_get_str(handle, "llama.rope.scaling.type")
  if kind_str == nil
    return Toy::RopeScaling.none
  end
  if kind_str == "linear"
    f = TinyNN.tnn_gguf_get_f32(handle, "llama.rope.scaling.factor")
    puts "rope_scaling: linear (factor=" + f.to_s + ")"
    return Toy::RopeScaling.linear(f)
  end
  if kind_str == "llama3"
    f    = TinyNN.tnn_gguf_get_f32(handle, "llama.rope.scaling.factor")
    lo   = TinyNN.tnn_gguf_get_f32(handle, "llama.rope.scaling.low_freq_factor")
    hi   = TinyNN.tnn_gguf_get_f32(handle, "llama.rope.scaling.high_freq_factor")
    omp  = TinyNN.tnn_gguf_get_u32(handle, "llama.rope.scaling.original_context_length")
    if omp < 0
      omp = TinyNN.tnn_gguf_get_u32(handle, "llama.context_length")
    end
    puts "rope_scaling: llama3 (factor=" + f.to_s +
         " low=" + lo.to_s + " high=" + hi.to_s +
         " orig_max=" + omp.to_s + ")"
    return Toy::RopeScaling.llama3(f, lo, hi, omp)
  end
  if kind_str == "yarn"
    f   = TinyNN.tnn_gguf_get_f32(handle, "llama.rope.scaling.factor")
    omp = TinyNN.tnn_gguf_get_u32(handle, "llama.rope.scaling.original_context_length")
    af  = TinyNN.tnn_gguf_get_f32(handle, "llama.rope.scaling.attn_factor")
    bf  = TinyNN.tnn_gguf_get_f32(handle, "llama.rope.scaling.beta_fast")
    bs  = TinyNN.tnn_gguf_get_f32(handle, "llama.rope.scaling.beta_slow")
    if af == 0.0; af = 1.0; end
    if bf == 0.0; bf = 32.0; end
    if bs == 0.0; bs = 1.0; end
    puts "rope_scaling: yarn (factor=" + f.to_s + " orig_max=" + omp.to_s +
         " attn_factor=" + af.to_s + ")"
    return Toy::RopeScaling.new(:yarn, 1.0 / f, omp, f, 1.0, 4.0, 1.0, af, bf, bs)
  end
  # longrope and any unknown kind: fall back, with a warning.
  puts "rope_scaling: unsupported type '" + kind_str + "' — using no-scaling (likely degraded long-context quality)"
  Toy::RopeScaling.none
end
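
The yarn branch treats an f32 value of 0.0 as "key absent" and substitutes a default (attn_factor 1.0, beta_fast 32.0, beta_slow 1.0). A minimal sketch of that defaulting step, assuming a plain Hash of already-read values; YarnDefaults is a hypothetical helper and does not exist in the loader:

```ruby
# Hypothetical helper mirroring the yarn branch: a value of 0.0 (the
# sentinel for an absent f32 key) is replaced by the loader's default.
module YarnDefaults
  DEFAULTS = { attn_factor: 1.0, beta_fast: 32.0, beta_slow: 1.0 }.freeze

  module_function

  # raw: Hash of values as read from the file; a missing entry is
  # treated the same as 0.0.
  def apply(raw)
    DEFAULTS.each_with_object({}) do |(name, fallback), out|
      value = raw.fetch(name, 0.0)
      out[name] = value == 0.0 ? fallback : value
    end
  end
end

# Made-up inputs: attn_factor absent (0.0), beta_fast set, beta_slow missing.
params = YarnDefaults.apply(attn_factor: 0.0, beta_fast: 16.0)
puts params.inspect
```

One consequence of the sentinel approach is that a file which explicitly stores 0.0 for one of these keys is indistinguishable from a file that omits the key. The scaling factor has no such default in the source, so the sketch leaves it out.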
