Module: Ignis::AI::LlamaLoader

Defined in:
lib/nnw/ai/llama_loader.rb

Overview

LlamaLoader loads HuggingFace Llama-family checkpoints (Llama-3.x and other LlamaForCausalLM models: SmolLM, TinyLlama, …) into a Transformer::ModernModel.

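Reads config.json to size the model (vocabulary and hidden sizes, layer count, GQA head counts, RoPE base and scaling, RMSNorm eps), then loads model.safetensors, dequantizing bf16 weights to fp32 on-device. Tied embeddings are detected from the checkpoint rather than the config: when lm_head.weight is absent, the output head reuses model.embed_tokens.weight. HF's nn.Linear weights are stored as [out, in] and applied as x·Wᵀ, the SAME convention as Ignis::AI::NN::Linear, so the weights map across with no transpose (unlike GPT-2's Conv1D, which stores [in, out] and computes x·W).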
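A minimal pure-Ruby sketch of why no transpose is needed, using plain nested arrays rather than Ignis tensors (the helpers `linear_out_in` and `conv1d_in_out` are hypothetical, written only for this illustration): a weight stored as [out, in] and applied as x·Wᵀ produces the same output as its transpose stored as [in, out] and applied as x·W, so only the Conv1D layout has to be transposed before it fits an [out, in] layer.

```ruby
# Hypothetical helpers illustrating weight layouts; plain Ruby arrays only.

# y = x · Wᵀ for a weight stored as [out, in] (nn.Linear convention).
def linear_out_in(x, w)
  w.map { |row| row.zip(x).sum { |wi, xi| wi * xi } }
end

# y = x · W for a weight stored as [in, out] (GPT-2 Conv1D convention).
def conv1d_in_out(x, w)
  out_dim = w.first.size
  (0...out_dim).map { |j| x.each_with_index.sum { |xi, i| xi * w[i][j] } }
end

x = [1.0, 2.0, 3.0]                      # input with in = 3
w_linear = [[1.0, 0.0, 2.0],             # weight [out = 2, in = 3]
            [0.5, 1.0, -1.0]]

# A Llama (nn.Linear) weight is used as-is: no transpose.
y_linear = linear_out_in(x, w_linear)    # => [7.0, -0.5]

# The same projection stored GPT-2 style is [in, out] = w_linearᵀ;
# loading it into an [out, in] layer would need a transpose first.
w_conv1d = w_linear.transpose
y_conv1d = conv1d_in_out(x, w_conv1d)    # => [7.0, -0.5]
```

Both layouts hold the same numbers; they differ only in which axis is the output axis, which is what decides whether a loader must transpose.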

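Class Method Summary

  • .from_pretrained(dir, device_id: 0) ⇒ Transformer::ModernModel

    Build a ModernModel from config.json and load its weights.

  • .load(model, dir, device_id: 0) ⇒ Integer

    Load weights from dir/model.safetensors into an existing ModernModel.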

Class Method Details

.from_pretrained(dir, device_id: 0) ⇒ Transformer::ModernModel

Build a ModernModel from config.json and load its weights.

Parameters:

  • dir (String)

    directory containing config.json and model.safetensors

  • device_id (Integer) (defaults to: 0)

    device on which the model is built and its weights are loaded

Returns:

  • (Transformer::ModernModel)

    the constructed model with its weights loaded

# File 'lib/nnw/ai/llama_loader.rb', line 21

def from_pretrained(dir, device_id: 0)
  cfg = JSON.parse(File.read(File.join(dir, "config.json")))
  model = Transformer::ModernModel.new(
    vocab_size:   cfg["vocab_size"],
    embed_dim:    cfg["hidden_size"],
    num_heads:    cfg["num_attention_heads"],
    # GQA: configs without num_key_value_heads get one KV head per query head (plain MHA)
    num_kv_heads: cfg["num_key_value_heads"] || cfg["num_attention_heads"],
    num_layers:   cfg["num_hidden_layers"],
    ff_dim:       cfg["intermediate_size"],
    max_seq_len:  cfg["max_position_embeddings"],
    rope_base:    (cfg["rope_theta"] || 10000.0).to_f,
    rope_scaling: cfg["rope_scaling"],   # hash from config.json (e.g. Llama-3.1's "llama3" scaling) or nil
    head_dim:     cfg["head_dim"],       # nil when config.json omits it
    eps:          (cfg["rms_norm_eps"] || 1e-5).to_f,
    device_id:    device_id
  )
  load(model, dir, device_id: device_id)  # copy weights in from model.safetensors
  model
end
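The config-to-kwargs mapping above can be checked in isolation. The sketch below copies it into a hypothetical `modern_model_kwargs` helper (device_id omitted) and derives the [out, in] shapes that one layer's q_proj, k_proj, v_proj and o_proj tensors should have under GQA. The fallback head_dim = hidden_size / num_attention_heads is an assumption based on the usual Llama convention, not on ModernModel's source, and the config values are made up for illustration.

```ruby
require "json"

# Hypothetical helper: mirrors the config.json -> ModernModel kwargs mapping
# in from_pretrained (device_id omitted), including its fallbacks.
def modern_model_kwargs(cfg)
  {
    vocab_size:   cfg["vocab_size"],
    embed_dim:    cfg["hidden_size"],
    num_heads:    cfg["num_attention_heads"],
    num_kv_heads: cfg["num_key_value_heads"] || cfg["num_attention_heads"],
    num_layers:   cfg["num_hidden_layers"],
    ff_dim:       cfg["intermediate_size"],
    max_seq_len:  cfg["max_position_embeddings"],
    rope_base:    (cfg["rope_theta"] || 10000.0).to_f,
    rope_scaling: cfg["rope_scaling"],
    head_dim:     cfg["head_dim"],
    eps:          (cfg["rms_norm_eps"] || 1e-5).to_f
  }
end

# Expected HF shapes ([out, in]) of one layer's attention projections.
# Assumption: when head_dim is absent it is hidden_size / num_heads,
# the usual Llama convention; ModernModel's own default is not shown here.
def attention_shapes(kw)
  head_dim = kw[:head_dim] || kw[:embed_dim] / kw[:num_heads]
  {
    "q_proj" => [kw[:num_heads] * head_dim, kw[:embed_dim]],
    "k_proj" => [kw[:num_kv_heads] * head_dim, kw[:embed_dim]],
    "v_proj" => [kw[:num_kv_heads] * head_dim, kw[:embed_dim]],
    "o_proj" => [kw[:embed_dim], kw[:num_heads] * head_dim]
  }
end

# Illustrative config: 8 query heads share 2 KV heads (GQA), no rope_theta.
cfg = JSON.parse(<<~JSON)
  {"vocab_size": 1000, "hidden_size": 256, "num_attention_heads": 8,
   "num_key_value_heads": 2, "num_hidden_layers": 4,
   "intermediate_size": 688, "max_position_embeddings": 2048,
   "rms_norm_eps": 1e-6}
JSON

kw = modern_model_kwargs(cfg)
kw[:rope_base]                   # => 10000.0 (fallback)
attention_shapes(kw)["k_proj"]   # => [64, 256]
```

With 8 query heads sharing 2 KV heads, k_proj and v_proj are a quarter the height of q_proj; a checkpoint whose tensors disagree with these shapes is what trips the shape-mismatch check in .load.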

.load(model, dir, device_id: 0) ⇒ Integer

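Load weights from dir/model.safetensors into an existing ModernModel. Each parameter is matched to its checkpoint tensor by name; head.weight comes from lm_head.weight, or from model.embed_tokens.weight when the embeddings are tied. Raises RuntimeError if a source tensor is missing or its shape differs from the parameter's.

Parameters:

  • model (Transformer::ModernModel)

    model whose parameters are overwritten in place

  • dir (String)

    directory containing model.safetensors

  • device_id (Integer) (defaults to: 0)

    device on which the checkpoint tensors are loaded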

Returns:

  • (Integer)

    number of parameters loaded
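The file this method reads follows the safetensors layout: an 8-byte little-endian unsigned header length, a JSON header mapping each tensor name to its dtype, shape and data_offsets, then the raw tensor bytes. A minimal sketch of reading that header in plain Ruby; `read_safetensors_header` is a hypothetical helper and is not the Safetensors.load API used by the loader.

```ruby
require "json"
require "tempfile"

# Hypothetical helper (not the Safetensors API used by the loader): read the
# JSON header of a .safetensors file. Layout per the safetensors format:
# 8-byte little-endian u64 header length, then that many bytes of JSON,
# then the raw tensor bytes that data_offsets index into.
def read_safetensors_header(path)
  File.open(path, "rb") do |f|
    header_len = f.read(8).unpack1("Q<")
    header = JSON.parse(f.read(header_len))
    header.delete("__metadata__")  # optional free-form string map
    header
  end
end

# Build a tiny file with one 2x2 BF16 tensor (4 elements x 2 bytes = 8 bytes).
header_json = JSON.generate(
  { "model.embed_tokens.weight" => { "dtype" => "BF16", "shape" => [2, 2], "data_offsets" => [0, 8] } }
)
file = Tempfile.new(["tiny", ".safetensors"])
file.binmode
file.write([header_json.bytesize].pack("Q<"), header_json, "\x00".b * 8)
file.close

info = read_safetensors_header(file.path)["model.embed_tokens.weight"]
info["dtype"]   # => "BF16"
info["shape"]   # => [2, 2]
```

The shape recorded in this header is what the loader compares against each parameter's shape before copying.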

# File 'lib/nnw/ai/llama_loader.rb', line 43

def load(model, dir, device_id: 0)
  tensors = Safetensors.load(File.join(dir, "model.safetensors"), device_id: device_id)
  embed_src = tensors["model.embed_tokens.weight"]
  raise "LlamaLoader: missing model.embed_tokens.weight" unless embed_src
  lm_head_src = tensors["lm_head.weight"] # nil when embeddings are tied

  count = 0
  model.named_parameters.each do |name, param|
    # Map each Ignis parameter to its HF tensor; the output head falls back to the embeddings.
    src = if name == "head.weight"
            lm_head_src || embed_src   # untied lm_head, else tied to embeddings
          else
            tensors[hf_name(name)]
          end
    raise "LlamaLoader: no source weight for param #{name} (#{hf_name(name)})" unless src
    # Both sides are [out, in], so shapes must match exactly (no transpose).
    unless param.shape == src.shape
      raise "LlamaLoader: shape mismatch for #{name}: model #{param.shape} vs file #{src.shape}"
    end
    dequant_into!(src.data, param.data)  # convert (e.g. bf16 -> fp32) and copy into the parameter
    count += 1
  end
  Ignis.synchronize   # synchronize the device before reporting the count
  count
end
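dequant_into! is not shown on this page. Assuming its bf16 path is a plain widening, the arithmetic is simple: bf16 is the upper 16 bits of an IEEE-754 float32 (sign, 8 exponent bits, 7 mantissa bits), so every bf16 value converts to fp32 exactly by shifting it into the high half. A CPU sketch in pure Ruby; the helper name is hypothetical and the real conversion runs on-device.

```ruby
# Hypothetical CPU sketch of bf16 -> fp32 widening (dequant_into!'s real
# implementation is not shown). BF16 keeps the sign, the 8 exponent bits and
# the top 7 mantissa bits of an IEEE-754 float32, so widening is exact:
# shift the 16 bits into the high half of a 32-bit word.
def bf16_bytes_to_f32(bytes)
  bytes.unpack("S<*").map { |bits| [bits << 16].pack("L<").unpack1("e") }
end

# Little-endian bf16 encodings of 1.0, -2.0 and 3.140625 (the bf16 value nearest pi).
raw = [0x3F80, 0xC000, 0x4049].pack("S<*")
bf16_bytes_to_f32(raw)  # => [1.0, -2.0, 3.140625]
```

Widening never loses information; the reverse direction (fp32 to bf16) drops the low 16 mantissa bits and needs a rounding rule.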