Module: Ignis::AI::LlamaLoader
Defined in: lib/nnw/ai/llama_loader.rb
Overview
LlamaLoader: loads Hugging Face Llama-family checkpoints (Llama-3.x and other LlamaForCausalLM models such as SmolLM, TinyLlama, …) into a Transformer::ModernModel.
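Reads config.json to size the model (RoPE base/scaling, GQA head counts, RMSNorm eps), then loads model.safetensors and converts ("dequantizes") bf16 weights to fp32 on-device. Tied embeddings are detected at load time: when the file has no lm_head.weight, the output head is filled from the token embeddings. Hugging Face checkpoints store PyTorch nn.Linear weights as [out, in] and apply them as x·Wᵀ, the same convention as Ignis::AI::NN::Linear, so weights map across with no transpose (unlike GPT-2's Conv1D, which stores [in, out]).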
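The two conventions in the overview can be illustrated on the CPU in plain Ruby. This is a minimal sketch, not the loader's on-device path: bf16_to_f32, bf16_bytes_to_f32 and linear are hypothetical helpers, and the bytes are assumed to be little-endian as in a safetensors buffer. It shows that bf16 widens exactly to fp32 (a bit shift, no scale or zero point) and that an [out, in] weight applied as x·Wᵀ needs no transpose.

```ruby
# Sketch only: CPU-side illustration; these helpers are hypothetical,
# not Ignis APIs.

# bf16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits of an
# IEEE-754 fp32 value, so widening to fp32 is exact: shift the 16-bit pattern
# into the high half of a 32-bit word and reinterpret it as a float.
def bf16_to_f32(bits)
  [bits << 16].pack("L<").unpack1("e")
end

# Decode a little-endian byte string of packed bf16 values into Floats.
def bf16_bytes_to_f32(bytes)
  bytes.unpack("S<*").map { |b| bf16_to_f32(b) }
end

# Apply a weight stored as [out, in] (one row per output feature) as x·Wᵀ:
# y[o] = sum_i x[i] * w[o][i]. Each output is a dot product with one stored
# row, so the layout is used as-is.
def linear(x, w)
  w.map do |row|
    unless row.size == x.size
      raise ArgumentError, "expected #{x.size} input features, got #{row.size}"
    end
    row.zip(x).sum { |wi, xi| wi * xi }
  end
end

# A 2x3 weight (out = 2, in = 3) encoded as bf16: rows [1, 0, 2] and [0, 1, -1].
raw = [0x3F80, 0x0000, 0x4000, 0x0000, 0x3F80, 0xBF80].pack("S<*")
w = bf16_bytes_to_f32(raw).each_slice(3).to_a
p linear([1.0, 2.0, 3.0], w) # => [7.0, -1.0]
```

A Conv1D-style [in, out] weight would instead need w.transpose before the row-wise dot products, which is the step the overview says Llama checkpoints avoid.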
Class Method Summary
- .from_pretrained(dir, device_id: 0) ⇒ Transformer::ModernModel
  Build a ModernModel from config.json and load its weights.
- .load(model, dir, device_id: 0) ⇒ Integer
  Load weights from dir/model.safetensors into an existing ModernModel.
Class Method Details
.from_pretrained(dir, device_id: 0) ⇒ Transformer::ModernModel
Build a ModernModel from config.json and load its weights.
# File 'lib/nnw/ai/llama_loader.rb', line 21
def from_pretrained(dir, device_id: 0)
  cfg = JSON.parse(File.read(File.join(dir, "config.json")))
  model = Transformer::ModernModel.new(
    vocab_size: cfg["vocab_size"],
    embed_dim: cfg["hidden_size"],
    num_heads: cfg["num_attention_heads"],
    num_kv_heads: cfg["num_key_value_heads"] || cfg["num_attention_heads"],
    num_layers: cfg["num_hidden_layers"],
    ff_dim: cfg["intermediate_size"],
    max_seq_len: cfg["max_position_embeddings"],
    rope_base: (cfg["rope_theta"] || 10000.0).to_f,
    rope_scaling: cfg["rope_scaling"],
    head_dim: cfg["head_dim"],
    eps: (cfg["rms_norm_eps"] || 1e-5).to_f,
    device_id: device_id
  )
  load(model, dir, device_id: device_id)
  model
end
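The config mapping above, including its fallbacks, can be exercised without a GPU. This sketch assumes a hypothetical helper, llama_model_kwargs, that builds the same keyword hash from_pretrained passes to Transformer::ModernModel.new (minus device_id), plus two hypothetical checks: the GQA group size, and an effective head dimension that assumes, as Hugging Face's LlamaConfig does, that a missing head_dim means hidden_size / num_attention_heads. How ModernModel itself treats a nil head_dim is not shown in the source.

```ruby
require "json"

# Hypothetical helper: same keys and fallbacks as from_pretrained
# (no num_key_value_heads => plain multi-head attention, rope_theta
# defaults to 10000.0, rms_norm_eps to 1e-5).
def llama_model_kwargs(cfg)
  heads = cfg.fetch("num_attention_heads")
  {
    vocab_size: cfg["vocab_size"],
    embed_dim: cfg["hidden_size"],
    num_heads: heads,
    num_kv_heads: cfg["num_key_value_heads"] || heads,
    num_layers: cfg["num_hidden_layers"],
    ff_dim: cfg["intermediate_size"],
    max_seq_len: cfg["max_position_embeddings"],
    rope_base: (cfg["rope_theta"] || 10000.0).to_f,
    rope_scaling: cfg["rope_scaling"],
    head_dim: cfg["head_dim"],
    eps: (cfg["rms_norm_eps"] || 1e-5).to_f
  }
end

# Under GQA each key/value head is shared by num_heads / num_kv_heads query
# heads, so the division must be exact.
def gqa_group_size(kw)
  q, kv = kw.values_at(:num_heads, :num_kv_heads)
  raise ArgumentError, "#{q} query heads not divisible by #{kv} KV heads" unless (q % kv).zero?
  q / kv
end

# Assumption (HF LlamaConfig convention): missing head_dim => hidden / heads.
def effective_head_dim(kw)
  kw[:head_dim] || kw[:embed_dim] / kw[:num_heads]
end

# Illustrative toy config, not a real checkpoint; rope_theta and rms_norm_eps
# are omitted to exercise the defaults.
cfg = JSON.parse(<<~CFG)
  {"vocab_size": 1000, "hidden_size": 64, "num_attention_heads": 8,
   "num_key_value_heads": 2, "num_hidden_layers": 2,
   "intermediate_size": 128, "max_position_embeddings": 256}
CFG
kw = llama_model_kwargs(cfg)
p gqa_group_size(kw)     # => 4
p effective_head_dim(kw) # => 8
p kw[:rope_base]         # => 10000.0
```

Dropping num_key_value_heads from the config yields a group size of 1, i.e. ordinary multi-head attention, which is what the `|| cfg["num_attention_heads"]` fallback in from_pretrained encodes.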
.load(model, dir, device_id: 0) ⇒ Integer
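Load weights from dir/model.safetensors into an existing ModernModel. Returns the number of parameters loaded; raises if a parameter has no source tensor in the file or its shape differs from the file's.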
# File 'lib/nnw/ai/llama_loader.rb', line 43
def load(model, dir, device_id: 0)
  tensors = Safetensors.load(File.join(dir, "model.safetensors"), device_id: device_id)
   = tensors["model.embed_tokens.weight"]
  raise "LlamaLoader: missing model.embed_tokens.weight" unless
  lm_head_src = tensors["lm_head.weight"] # nil when embeddings are tied
  count = 0
  model.named_parameters.each do |name, param|
    src = if name == "head.weight"
            lm_head_src || # untied lm_head, else tied to embeddings
          else
            tensors[hf_name(name)]
          end
    raise "LlamaLoader: no source weight for param #{name} (#{hf_name(name)})" unless src
    unless param.shape == src.shape
      raise "LlamaLoader: shape mismatch for #{name}: model #{param.shape} vs file #{src.shape}"
    end
    dequant_into!(src.data, param.data)
    count += 1
  end
  Ignis.synchronize
  count
end
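The source-selection and validation logic of load can be isolated from the device code. In this sketch, FakeTensor is a hypothetical stand-in for the objects Safetensors.load returns (only #shape is used), params maps model parameter names to shapes in place of model.named_parameters, and hf_name is a caller-supplied lambda standing in for the loader's private name mapping. The parameter name "embed.weight" is likewise illustrative; only "head.weight" comes from the source.

```ruby
# Hypothetical stand-in for a loaded tensor; only the shape matters here.
FakeTensor = Struct.new(:shape)

# Pick a source tensor for every parameter, following the tie rule in
# LlamaLoader.load: "head.weight" takes lm_head.weight when the file has one
# and falls back to the token embeddings otherwise.
def resolve_sources(params, tensors, hf_name)
  embed = tensors["model.embed_tokens.weight"]
  raise "missing model.embed_tokens.weight" unless embed
  lm_head = tensors["lm_head.weight"] # nil when embeddings are tied

  params.map do |name, shape|
    src = name == "head.weight" ? (lm_head || embed) : tensors[hf_name.call(name)]
    raise "no source weight for #{name}" unless src
    unless shape == src.shape
      raise "shape mismatch for #{name}: model #{shape} vs file #{src.shape}"
    end
    [name, src]
  end.to_h
end

# Toy vocab of 4 with hidden size 2; the name mapping is hypothetical.
hf_name = ->(name) { { "embed.weight" => "model.embed_tokens.weight" }.fetch(name) }
params = { "embed.weight" => [4, 2], "head.weight" => [4, 2] }

tied = { "model.embed_tokens.weight" => FakeTensor.new([4, 2]) }
resolved = resolve_sources(params, tied, hf_name)
p resolved["head.weight"].equal?(tied["model.embed_tokens.weight"]) # => true
```

The tie works because the head maps hidden states to vocabulary logits, so in the [out, in] convention its weight is [vocab_size, hidden_size], the same shape as the embedding table; the shape check therefore passes whether the head comes from lm_head.weight or from the embeddings.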