kiribi-gemma4_e2b (DEPRECATED)
This addon is deprecated. Use the kiribi gem directly (>= 0.1.0), which now bundles this model:
gem install kiribi
require "kiribi"
Kiribi.download("gemma4-e2b") # or: kiribi download gemma4-e2b
model = Kiribi.load("gemma4-e2b")
model.generate("Hello!")
The kiribi-gemma4_e2b gem is pinned to kiribi < 0.1.0 and will be removed from RubyGems in a future release.
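Before removing the addon, it may help to confirm that the installed kiribi meets the >= 0.1.0 requirement stated above. The following is a minimal sketch using RubyGems' own version classes. The helper name `bundles_gemma4?` is hypothetical and not part of kiribi.

```ruby
require "rubygems"

# Hypothetical helper (not part of kiribi): true when the given version
# string satisfies the ">= 0.1.0" requirement stated above.
def bundles_gemma4?(version_string)
  Gem::Requirement.new(">= 0.1.0").satisfied_by?(Gem::Version.new(version_string))
end

# Check the newest installed kiribi, if there is one.
spec = Gem::Specification.find_all_by_name("kiribi").max_by(&:version)
puts(spec ? bundles_gemma4?(spec.version.to_s) : "kiribi is not installed")
```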
Google Gemma 4 E2B (2.3B parameters) multimodal model for text, image, and audio.
Based on onnx-community/gemma-4-E2B-it-ONNX (ONNX format, FP32).
CAUTION: This gem downloads ~22GB of model files from HuggingFace during installation. Be mindful of disk space and network bandwidth.
Installation
gem install kiribi-gemma4_e2b
Model files (~22GB) are downloaded from HuggingFace during installation.
Requirements
- Ruby >= 3.4.0
- ffmpeg/ffprobe (for image and audio preprocessing by the caller)
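Since the gem leaves ffmpeg/ffprobe to the caller, a quick check that both tools are on `PATH` can save a confusing failure later. This is a rough sketch: `find_executable` is a hypothetical helper, not part of kiribi, and it assumes Unix-style executable names (no `.exe` suffix handling).

```ruby
# Hypothetical helper: returns the full path of an executable found in
# the given PATH-style string, or nil when it is missing.
def find_executable(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?

    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Report any required tool that could not be located.
missing = %w[ffmpeg ffprobe].reject { |tool| find_executable(tool) }
warn "Missing tools: #{missing.join(', ')}" unless missing.empty?
```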
Usage
Text generation
require "kiribi/gemma4/e2b"
model = Kiribi::Gemma4::E2B.load
model.generate("Hello!")
Multi-turn chat
model.chat([
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "What is Ruby?" },
{ role: "model", content: "Ruby is a dynamic programming language." },
{ role: "user", content: "Who created it?" },
])
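A malformed history is easy to produce by hand, for example by writing "assistant" where the example above uses "model". The sketch below checks a message array before it is passed to `model.chat`. The helper `validate_messages!` is hypothetical, and the role list is only inferred from the examples in this README, not from a documented kiribi contract.

```ruby
# Roles seen in the examples in this README; an assumption, not a documented list.
KNOWN_ROLES = %w[system user model].freeze

# Hypothetical helper (not part of kiribi): raises on malformed messages
# and returns the array unchanged when everything looks fine.
def validate_messages!(messages)
  messages.each_with_index do |message, index|
    role = message[:role]
    raise ArgumentError, "message #{index}: unknown role #{role.inspect}" unless KNOWN_ROLES.include?(role)

    content = message[:content]
    next if content.is_a?(String) || content.is_a?(Array)

    raise ArgumentError, "message #{index}: content must be a String or an Array of parts"
  end
  messages
end
```

Usage would look like `model.chat(validate_messages!(history))`, where `history` is the array built by the caller.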
Image understanding
Preprocessing is the caller's responsibility. Use ffmpeg/ffprobe to obtain raw RGB pixels:
require "kiribi/gemma4/e2b"
model = Kiribi::Gemma4::E2B.load
encoder = model.load_vision_encoder # loads vision_encoder.onnx
# 1. Get original dimensions
info = IO.popen(["ffprobe", "-v", "error", "-select_streams", "v:0", "-show_entries", "stream=width,height", "-of", "csv=p=0", "photo.png"], &:read)
original_width, original_height = info.strip.split(",").map(&:to_i)
# 2. Compute the size to resize to
input_width, input_height = encoder.input_size_of(original_width, original_height)
# 3. Resize (caller's choice of tool)
blob = IO.popen(["ffmpeg", "-i", "photo.png", "-vf", "scale=#{input_width}:#{input_height}:flags=bicubic", "-f", "rawvideo", "-pix_fmt", "rgb24", "-v", "error", "-"], "rb", &:read)
# 4. Encode
features = encoder.encode(blob, input_width, input_height)
model.chat([
{ role: "user", content: [
{ type: "image", features: },
{ type: "text", text: "What is in this image?" },
] },
])
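A single `rgb24` rawvideo frame occupies exactly width * height * 3 bytes, so the blob from step 3 can be sanity-checked before it reaches the encoder. This is a minimal sketch: `check_rgb24_blob!` is a hypothetical helper, and whether `encoder.encode` performs a similar check itself is not stated here.

```ruby
# Hypothetical helper (not part of kiribi): raises unless the raw blob
# has exactly width * height * 3 bytes, i.e. one rgb24 frame.
def check_rgb24_blob!(blob, width, height)
  expected = width * height * 3
  return blob if blob.bytesize == expected

  raise ArgumentError,
        "expected #{expected} bytes for #{width}x#{height} rgb24, got #{blob.bytesize}"
end

# A 4x2 all-black frame (24 zero bytes) passes the check.
check_rgb24_blob!("\x00".b * (4 * 2 * 3), 4, 2)
```

An empty or short blob usually means ffmpeg failed (its errors are limited by `-v error`), while a blob several times too large suggests the input contained more than one frame.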
Audio transcription
require "kiribi/gemma4/e2b"
model = Kiribi::Gemma4::E2B.load
encoder = model.load_audio_encoder # loads audio_encoder.onnx
# 1. Decode to 16kHz mono f32le PCM
pcm = IO.popen(["ffmpeg", "-i", "audio.mp3", "-f", "f32le", "-acodec", "pcm_f32le", "-ar", "16000", "-ac", "1", "-", err: "/dev/null"], "rb", &:read)
# 2. Encode
features = encoder.encode(pcm)
model.chat([
{ role: "user", content: [
{ type: "audio", features: },
{ type: "text", text: "Transcribe the following speech segment in its original language." },
] },
])
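The PCM string from step 1 is headerless: 4 bytes per sample, one channel, 16,000 samples per second. That makes it straightforward to inspect before encoding. The helpers below are a hypothetical sketch (not part of kiribi) that derive the clip duration and unpack the samples into Ruby floats.

```ruby
SAMPLE_RATE = 16_000   # Hz, matches "-ar 16000"
BYTES_PER_SAMPLE = 4   # f32le stores each sample in 4 bytes

# Hypothetical helper: duration in seconds of a mono f32le PCM string.
def pcm_duration_seconds(pcm)
  pcm.bytesize.fdiv(BYTES_PER_SAMPLE * SAMPLE_RATE)
end

# Hypothetical helper: decode the bytes into an Array of Floats.
# The "e" directive reads single-precision little-endian floats.
def pcm_samples(pcm)
  pcm.unpack("e*")
end

one_second_of_silence = [0.0].pack("e") * SAMPLE_RATE
puts pcm_duration_seconds(one_second_of_silence) # prints 1.0
```

A duration of 0.0 most likely means ffmpeg could not decode the input, which is easy to miss because its stderr is discarded in the command above.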
License
This gem is available as open source under the terms of the MIT License.
The model weights are licensed under Apache License 2.0 by Google.