gte
gte is a Ruby gem with a Rust extension for fast text embeddings with ONNX Runtime.
Inspired by https://github.com/fbilhaut/gte-rs
Quick Start
require "gte"
model = GTE.config(ENV.fetch("GTE_MODEL_DIR"))
# String input => GTE::Tensor (1 row)
tensor = model.("query: hello world")
vector = tensor.row(0)
# [] with string => Array<Float> (single vector)
single = model["query: nearest coffee shop"]
# [] with array => GTE::Tensor (batch)
batch = model[["query: hello", "query: world"]]
Embedding Config (GTE.config)
GTE.config(model_dir) builds (and caches) a GTE::Model.
default_model = GTE.config(ENV.fetch("GTE_MODEL_DIR"))
raw_model = GTE.config(ENV.fetch("GTE_MODEL_DIR")) do |config|
config.with(normalize: false)
end
full_throttle = GTE.config(ENV.fetch("GTE_MODEL_DIR")) do |config|
config.with(threads: 0)
end
custom = GTE.config(ENV.fetch("GTE_MODEL_DIR")) do |config|
config.with(
output_tensor: "last_hidden_state",
max_length: 256,
optimization_level: 3
)
end
Config fields and defaults:
model_dir: absolute path to model directorythreads:3(set0for ONNX Runtime full-throttle threadpool)optimization_level:3model_name:nilnormalize:true(L2 normalization at Ruby-facing API)output_tensor:nil(auto-select output tensor)max_length:nil(uses tokenizer/model defaults)execution_providers:nil(falls back toGTE_EXECUTION_PROVIDERS/ CPU default)
Notes:
- Return a
Config::Textfrom the block (for example,config.with(...)). - Model instances are cached by full config key; different config values create different cached instances.
Low-level embedder setup (without model cache):
= GTE::Embedder.config(ENV.fetch("GTE_MODEL_DIR")) do |config|
config.with(threads: 0, execution_providers: "cpu")
end
Reranker
Use GTE::Reranker.config(model_dir) for cross-encoder reranking.
reranker = GTE::Reranker.config(ENV.fetch("GTE_RERANK_DIR")) do |config|
config.with(sigmoid: true, threads: 0)
end
query = "how to train a neural network?"
candidates = [
"Backpropagation and gradient descent are core techniques.",
"This recipe uses flour and eggs."
]
# Raw scores aligned with input order
scores = reranker.score(query, candidates)
# => [0.93, 0.07]
# Ranked output sorted by score desc
ranked = reranker.rerank(query: query, candidates: candidates)
# => [
# { index: 0, score: 0.93, text: "Backpropagation and gradient descent are core techniques." },
# { index: 1, score: 0.07, text: "This recipe uses flour and eggs." }
# ]
Reranker config fields and defaults:
model_dir: absolute path to model directorythreads:3optimization_level:3model_name:nilsigmoid:false(settrueif you want bounded [0,1] style scores)output_tensor:nilmax_length:nilexecution_providers:nil
Runtime + Result Examples
Process-local reuse (recommended for Puma/web servers):
EMBEDDER = GTE.config(ENV.fetch("GTE_MODEL_DIR"))
def (text)
EMBEDDER[text] # Array<Float>
end
Model Directory
A model directory must include tokenizer.json and one ONNX model, resolved in this order:
onnx/text_model.onnxtext_model.onnxonnx/model.onnxmodel.onnx
Input policy is text-only. Graphs requiring unsupported multimodal inputs (such as pixel_values) are intentionally rejected.
Execution Providers
Default behavior is CPU fallback via ONNX Runtime's default provider (no explicit provider registration).
Configure providers with GTE_EXECUTION_PROVIDERS (comma-separated, case-insensitive).
Supported values:
cpuornone: CPU fallback (skip explicit provider registration)xnnpackcoreml
Examples:
export GTE_EXECUTION_PROVIDERS=cpu
export GTE_EXECUTION_PROVIDERS=xnnpack,coreml
Ruby per-instance override (takes precedence over GTE_EXECUTION_PROVIDERS):
model = GTE.config(ENV.fetch("GTE_MODEL_DIR")) do |config|
config.with(execution_providers: "cpu")
end
Development
Run commands inside nix develop via Make targets:
make setup
make compile
make test
make lint
make ci
Benchmark
The repo includes two benchmark paths:
make bench
nix develop -c bundle exec rake bench:pure_compare
nix develop -c bundle exec rake bench:matrix_sweep
nix develop -c bundle exec ruby bench/memory_probe.rb --compare-pure
To run benchmark + append a RUNS.md entry + enforce goal checks:
make bench-record
bench/runs_ledger.rb check is goal-focused by default:
- Enforces goal metric (
response_time_p95ratio threshold). - Does not require current-version coverage in
RUNS.mdunless explicitly enabled.