# lex-ollama
Ollama integration for LegionIO. Connects LegionIO to a local Ollama LLM server for text generation, chat completions, embeddings, and model management.
## Installation

```shell
gem install lex-ollama
```
## Functions
### Completions

- `generate` - Generate a text completion (POST `/api/generate`)
- `generate_stream` - Stream a text completion with per-chunk callbacks
### Chat

- `chat` - Generate a chat completion with message history and tool support (POST `/api/chat`)
- `chat_stream` - Stream a chat completion with per-chunk callbacks
### Models

- `create_model` - Create a model from another model, GGUF, or safetensors (POST `/api/create`)
- `list_models` - List locally available models (GET `/api/tags`)
- `show_model` - Show model details, template, parameters, license (POST `/api/show`)
- `copy_model` - Copy a model to a new name (POST `/api/copy`)
- `delete_model` - Delete a model and its data (DELETE `/api/delete`)
- `pull_model` - Download a model from the Ollama library (POST `/api/pull`)
- `push_model` - Upload a model to the Ollama library (POST `/api/push`)
- `list_running` - List models currently loaded in memory (GET `/api/ps`)
### Embeddings

- `embed` - Generate embeddings from a model (POST `/api/embed`)
### Blobs

- `check_blob` - Check if a blob exists on the server (HEAD `/api/blobs/:digest`)
- `push_blob` - Upload a binary blob to the server (POST `/api/blobs/:digest`)
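Blobs are addressed by content digest. As a minimal sketch using only the Ruby standard library (not a call into this gem), the `sha256:<hex>` form used in the `/api/blobs/:digest` path can be produced like this:

```ruby
require "digest"

# Any blob payload -- in practice the bytes of a model layer file,
# e.g. File.binread("model.gguf").
blob_bytes = "example blob bytes"

# Blobs are identified as "sha256:<hex>" in /api/blobs/:digest.
digest = "sha256:#{Digest::SHA256.hexdigest(blob_bytes)}"
```

`check_blob` can then be used to test whether that digest already exists on the server before uploading it with `push_blob`.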
### S3 Model Distribution

- `list_s3_models` - List models available in an S3 mirror
- `import_from_s3` - Download model from S3 directly to Ollama's filesystem (works before Ollama starts)
- `sync_from_s3` - Download model from S3, push blobs through Ollama's API, write manifest to filesystem
- `import_default_models` - Import a list of models from S3 (fleet provisioning)
### Version

- `server_version` - Retrieve the Ollama server version (GET `/api/version`)
## Fleet Queue Subscription

- `handle_request` - Dispatch inbound fleet AMQP messages to the appropriate runner (chat/embed/generate)
When `Legion::Extensions::Core` is present, lex-ollama subscribes to model-scoped queues on the
`llm.request` topic exchange, accepting routed LLM inference work from other Legion fleet members.
Each configured (type, model) pair gets its own auto-delete queue with routing key
`llm.request.ollama.<type>.<model>`. Multiple nodes serving the same model compete fairly
via RabbitMQ round-robin with consumer priority.
```yaml
legion:
  ollama:
    host: "http://localhost:11434"
    fleet:
      consumer_priority: 10  # H100: 10, Mac Studio: 5, MacBook: 1
      subscriptions:
        - type: embed
          model: nomic-embed-text
        - type: chat
          model: "qwen3.5:27b"
```
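For the configuration above, each subscription maps onto the documented `llm.request.ollama.<type>.<model>` scheme. A sketch of that derivation (the helper below is illustrative, not part of the gem's API):

```ruby
# Build the routing key for one (type, model) subscription entry.
def routing_key(type, model)
  "llm.request.ollama.#{type}.#{model}"
end

routing_key("embed", "nomic-embed-text")  # => "llm.request.ollama.embed.nomic-embed-text"
routing_key("chat", "qwen3.5:27b")        # => "llm.request.ollama.chat.qwen3.5:27b"
```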
Fleet messages use the wire protocol defined in legion-llm: typed AMQP messages
(`llm.fleet.request` / `llm.fleet.response` / `llm.fleet.error`) with `message_context`
propagation for end-to-end tracing.
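The authoritative schema lives in legion-llm, but a request envelope can be pictured roughly as below. The field names inside `payload` and `message_context` are assumptions for illustration, not the actual wire format:

```ruby
# Illustrative shape only -- see legion-llm for the real message schema.
fleet_request = {
  type: "llm.fleet.request",
  message_context: { trace_id: "0f3c-example" },  # propagated for end-to-end tracing
  payload: {
    model: "qwen3.5:27b",
    messages: [{ role: "user", content: "Hello!" }]
  }
}
```

A responding node would answer with an `llm.fleet.response` message, or `llm.fleet.error` on failure.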
Without `Legion::Extensions::Core`, the gem works as a pure HTTP client library with no
AMQP dependency.
## Standalone Client
```ruby
client = Legion::Extensions::Ollama::Client.new

# or with custom host
client = Legion::Extensions::Ollama::Client.new(host: 'http://remote:11434')

# Chat
result = client.chat(model: 'llama3.2', messages: [{ role: 'user', content: 'Hello!' }])

# Generate
result = client.generate(model: 'llama3.2', prompt: 'Why is the sky blue?')

# Embeddings
result = client.embed(model: 'all-minilm', input: 'Some text to embed')

# List models
result = client.list_models

# Streaming generate
client.generate_stream(model: 'llama3.2', prompt: 'Tell me a story') do |event|
  case event[:type]
  when :delta then print event[:text]
  when :done then puts "\nDone!"
  end
end

# Streaming chat
client.chat_stream(model: 'llama3.2', messages: [{ role: 'user', content: 'Hello!' }]) do |event|
  print event[:text] if event[:type] == :delta
end
```
## S3 Model Distribution
Pull models from an internal S3 mirror instead of the public Ollama registry:
```ruby
client = Legion::Extensions::Ollama::Client.new

# List available models in S3
client.list_s3_models(bucket: 'legion', endpoint: 'https://s3.example.internal')

# Import directly to filesystem (works without Ollama running)
client.import_from_s3(model: 'llama3:latest', bucket: 'legion',
                      endpoint: 'https://s3.example.internal')

# Push through Ollama API (requires Ollama running)
client.sync_from_s3(model: 'llama3:latest', bucket: 'legion',
                    endpoint: 'https://s3.example.internal')

# Provision fleet with default models
client.import_default_models(
  default_models: %w[llama3:latest nomic-embed-text:latest],
  bucket: 'legion',
  endpoint: 'https://s3.example.internal'
)
```
S3 operations use `lex-s3`. The S3 bucket should mirror the Ollama models directory structure (`manifests/` and `blobs/` under the configured prefix).
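Assuming the bucket mirrors Ollama's on-disk models layout, the keys under the configured prefix would look roughly like this (the manifest path segments follow Ollama's local store convention and are shown for illustration):

```
<prefix>/
  manifests/registry.ollama.ai/library/llama3/latest
  blobs/sha256-<hex digest>
```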
All API calls include automatic retry with exponential backoff on connection failures and timeouts.
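The retry parameters are not part of the public interface; as a generic sketch of the retry-with-exponential-backoff pattern (illustrative only, not the gem's actual implementation):

```ruby
# Retry a block with exponential backoff: base_delay, 2x, 4x, ... between attempts.
def with_retries(max_attempts: 3, base_delay: 0.5)
  attempts = 0
  begin
    yield
  rescue StandardError  # in practice: connection failures and timeouts
    attempts += 1
    raise if attempts >= max_attempts
    sleep(base_delay * (2**(attempts - 1)))
    retry
  end
end
```

Any HTTP call can be yielded to a wrapper like this; after `max_attempts` failures the last error propagates to the caller.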
Generate and chat responses include standardized `usage` data:

```ruby
result = client.generate(model: 'llama3.2', prompt: 'Hello')
result[:usage] # => { input_tokens: 1, output_tokens: 5, total_duration: ..., ... }
```
## Requirements

## Version

0.3.2

## License

MIT