# lex-llm-mlx

LegionIO LLM provider extension for MLX-backed OpenAI-compatible servers on Apple Silicon.

This gem lives under `Legion::Extensions::Llm::Mlx` and depends on `lex-llm` (>= 0.4.3) for shared provider-neutral routing, response normalization, fleet envelopes, fleet responder execution, and schema primitives.

Load it with `require 'legion/extensions/llm/mlx'`.
## What It Provides

- `Legion::Extensions::Llm::Mlx::Provider`, exposed to `legion-llm` as the `:mlx` provider family.
- OpenAI-compatible chat, streaming, model listing, and embeddings endpoint wrappers.
- Heuristic chat, embedding, and vision capability mapping for discovered local models.
- Local-first defaults for MLX servers running on Apple Silicon hosts.
- Best-effort `llm.registry` event publishing through shared `lex-llm` registry helpers when transport is available.
- Provider-owned fleet request actor and runner backed by `lex-llm`.
- Shared Legion settings, JSON, and logging dependencies with full `Legion::Logging::Helper` integration.
## Architecture

```
Legion::Extensions::Llm::Mlx
  Mlx                   # Extension namespace, discovery metadata, default settings
  Provider              # Health, readiness, model listing, OpenAI-compatible adapter
  Actor::FleetWorker    # Subscription actor enabled by provider instance fleet settings
  Runners::FleetWorker  # Delegates fleet execution to
                        # Legion::Extensions::Llm::Fleet::ProviderResponder (shared from lex-llm)
  RegistryPublisher     # Async llm.registry event publishing
  RegistryEventBuilder  # Sanitized registry envelope construction
```
The extension no longer writes provider adapters into the registry at require time. Loaded provider discovery metadata is consumed by `legion-llm`, which owns adapter creation and registry writes.
## Default Settings

```ruby
Legion::Extensions::Llm::Mlx.default_settings
```

Defaults target `http://localhost:8000`, mark the default instance as `:local`, allow one concurrent local request, and keep fleet participation disabled until a host opts in through extension settings.
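As an illustration of those defaults, a hash of the following shape captures the behavior described above. The key names here are assumptions for illustration only; the gem's actual `default_settings` schema may differ.

```ruby
# Illustrative only: key names are assumptions, not the gem's real schema.
defaults = {
  base_url: 'http://localhost:8000',  # default local MLX endpoint
  default_instance: :local,           # the built-in instance name
  max_concurrent_requests: 1,         # one local request at a time
  fleet: { enabled: false }           # fleet participation off until opted in
}

defaults[:fleet][:enabled] # => false
```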
## Configuration

The provider accepts the shared `lex-llm` configuration options:

```ruby
Legion::Extensions::Llm.configure do |config|
  config.mlx_api_base = 'http://localhost:8000'
  config.mlx_api_key  = ENV['MLX_API_KEY']
end
```

`mlx_api_key` is optional because most local MLX servers run without authentication. Set it when a proxy or hosted MLX gateway requires bearer authentication.
Provider discovery also reads named instances from `extensions.llm.mlx.instances`. Generic keys are normalized for the MLX provider:

```yaml
extensions:
  llm:
    mlx:
      instances:
        local:
          base_url: http://localhost:8000
          api_key: null
          fleet:
            enabled: false
            respond_to_requests: false
          capabilities:
            - chat
            - stream_chat
            - embed
```

Accepted instance URL keys are `base_url`, `api_base`, `endpoint`, or `mlx_api_base`. A trailing `/v1` is stripped because the shared OpenAI-compatible adapter appends endpoint paths itself.
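The key acceptance and `/v1` stripping described above can be sketched as a small helper. `normalize_base_url` is a hypothetical name for illustration, not the gem's actual method:

```ruby
# Hypothetical sketch of instance URL normalization: take the first accepted
# key that is present and strip a trailing /v1 (and any trailing slash).
URL_KEYS = %w[base_url api_base endpoint mlx_api_base].freeze

def normalize_base_url(instance_config)
  raw = URL_KEYS.filter_map { |k| instance_config[k] || instance_config[k.to_sym] }.first
  return nil unless raw

  raw.sub(%r{/v1/?\z}, '').sub(%r{/\z}, '')
end

normalize_base_url('api_base' => 'http://localhost:8000/v1')
# => "http://localhost:8000"
```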
## Fleet Responder

Provider instances can opt in to consuming Legion LLM fleet requests. The provider-owned fleet actor only starts when at least one configured instance enables `respond_to_requests`:

```yaml
extensions:
  llm:
    mlx:
      instances:
        local:
          base_url: http://localhost:8000
          fleet:
            enabled: true
            respond_to_requests: true
          capabilities:
            - chat
            - stream_chat
            - embed
```
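The opt-in rule can be expressed as a predicate like the following; `start_fleet_actor?` is illustrative, not the gem's API:

```ruby
# Hypothetical predicate: the fleet actor should start only when at least one
# configured instance sets fleet.respond_to_requests to true.
def start_fleet_actor?(instances)
  instances.any? do |_name, inst|
    fleet = inst['fleet'] || inst[:fleet] || {}
    fleet['respond_to_requests'] || fleet[:respond_to_requests] || false
  end
end

start_fleet_actor?('local' => { 'fleet' => { 'respond_to_requests' => true } })
# => true
```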
## Endpoint Helpers

- `completion_url` and `stream_url`: `/v1/chat/completions`
- `models_url`: `/v1/models`
- `embedding_url`: `/v1/embeddings`
- `health_url`: `/health`
The provider uses the shared `Legion::Extensions::Llm::Provider::OpenAICompatible` adapter so Legion routing can treat MLX, vLLM, OpenAI, and other compatible servers consistently while preserving provider-specific settings and health behavior.
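The endpoint helpers resolve to fixed paths on the configured base URL. A minimal sketch, assuming a plain string base (the adapter's real composition logic lives in `lex-llm`):

```ruby
# Sketch only: mirrors the endpoint helper list above against an assumed base.
base = 'http://localhost:8000'

endpoints = {
  completion_url: "#{base}/v1/chat/completions",
  stream_url:     "#{base}/v1/chat/completions",  # streaming shares the chat path
  models_url:     "#{base}/v1/models",
  embedding_url:  "#{base}/v1/embeddings",
  health_url:     "#{base}/health"                # health sits outside /v1
}

endpoints[:models_url] # => "http://localhost:8000/v1/models"
```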
## Registry Event Publishing

When `Legion::Transport` and `lex-llm` routing are available, the provider publishes best-effort events to the `llm.registry` topic exchange:

- Readiness events: published asynchronously when `readiness(live: true)` is called.
- Model availability events: published asynchronously after `list_models` discovers models.

Publishing is fire-and-forget in background threads; failures never block the provider.
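The fire-and-forget pattern described above looks roughly like this; `deliver` and `publish_async` are stand-ins for the real transport call and publisher, not the gem's API:

```ruby
# Illustration of best-effort async publishing: failures are swallowed in the
# background thread so they can never block or crash the provider.
def deliver(event)
  raise 'transport unavailable' if event[:fail] # simulate a transport outage
  event
end

def publish_async(event)
  Thread.new do
    deliver(event)
  rescue StandardError
    # swallowed on purpose: registry publishing is strictly best-effort
  end
end

publish_async(fail: true).join # the failure stays inside the thread
```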
## Failure Modes

- `readiness(live: true)` calls the MLX `/health` endpoint and publishes readiness metadata only when the live check succeeds.
- `list_models` expects an OpenAI-compatible `/v1/models` response and publishes discovered model availability through the shared registry publisher.
- Fleet request handling is disabled unless at least one discovered instance opts in with `fleet.respond_to_requests: true`.
- Local instance discovery checks `localhost:8080`; explicitly configured instances can point at any OpenAI-compatible MLX endpoint.
## Dependencies

| Gem | Required | Purpose |
|---|---|---|
| `legion-json` (>= 1.2.1) | Yes | JSON serialization |
| `legion-logging` (>= 1.3.2) | Yes | Structured logging via `Helper` |
| `legion-settings` (>= 1.3.14) | Yes | Configuration |
| `lex-llm` (>= 0.4.3) | Yes | Shared provider base, response normalization, routing, fleet envelopes, and fleet responder execution |
| `legion-transport` (>= 1.4.14) | Yes | AMQP subscriptions and replies |
## Development

```shell
bundle install
bundle exec rspec --format json --out tmp/rspec_results.json --format progress --out tmp/rspec_progress.txt
bundle exec rubocop -A
```