# lex-llm-vertex
Google Cloud Vertex AI provider extension for `Legion::Extensions::Llm`.

This gem adds a hosted Vertex AI provider surface for Legion LLM routing without depending on the old `legion-llm` gem. It keeps discovery offline by default, preserves full Vertex publisher model resource names for routing, and exposes project/location instance metadata for multi-region provider fleets. It requires `lex-llm >= 0.1.5` for the shared model offering, alias, readiness, and fleet lane contract.
## Install

```ruby
gem 'lex-llm-vertex'
```
## Configuration

The provider registers the `:vertex` provider family with `Legion::Extensions::Llm::Provider`.

```ruby
require 'legion/extensions/llm/vertex'

Legion::Extensions::Llm.configure do |config|
  config.vertex_project      = ENV['GOOGLE_CLOUD_PROJECT']
  config.vertex_location     = ENV.fetch('VERTEX_LOCATION', 'us-central1')
  config.vertex_access_token = ENV['VERTEX_ACCESS_TOKEN']
end
```
`vertex_access_token` is optional for local routing metadata and tests. For live calls, provide a Google Cloud access token through configuration, or use Application Default Credentials in the process that owns HTTP authentication.
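For example, a minimal sketch of minting an access token from Application Default Credentials with the `googleauth` gem (the scope and wiring shown are illustrative; this gem does not require `googleauth`):

```ruby
require 'googleauth'

# Application Default Credentials scoped for Cloud Platform APIs;
# fetch_access_token! returns a hash containing a short-lived token.
credentials = Google::Auth.get_application_default(
  ['https://www.googleapis.com/auth/cloud-platform']
)
token = credentials.fetch_access_token!['access_token']

Legion::Extensions::Llm.configure do |config|
  config.vertex_access_token = token
end
```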
Default settings expose `env://` references and keep live discovery disabled:

```ruby
Legion::Extensions::Llm::Vertex.default_settings
```
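A hedged sketch of what that shape could look like (the exact keys are assumptions inferred from the configuration names above, not the gem's documented defaults):

```ruby
# Hypothetical defaults: env:// references defer resolution to the
# environment, and live discovery stays off unless explicitly enabled.
{
  project:        'env://GOOGLE_CLOUD_PROJECT',
  location:       'env://VERTEX_LOCATION',
  access_token:   'env://VERTEX_ACCESS_TOKEN',
  live_discovery: false
}
```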
## Provider Surface

```ruby
provider = Legion::Extensions::Llm::Vertex::Provider.new(Legion::Extensions::Llm.config)

provider.discover_offerings(live: false)
provider.offering_for(model: 'gemini-2.5-flash')
provider.health(live: false)

# `messages` holds the chat messages and `model` a model name or alias.
provider.chat(messages, model: model)
provider.stream(messages, model: model) { |chunk| chunk.content }
provider.embed('hello', model: 'gemini-embedding-001')
provider.count_tokens(messages, model: model)
```
`discover_offerings(live: false)` returns a conservative static catalog for routing defaults and unit tests. `discover_offerings(live: true)` calls the Vertex publisher models listing endpoint and maps returned model data into `Legion::Extensions::Llm::Routing::ModelOffering` records.
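For example, a sketch of switching between the static catalog and live discovery (the `offering.model` reader shown here mirrors the field listed under Model Offerings below and is otherwise an assumption):

```ruby
provider = Legion::Extensions::Llm::Vertex::Provider.new(Legion::Extensions::Llm.config)

# Offline (default): conservative static catalog for routing defaults and tests.
offerings = provider.discover_offerings(live: false)

# Live: calls the Vertex publisher models listing endpoint; requires credentials.
# offerings = provider.discover_offerings(live: true)

offerings.each do |offering|
  puts offering.model # full publisher model resource name
end
```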
## Static Model Catalog

| Model | Alias | Publisher | Family | API Mode |
|---|---|---|---|---|
| gemini-2.5-flash | gemini-flash | google | gemini | generateContent |
| gemini-2.5-pro | gemini-pro | google | gemini | generateContent |
| gemini-embedding-001 | gemini-embedding | google | gemini | predict (embedding) |
| text-embedding-005 | text-embedding | google | gemini | predict (embedding) |
| claude-sonnet-4-5 | claude-sonnet | anthropic | anthropic | rawPredict |
| mistral-medium-3 | mistral-medium | mistralai | mistral | rawPredict |
| llama-4-maverick | llama-4-maverick | meta | meta | rawPredict |
## Model Offerings

Every offering uses:

- `provider_family: :vertex`
- `transport: :http`
- the full Vertex publisher model resource name as `model`
- `metadata[:model_family]` inferred from the publisher/model or accepted from the caller
- `metadata[:project]` and `metadata[:location]` copied from the provider instance

The set of known aliases is intentionally small and configurable. For example, `gemini-flash` resolves to `gemini-2.5-flash`, while the offering preserves `projects/{project}/locations/{location}/publishers/google/models/gemini-2.5-flash`.
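As a sketch, resolving an alias and inspecting the resulting offering might look like this (the reader methods mirror the fields listed above and are assumed rather than guaranteed):

```ruby
offering = provider.offering_for(model: 'gemini-flash')

offering.provider_family         # => :vertex
offering.transport               # => :http
offering.model                   # => "projects/my-project/locations/us-central1/publishers/google/models/gemini-2.5-flash"
offering.metadata[:model_family] # inferred from the publisher/model, e.g. :gemini
offering.metadata[:project]      # copied from the provider instance
offering.metadata[:location]     # copied from the provider instance
```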
## Registry Events

When transport is available, the `RegistryPublisher` publishes best-effort readiness and offering-availability events to the `llm.registry` topic exchange using lex-llm registry envelopes. Events are published asynchronously in background threads and never block the caller.
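The fire-and-forget shape is roughly the following; this is an illustration of the pattern, not the publisher's actual implementation (`publish_to_llm_registry` is a hypothetical transport call):

```ruby
# Illustrative fire-and-forget publish: the thread is never joined, and
# failures are treated as expected/best-effort rather than raised.
def publish_event_async(event)
  Thread.new do
    publish_to_llm_registry(event) # hypothetical transport call
  rescue StandardError => e
    handle_exception(e, level: :debug, handled: true, operation: 'vertex.registry.publish_event')
  end
  nil
end
```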
## File Map

| Path | Purpose |
|---|---|
| `lib/legion/extensions/llm/vertex.rb` | Namespace module, default settings, provider registration |
| `lib/legion/extensions/llm/vertex/provider.rb` | Vertex AI provider: chat, stream, embed, count_tokens, health, discovery |
| `lib/legion/extensions/llm/vertex/registry_publisher.rb` | Async best-effort llm.registry event publisher |
| `lib/legion/extensions/llm/vertex/registry_event_builder.rb` | Builds sanitized registry event envelopes |
| `lib/legion/extensions/llm/vertex/version.rb` | VERSION constant |
| `lib/legion/extensions/llm/vertex/transport/exchanges/llm_registry.rb` | llm.registry topic exchange definition |
| `lib/legion/extensions/llm/vertex/transport/messages/registry_event.rb` | Transport message for registry events |
## Observability

All modules and classes use `Legion::Logging::Helper` for structured logging:

- Info-level logging on key provider actions: `chat`, `stream`, `embed`, `count_tokens`, `discover_offerings`, `health`, and registry publish operations
- Every rescue block calls `handle_exception(e, level:, handled:, operation:)` with dot-separated operation names (e.g. `vertex.provider.health`, `vertex.registry.publish_event`); see the sketch after this list
- Level conventions: `:warn` for recoverable failures, `:error` for unexpected errors, `:debug` for expected/best-effort failures (transport unavailable, etc.)
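A minimal sketch of that rescue convention in a provider action (the method body and return value are illustrative):

```ruby
def health(live: false)
  # ... perform the readiness check ...
rescue StandardError => e
  # Recoverable failure: warn-level, dot-separated operation name.
  handle_exception(e, level: :warn, handled: true, operation: 'vertex.provider.health')
  { healthy: false, error: e.message }
end
```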
## API Contract

The implementation is intentionally limited to Vertex AI REST surfaces documented by Google Cloud:

- `generateContent` and `streamGenerateContent` for Gemini publisher models
- `countTokens` for Gemini-style publisher models
- `predict` for documented text embedding models
- `rawPredict` and `streamRawPredict` endpoint builders for partner publisher models such as Mistral, Anthropic, and Meta

Provider-specific request bodies are not guessed. Partner raw-predict chat requests use the message shape documented for those partner model endpoints; embeddings are only implemented for documented Vertex text embedding models.
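All of these surfaces hang off the regional Vertex AI host; a sketch of the documented endpoint shape (the helper below is illustrative and not part of this gem's public API):

```ruby
# Illustrative builder for Vertex AI publisher-model endpoints. `verb` is one of:
# generateContent, streamGenerateContent, countTokens, predict, rawPredict, streamRawPredict.
def vertex_endpoint(project:, location:, publisher:, model:, verb:)
  "https://#{location}-aiplatform.googleapis.com/v1/" \
    "projects/#{project}/locations/#{location}/" \
    "publishers/#{publisher}/models/#{model}:#{verb}"
end

vertex_endpoint(project: 'my-project', location: 'us-central1',
                publisher: 'google', model: 'gemini-2.5-flash', verb: 'generateContent')
# => "https://us-central1-aiplatform.googleapis.com/v1/projects/my-project/locations/us-central1/publishers/google/models/gemini-2.5-flash:generateContent"
```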
## Development

```sh
bundle install
bundle exec rspec        # 0 failures
bundle exec rubocop -A   # auto-fix
bundle exec rubocop      # lint check
```
## License
Apache-2.0