# lex-llm-azure-foundry
LegionIO LLM provider extension for Azure AI Foundry Models and Azure OpenAI hosted deployments.
This gem lives under `Legion::Extensions::Llm::AzureFoundry` and depends on `lex-llm` for shared provider-neutral routing, fleet, model-offering, and schema primitives.

Load it with `require 'legion/extensions/llm/azure_foundry'`.
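For example, in a Bundler-managed project:

```ruby
# Gemfile
gem "lex-llm-azure-foundry"
```

```ruby
require "legion/extensions/llm/azure_foundry"
```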
## What It Provides

- `Legion::Extensions::Llm::Provider` registration as `:azure_foundry`
- Azure AI Foundry model inference chat completions through `POST /models/chat/completions?api-version=...`
- Azure AI Foundry model inference embeddings through `POST /models/embeddings?api-version=...`
- Azure AI Foundry model info health check through `GET /models/info?api-version=...` when `live: true`
- Azure OpenAI v1-compatible endpoint support through `/openai/v1/chat/completions` and `/openai/v1/embeddings`
- deployment-name-preserving routing offerings for hosted Azure deployments
- explicit `model_family` and `canonical_model_alias` metadata for deployments whose base model cannot be proven from Azure metadata
- offline-first discovery from configured deployments
- shared OpenAI-compatible request and response mapping via `Legion::Extensions::Llm::Provider::OpenAICompatible`
- conservative token-counting metadata when no portable Azure token-counting REST endpoint is configured
## API Contract

The implementation follows Microsoft Learn REST documentation for Azure AI Foundry Models:

- Azure AI Foundry model inference endpoints use deployment names as the request `model`.
- The model inference endpoint supports chat completions and embeddings.
- The documented model-info endpoint is used only for explicit live health checks.
- Azure deployment metadata is not assumed to reliably prove base model family or version, so routing metadata should be configured explicitly.
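As an illustration of that contract, here is a minimal sketch of a raw chat-completions request against the model-inference surface. The deployment name travels as the `model` field; the resource host and deployment name are placeholders, and the `api-key` header shown for key-based auth is an assumption (Microsoft Entra ID flows send `Authorization: Bearer <token>` instead):

```ruby
require "net/http"
require "json"
require "uri"

# Placeholder resource host and deployment name.
uri = URI("https://my-resource.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview")

request = Net::HTTP::Post.new(uri)
request["Content-Type"] = "application/json"
request["api-key"] = ENV.fetch("AZURE_INFERENCE_CREDENTIAL") # assumed key-auth header
request.body = JSON.generate(
  model: "gpt-4o-prod", # the deployment name, preserved as-is
  messages: [{ role: "user", content: "hello" }]
)

response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
puts JSON.parse(response.body).dig("choices", 0, "message", "content")
```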
## Defaults

```ruby
Legion::Extensions::Llm::AzureFoundry.default_settings
# {
#   provider_family: :azure_foundry,
#   discovery: { enabled: true, live: false },
#   instances: {
#     default: {
#       endpoint: "https://<resource>.services.ai.azure.com",
#       api_version: "2024-05-01-preview",
#       surface: :model_inference,
#       tier: :frontier,
#       transport: :http,
#       credentials: {
#         api_key: "env://AZURE_INFERENCE_CREDENTIAL",
#         bearer_token: "env://AZURE_FOUNDRY_BEARER_TOKEN",
#         entra_scope: "https://cognitiveservices.azure.com/.default"
#       },
#       deployments: [],
#       usage: { inference: true, embedding: true, token_counting: false },
#       limits: { concurrency: 4 }
#     }
#   }
# }
```
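`default_settings` returns a plain hash, so per-environment overrides can be layered on top before handing the result to your own wiring. A minimal sketch; the `deep_merge` helper below is illustrative, not part of the gem:

```ruby
# Illustrative helper, not provided by lex-llm-azure-foundry.
def deep_merge(base, overrides)
  base.merge(overrides) do |_key, old_val, new_val|
    old_val.is_a?(Hash) && new_val.is_a?(Hash) ? deep_merge(old_val, new_val) : new_val
  end
end

settings = deep_merge(
  Legion::Extensions::Llm::AzureFoundry.default_settings,
  instances: { default: { endpoint: ENV.fetch("AZURE_FOUNDRY_ENDPOINT") } }
)
```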
## Configuration

```ruby
Legion::Extensions::Llm.configure do |config|
  config.azure_foundry_endpoint = ENV.fetch("AZURE_FOUNDRY_ENDPOINT")
  config.azure_foundry_api_key = ENV["AZURE_INFERENCE_CREDENTIAL"]
  config.azure_foundry_bearer_token = ENV["AZURE_FOUNDRY_BEARER_TOKEN"]
  config.azure_foundry_api_version = "2024-05-01-preview"
  config.azure_foundry_surface = :model_inference
  config.azure_foundry_deployments = [
    {
      deployment: "gpt-4o-prod",
      model_family: :openai,
      canonical_model_alias: "gpt-4o",
      usage_type: :inference
    },
    {
      deployment: "mistral-large-prod",
      model_family: :mistral,
      canonical_model_alias: "mistral-large",
      usage_type: :inference
    },
    {
      deployment: "embedding-prod",
      model_family: :openai,
      canonical_model_alias: "text-embedding-3-small",
      usage_type: :embedding
    }
  ]
end
```
Use `config.azure_foundry_surface = :openai_v1` when the target endpoint should be treated as the OpenAI v1-compatible Azure route. The provider appends `/openai/v1` when the configured endpoint does not already include it.
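For example (the endpoint value is a placeholder):

```ruby
Legion::Extensions::Llm.configure do |config|
  config.azure_foundry_surface = :openai_v1
  # /openai/v1 is appended automatically because the endpoint does not include it.
  config.azure_foundry_endpoint = "https://my-resource.openai.azure.com"
end
```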
## Provider Methods

```ruby
provider = Legion::Extensions::Llm::AzureFoundry.provider_class.new(Legion::Extensions::Llm.config)

provider.discover_offerings(live: false)
provider.offering_for(model: "gpt-4o-prod", model_family: :openai, canonical_model_alias: "gpt-4o")
provider.health(live: false)

messages = [{ role: "user", content: "hello" }]
provider.chat(messages, model: "gpt-4o-prod")
provider.stream(messages, model: "gpt-4o-prod") { |chunk| puts chunk.content }
provider.embed(["hello"], model: "embedding-prod")
provider.count_tokens(messages, model: "gpt-4o-prod")
```
`discover_offerings(live: false)` never calls Azure. It maps configured deployments into `Legion::Extensions::Llm::Routing::ModelOffering` values with `provider_family: :azure_foundry`.

`health(live: true)` calls the documented model-info endpoint for the configured model-inference surface. Keep `live: false` for startup paths and tests that must not require Azure.

`count_tokens` returns a structured unsupported result by default because the Microsoft REST contract used here does not define a portable token-counting endpoint across Azure AI Foundry deployments.
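Together these make offline discovery safe for boot paths. A sketch of seeding routing from configured deployments, assuming the offering exposes readers named after the metadata fields described in this README:

```ruby
# Never calls Azure: each configured deployment becomes a ModelOffering.
# Reader names mirror the documented metadata and are assumptions about
# the ModelOffering interface.
provider.discover_offerings(live: false).each do |offering|
  puts format("%s -> %s (%s)",
              offering.model,                  # deployment name, e.g. "gpt-4o-prod"
              offering.canonical_model_alias,  # e.g. "gpt-4o"
              offering.provider_family)        # :azure_foundry
end
```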
## Routing Metadata

Azure deployments are aliases. A deployment name can hide provider, model, and version details, so this extension preserves the deployment name as `model` and treats `canonical_model_alias` and `model_family` as routing metadata.
Supported `model_family` values are intentionally open-ended symbols, including `:openai`, `:mistral`, `:meta`, `:xai`, `:anthropic`, and `:microsoft`.
When `model_family` or `canonical_model_alias` is missing, offerings include `requires_explicit_model_metadata: true`.
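A sketch of how that flag surfaces, assuming `offering_for` accepts the metadata keywords as optional and the flag is readable on the returned offering:

```ruby
# Deployment configured without model_family / canonical_model_alias.
offering = provider.offering_for(model: "mystery-deployment-prod")

# The deployment name is preserved as the model; without explicit metadata
# the offering is flagged so routers can require configuration first.
offering.requires_explicit_model_metadata # => true
```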