lex-llm-azure-foundry

LegionIO LLM provider extension for Azure AI Foundry Models and Azure OpenAI hosted deployments.

This gem lives under Legion::Extensions::Llm::AzureFoundry and depends on lex-llm for shared provider-neutral routing, fleet, model-offering, and schema primitives.

Load it with require 'legion/extensions/llm/azure_foundry'.

What It Provides

  • Legion::Extensions::Llm::Provider registration as :azure_foundry
  • Azure AI Foundry model inference chat completions through POST /models/chat/completions?api-version=...
  • Azure AI Foundry model inference embeddings through POST /models/embeddings?api-version=...
  • Azure AI Foundry model info health check through GET /models/info?api-version=... when live: true
  • Azure OpenAI v1-compatible endpoint support through /openai/v1/chat/completions and /openai/v1/embeddings
  • Deployment-name-preserving routing offerings for hosted Azure deployments
  • Explicit model_family and canonical_model_alias metadata for deployments whose base model cannot be proven from Azure metadata
  • Offline-first discovery from configured deployments
  • Shared OpenAI-compatible request and response mapping via Legion::Extensions::Llm::Provider::OpenAICompatible
  • Conservative token-counting metadata when no portable Azure token-counting REST endpoint is configured

API Contract

The implementation follows Microsoft Learn REST documentation for Azure AI Foundry Models:

  • Azure AI Foundry model inference endpoints use deployment names as the request model.
  • The model inference endpoint supports chat completions and embeddings.
  • The documented model-info endpoint is used only for explicit live health checks.
  • Azure deployment metadata is not assumed to reliably prove base model family or version, so routing metadata should be configured explicitly.
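
As a concrete illustration of this contract, here is a minimal Net::HTTP sketch against the model-inference surface. The api-key header is an assumption for key-based auth (Entra ID callers would send an Authorization: Bearer token instead), and gpt-4o-prod is a placeholder deployment name:

require "json"
require "net/http"
require "uri"

# Illustrative request shape for the model-inference surface. The deployment
# name, not the base model name, goes in the "model" field.
endpoint    = ENV.fetch("AZURE_FOUNDRY_ENDPOINT") # e.g. https://<resource>.services.ai.azure.com
api_version = "2024-05-01-preview"
uri = URI("#{endpoint}/models/chat/completions?api-version=#{api_version}")

request = Net::HTTP::Post.new(uri)
request["Content-Type"] = "application/json"
# Assumption: key-based auth via an api-key header; Entra ID callers send an
# Authorization: Bearer token instead.
request["api-key"] = ENV.fetch("AZURE_INFERENCE_CREDENTIAL")
request.body = JSON.generate(
  model: "gpt-4o-prod", # deployment name, preserved as-is
  messages: [{ role: "user", content: "ping" }]
)

response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
puts JSON.parse(response.body).dig("choices", 0, "message", "content")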

Defaults

Legion::Extensions::Llm::AzureFoundry.default_settings
# {
#   provider_family: :azure_foundry,
#   discovery: { enabled: true, live: false },
#   instances: {
#     default: {
#       endpoint: "https://<resource>.services.ai.azure.com",
#       api_version: "2024-05-01-preview",
#       surface: :model_inference,
#       tier: :frontier,
#       transport: :http,
#       credentials: {
#         api_key: "env://AZURE_INFERENCE_CREDENTIAL",
#         bearer_token: "env://AZURE_FOUNDRY_BEARER_TOKEN",
#         entra_scope: "https://cognitiveservices.azure.com/.default"
#       },
#       deployments: [],
#       usage: { inference: true, embedding: true, token_counting: false },
#       limits: { concurrency: 4 }
#     }
#   }
# }

Configuration

Legion::Extensions::Llm.configure do |config|
  config.azure_foundry_endpoint = ENV.fetch("AZURE_FOUNDRY_ENDPOINT")
  config.azure_foundry_api_key = ENV["AZURE_INFERENCE_CREDENTIAL"]
  config.azure_foundry_bearer_token = ENV["AZURE_FOUNDRY_BEARER_TOKEN"]
  config.azure_foundry_api_version = "2024-05-01-preview"
  config.azure_foundry_surface = :model_inference
  config.azure_foundry_deployments = [
    {
      deployment: "gpt-4o-prod",
      model_family: :openai,
      canonical_model_alias: "gpt-4o",
      usage_type: :inference
    },
    {
      deployment: "mistral-large-prod",
      model_family: :mistral,
      canonical_model_alias: "mistral-large",
      usage_type: :inference
    },
    {
      deployment: "embedding-prod",
      model_family: :openai,
      canonical_model_alias: "text-embedding-3-small",
      usage_type: :embedding
    }
  ]
end

Use config.azure_foundry_surface = :openai_v1 when the target endpoint should be treated as the OpenAI v1-compatible Azure route. The provider appends /openai/v1 when the configured endpoint does not already include it.
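
A hypothetical sketch of that normalization rule; the helper name is illustrative and not part of the gem's public API:

# Append /openai/v1 only when the configured endpoint does not already end with it.
def normalize_openai_v1_endpoint(endpoint)
  base = endpoint.chomp("/")
  base.end_with?("/openai/v1") ? base : "#{base}/openai/v1"
end

normalize_openai_v1_endpoint("https://res.services.ai.azure.com")
# => "https://res.services.ai.azure.com/openai/v1"
normalize_openai_v1_endpoint("https://res.services.ai.azure.com/openai/v1")
# => "https://res.services.ai.azure.com/openai/v1"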

Provider Methods

provider = Legion::Extensions::Llm::AzureFoundry.provider_class.new(Legion::Extensions::Llm.config)
messages = [{ role: "user", content: "Hello" }]

provider.discover_offerings(live: false)
provider.offering_for(model: "gpt-4o-prod", model_family: :openai, canonical_model_alias: "gpt-4o")
provider.health(live: false)
provider.chat(messages, model: "gpt-4o-prod")
provider.stream(messages, model: "gpt-4o-prod") { |chunk| puts chunk.content }
provider.embed(["hello"], model: "embedding-prod")
provider.count_tokens(messages, model: "gpt-4o-prod")

discover_offerings(live: false) never calls Azure. It maps configured deployments into Legion::Extensions::Llm::Routing::ModelOffering values with provider_family: :azure_foundry.
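
Illustratively, the gpt-4o-prod deployment from the configuration above would map to an offering along these lines. Attribute names follow this README; the printed shape is a sketch, not the gem's exact inspect output:

offerings = provider.discover_offerings(live: false)
offerings.first
# #<Legion::Extensions::Llm::Routing::ModelOffering
#    provider_family: :azure_foundry,
#    model: "gpt-4o-prod",            # deployment name preserved as model
#    model_family: :openai,
#    canonical_model_alias: "gpt-4o",
#    usage_type: :inference>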

health(live: true) calls the documented model-info endpoint for the configured model-inference surface. Keep live: false for startup paths and tests that must not require Azure.
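
In short, the two modes differ like this (the comments paraphrase the contract above):

provider.health(live: false) # configuration-only check; performs no network calls
provider.health(live: true)  # GET {endpoint}/models/info?api-version={api_version}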

count_tokens returns a structured unsupported result by default because the Microsoft REST contract used here does not define a portable token-counting endpoint across Azure AI Foundry deployments.
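
The exact result shape is the gem's concern; a hypothetical sketch of the default, with key names that are illustrative rather than contractual:

result = provider.count_tokens(messages, model: "gpt-4o-prod")
# { supported: false, reason: :no_token_counting_endpoint, provider_family: :azure_foundry }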

Routing Metadata

Azure deployments are aliases. A deployment name can hide provider, model, and version details, so this extension preserves the deployment name as model and treats canonical_model_alias and model_family as routing metadata.

Supported model_family values are intentionally open-ended symbols, including:

  • :openai
  • :mistral
  • :meta
  • :xai
  • :anthropic
  • :microsoft

When model_family or canonical_model_alias is missing, offerings include requires_explicit_model_metadata: true.
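
For example, a deployment configured without either field still routes by its deployment name, but its offering carries the flag so schedulers refuse to guess the base model. A sketch reusing the configuration style above; the accessor is assumed from the flag name:

Legion::Extensions::Llm.configure do |config|
  config.azure_foundry_deployments = [
    { deployment: "mystery-prod", usage_type: :inference } # no model_family or canonical_model_alias
  ]
end

offering = provider.offering_for(model: "mystery-prod")
offering.requires_explicit_model_metadata # => true (assumed accessor)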