# lex-llm-vllm

LegionIO LLM provider extension for vLLM.

This gem lives under `Legion::Extensions::Llm::Vllm` and depends on `lex-llm` for shared provider-neutral routing, fleet, and schema primitives.
## What It Provides
- `LexLLM::Provider` registration as `:vllm`
- shared `LexLLM::Provider::OpenAICompatible` request and response handling
- chat requests through `POST /v1/chat/completions` (the underlying HTTP call is sketched after this list)
- streaming chat support
- model discovery through `GET /v1/models`
- embeddings through `POST /v1/embeddings`
- vLLM management helpers for `/health`, `/version`, `/reset_prefix_cache`, `/reset_mm_cache`, `/sleep`, and `/wake_up`
- shared fleet/default settings via `Legion::Extensions::Llm.provider_settings`
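For orientation, here is a minimal sketch of the request shape the chat path reduces to, using only Ruby's standard library. The endpoint, model name, and prompt are placeholders matching the gem's defaults; the provider handles this plumbing for you.

```ruby
require "json"
require "net/http"
require "uri"

# Assumed local vLLM server; matches this gem's default endpoint.
uri = URI("http://localhost:8000/v1/chat/completions")

request = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
request.body = {
  model: "meta-llama/Llama-3.1-8B-Instruct", # any chat-capable served model
  messages: [{ role: "user", content: "Hello from Legion" }]
}.to_json

response = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
puts JSON.parse(response.body).dig("choices", 0, "message", "content")
```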
## Defaults
```ruby
Legion::Extensions::Llm::Vllm.default_settings
# {
#   provider_family: :vllm,
#   instances: {
#     default: {
#       endpoint: "http://localhost:8000",
#       tier: :private,
#       transport: :http,
#       usage: { inference: true, embedding: true },
#       limits: { concurrency: 8 }
#     }
#   }
# }
```
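`default_settings` returns a plain Ruby hash, so individual values can be read with ordinary hash access when wiring up overrides or fleet settings; for example:

```ruby
defaults = Legion::Extensions::Llm::Vllm.default_settings

defaults[:provider_family]                                # => :vllm
defaults.dig(:instances, :default, :endpoint)             # => "http://localhost:8000"
defaults.dig(:instances, :default, :usage, :embedding)    # => true
defaults.dig(:instances, :default, :limits, :concurrency) # => 8
```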
## Configuration
```ruby
LexLLM.configure do |config|
  config.vllm_api_base = "http://localhost:8000"
  config.vllm_api_key = ENV["VLLM_API_KEY"]
  config.default_model = "meta-llama/Llama-3.1-8B-Instruct"
  config.default_embedding_model = "BAAI/bge-base-en-v1.5" # default embedding model (attribute name assumed)
end
```
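Under the hood, an embedding call ends up as a `POST /v1/embeddings` against the configured base URL. A rough sketch of that request, assuming the defaults above (the `Authorization` header only matters when the vLLM server was started with an API key):

```ruby
require "json"
require "net/http"
require "uri"

uri = URI("http://localhost:8000/v1/embeddings")

headers = { "Content-Type" => "application/json" }
headers["Authorization"] = "Bearer #{ENV['VLLM_API_KEY']}" if ENV["VLLM_API_KEY"]

request = Net::HTTP::Post.new(uri, headers)
request.body = { model: "BAAI/bge-base-en-v1.5", input: ["Legion routes work"] }.to_json

response = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
embedding = JSON.parse(response.body).dig("data", 0, "embedding")
puts embedding.length
```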
vLLM's OpenAI-compatible server supports the chat completions, models, and embeddings APIs when the served model and task support them. Chat requests require a model with a chat template; embedding requests require an embedding-capable served model.
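Before routing traffic, the discovery and management endpoints listed above can be probed directly to confirm what a given server actually serves; a minimal sketch assuming the default local endpoint:

```ruby
require "json"
require "net/http"
require "uri"

# /health returns 200 with an empty body when the engine is up.
health = Net::HTTP.get_response(URI("http://localhost:8000/health"))
puts "healthy: #{health.is_a?(Net::HTTPSuccess)}"

# GET /v1/models lists the served model ids that chat and embedding
# requests can reference.
models = JSON.parse(Net::HTTP.get(URI("http://localhost:8000/v1/models")))
puts models["data"].map { |m| m["id"] }
```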