PatientLLM
Integrate LLM APIs with your Ruby backend applications without blocking threads. This gem uses asynchronous HTTP requests to call LLM providers and handles the response via callbacks. It supports multiple API formats natively via PromptBuilder serializers:
- OpenAI Chat Completions (:chat_completion) -- for OpenAI and compatible providers
- OpenAI Responses (:open_responses) -- for the newer OpenAI Responses API
- Anthropic Messages (:messages) -- for the Anthropic Claude API
- Bedrock Converse (:converse) -- for AWS Bedrock Converse API
- Gemini (:gemini) -- for the Google Gemini API
LLM API calls can take a long time to complete. With traditional synchronous HTTP clients, these requests tie up application threads while waiting for responses. This gem solves that problem by using async HTTP via PatientHttp, freeing up your threads to do other work while waiting for the LLM provider to respond.
Prerequisites
This gem delegates actual HTTP dispatch to patient_http, which requires a registered request handler before any PatientLLM.ask call will succeed. In a normal app you get this handler by adding one of the job-system integrations, such as patient_http-sidekiq or patient_http-solid_queue.
Without a handler, PatientLLM.ask raises RuntimeError: No request handler registered.
Usage
Configuration
Register your LLM providers with their API base URLs and authentication headers:
PatientLLM.configure do |config|
config.provider :openai,
url: "https://api.openai.com",
headers: {"Authorization" => "Bearer #{ENV["OPENAI_API_KEY"]}"}
config.provider :anthropic,
url: "https://api.anthropic.com",
headers: {"x-api-key" => ENV["ANTHROPIC_API_KEY"]},
serializer: :messages
end
[!NOTE] Authentication headers configured on the provider are re-attached to every request at dispatch time and are persisted in the asynchronous job payload.
You should set up encryption for your job payloads to prevent leaking credentials. See the documentation for patient_http-sidekiq or patient_http-solid_queue for details.
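One detail of the configuration above: an unset environment variable interpolates as an empty string, so a missing key would produce the header value `Bearer ` with no token. A minimal sketch of a fail-fast alternative follows. The `bearer_header` helper is hypothetical and not part of PatientLLM; it only relies on `fetch`, which both `ENV` and `Hash` provide.

```ruby
# Hypothetical helper, not part of PatientLLM. Builds an Authorization header
# and raises KeyError when the variable is missing instead of sending "Bearer ".
# The env: parameter defaults to ENV and accepts any object that responds to fetch.
def bearer_header(var_name, env: ENV)
  {"Authorization" => "Bearer #{env.fetch(var_name)}"}
end

# Example with an explicit hash standing in for the process environment.
example_headers = bearer_header("OPENAI_API_KEY", env: {"OPENAI_API_KEY" => "sk-example"})
```

The result could then be passed as the `headers:` value in `config.provider`, so a misconfigured deployment fails at boot rather than at the first request.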
Creating a Callback Class
Create a callback class with on_complete and on_error methods. Callbacks receive
keyword arguments, and you only declare the ones you need — the dispatcher inspects your
method signature and passes just those values (or everything if you declare **kwargs):
class LLMCallback
def on_complete(session:, provider:, llm_response:, callback_args:, http_response:, request_id:)
# session - the PromptBuilder::Session with the response already added
# provider - the provider name (String)
# llm_response - a PromptBuilder::Response with the assistant's response
# callback_args - a PatientHttp::CallbackArgs containing data you passed in the `ask` call
# http_response - the raw PatientHttp::Response
# request_id - the original request id (stable across tool-call iterations)
# Access the response content
puts llm_response.text
puts "Tokens: #{llm_response.usage.input_tokens} in / #{llm_response.usage.output_tokens} out"
puts "Duration: #{http_response.duration}s"
# Save the session state for future turns (response is already in the session)
save_session_state(callback_args[:user_id], session.to_h)
end
def on_error(session:, provider:, callback_args:, error:, http_response:, request_id:)
# error is a PatientHttp::RequestError, ClientError (HTTP 4xx),
# or ServerError (HTTP 5xx). All respond to:
# error.error_type - :timeout, :connection, :ssl, :http_error, etc.
# error.message - human-readable message
# error.error_class - the original exception class (for RequestError)
# error.request_id
# http_response is the raw PatientHttp::Response for HTTP errors, or nil for
# transport errors (timeouts, connection failures).
log_error(error.error_type, error.message)
end
end
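The signature inspection described above can be approximated with Ruby's `Method#parameters`. The sketch below is an illustration of the idea, not the gem's internal code: `filter_kwargs` and `DemoCallback` are hypothetical names, and the values in `available` are placeholders.

```ruby
# Sketch only: selects the keyword arguments that a callback method declares.
# filter_kwargs is a hypothetical name, not PatientLLM's internal API.
def filter_kwargs(callable, available)
  params = callable.parameters
  # A **kwargs parameter (:keyrest) means the method accepts every keyword.
  return available if params.any? { |type, _| type == :keyrest }

  declared = params.select { |type, _| %i[key keyreq].include?(type) }.map { |_, name| name }
  available.slice(*declared)
end

class DemoCallback
  def on_complete(llm_response:, request_id:)
    "#{request_id}: #{llm_response}"
  end
end

# Placeholder values standing in for the objects the dispatcher would supply.
available = {session: nil, provider: "openai", llm_response: "Paris", request_id: "req-1"}
callback = DemoCallback.new
args = filter_kwargs(callback.method(:on_complete), available)
result = callback.on_complete(**args)
```

Here `args` contains only `llm_response` and `request_id`, so the call succeeds even though the method never mentions `session` or `provider`.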
Callback keyword parameters
Each callback may declare any subset of the keywords below, in any order. Declaring
**kwargs receives them all. PatientLLM.ask validates your callback's signatures up
front and raises an ArgumentError if a method uses an unsupported name, a positional
parameter, or omits the required keyword.
| Callback | Supported keywords | Required |
|---|---|---|
| on_complete | session, provider, llm_response, callback_args, http_response, request_id | llm_response |
| on_tool_use (optional) | session, provider, llm_response, callback_args, http_response, request_id | llm_response |
| on_error | session, provider, callback_args, error, http_response, request_id | error |
For example, a callback that only cares about the response text can be as small as:
class LLMCallback
def on_complete(llm_response:)
puts llm_response.text
end
def on_error(error:)
log_error(error.error_type, error.message)
end
end
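To make the validation rules concrete, here is a rough sketch of an up-front signature check. It is not the gem's implementation: `CALLBACK_RULES`, `validate_signature!`, and the demo classes are hypothetical, and the sketch assumes that declaring `**kwargs` satisfies the required-keyword rule.

```ruby
# Sketch only: mirrors the documented rules with hypothetical names.
CALLBACK_RULES = {
  on_complete: {
    keywords: %i[session provider llm_response callback_args http_response request_id],
    required: :llm_response
  },
  on_error: {
    keywords: %i[session provider callback_args error http_response request_id],
    required: :error
  }
}.freeze

def validate_signature!(klass, method_name)
  rules = CALLBACK_RULES.fetch(method_name)
  params = klass.instance_method(method_name).parameters

  positional = params.select { |type, _| %i[req opt rest].include?(type) }
  raise ArgumentError, "#{method_name} must not take positional parameters" unless positional.empty?

  keywords = params.select { |type, _| %i[key keyreq].include?(type) }.map { |_, name| name }
  unsupported = keywords - rules[:keywords]
  raise ArgumentError, "#{method_name} has unsupported keywords: #{unsupported.join(", ")}" unless unsupported.empty?

  # Assumption: **kwargs receives everything, so it covers the required keyword.
  accepts_all = params.any? { |type, _| type == :keyrest }
  unless accepts_all || keywords.include?(rules[:required])
    raise ArgumentError, "#{method_name} must declare #{rules[:required]}:"
  end
  true
end

class GoodCallback
  def on_complete(llm_response:); end

  def on_error(error:, **); end
end

class BadCallback
  def on_complete(response); end # positional parameter, rejected
end

good = validate_signature!(GoodCallback, :on_complete)
```

Calling `validate_signature!(BadCallback, :on_complete)` raises an `ArgumentError` in this sketch, which is the kind of early failure the up-front check is meant to provide.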
Making LLM Requests
Create a PromptBuilder::Session and call PatientLLM.ask to make an async request:
session = PromptBuilder::Session.new(model: "gpt-4o")
session.instructions = "You are a helpful assistant."
session.user("What is the capital of France?")
PatientLLM.ask(session, provider: :openai, callback: LLMCallback)
You can pass custom data to your callback using callback_args:
PatientLLM.ask(session, provider: :openai, callback: LLMCallback, callback_args: {
user_id: current_user.id,
conversation_id: conversation.id
})
The request is sent asynchronously. When the LLM responds, your callback's on_complete method will be called with the result.
Session Configuration Options
PromptBuilder::Session supports several configuration options:
session = PromptBuilder::Session.new(model: "gpt-5.4")
# Set system instructions
session.instructions = "You are a helpful assistant."
# Set temperature
session.temperature = 0.7
# Enable reasoning for supported models (OpenAI o1/o3 family)
session.reasoning = {effort: "high"}
# Set a JSON schema for structured output
session.text = {
format: {
type: "json_schema",
json_schema: {
name: "response",
schema: {
type: "object",
properties: {
answer: { type: "string" },
confidence: { type: "number" }
}
}
}
}
}
# Set the maximum output tokens
session.max_output_tokens = 1000
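When a JSON schema is set, the callback still has to parse the model's output. The sketch below assumes the JSON document arrives as the response text (for example via `llm_response.text`) and matches the `answer`/`confidence` schema shown above. `parse_structured_answer` is a hypothetical helper, not part of PatientLLM or PromptBuilder.

```ruby
require "json"

# Hypothetical helper: parses structured output matching the schema above.
# Assumes the JSON document is delivered as the plain response text.
def parse_structured_answer(text)
  data = JSON.parse(text)
  answer = data.fetch("answer")
  confidence = Float(data.fetch("confidence"))
  {answer: answer, confidence: confidence}
rescue JSON::ParserError, KeyError, TypeError, ArgumentError => e
  # Normalize every parsing failure into one error the caller can handle.
  raise ArgumentError, "unexpected structured output: #{e.message}"
end

parsed = parse_structured_answer('{"answer": "Paris", "confidence": 0.92}')
```

Models do not always honor a schema, so treating a parse failure as a handled error rather than a crash keeps the callback worker healthy.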
PatientLLM.ask accepts additional options:
PatientLLM.ask(session,
provider: :openai,
callback: LLMCallback,
url: "http://localhost:1234", # Override the provider's base URL
serializer: :messages, # Override the API format
completion_path: "/chat/completions", # Override the endpoint path
headers: {"X-Custom" => "value"}, # Additional HTTP headers
params: {max_completion_tokens: 1000} # Additional request parameters
)
URL composition
The full request URL is built by concatenating the base URL (from the provider registry or the url: option) with the completion_path. When you don't set completion_path, it defaults to the path for the active serializer (/v1/chat/completions for :chat_completion, /v1/responses for :open_responses, /v1/messages for :messages, /converse for :converse, /v1beta/models/{model}:generateContent for :gemini). A {model} placeholder in the path is replaced with the session's model at dispatch time, which is how the Gemini default targets Google's /v1beta/models/{model}:generateContent endpoint. Trailing slashes on the base and leading slashes on the path are normalized, so:
url = "https://api.openai.com" completion_path = "/v1/chat/completions"
-> https://api.openai.com/v1/chat/completions
url = "http://localhost:1234" completion_path = "/v1/chat/completions"
-> http://localhost:1234/v1/chat/completions
If your base URL already includes a /v1 prefix, override the completion path to avoid duplication:
PatientLLM.ask(session,
provider: :openai,
callback: LLMCallback,
url: "https://my-gateway.internal/openai/v1",
completion_path: "/chat/completions"
)
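The composition rules in this section can be sketched in a few lines of plain Ruby. `build_request_url` is a hypothetical name used for illustration only, and the Gemini base URL and model name below are assumptions chosen for the example; the gem's actual implementation may differ.

```ruby
# Sketch of the documented behavior; build_request_url is a hypothetical name.
def build_request_url(base_url, completion_path, model:)
  base = base_url.sub(%r{/+\z}, "")        # drop trailing slashes on the base
  path = completion_path.sub(%r{\A/+}, "") # drop leading slashes on the path
  path = path.gsub("{model}", model)       # substitute the session's model
  "#{base}/#{path}"
end

openai_url = build_request_url("https://api.openai.com/", "/v1/chat/completions", model: "gpt-4o")
gateway_url = build_request_url("https://my-gateway.internal/openai/v1", "/chat/completions", model: "gpt-4o")
# Illustrative base URL and model name for the Gemini path template.
gemini_url = build_request_url(
  "https://generativelanguage.googleapis.com",
  "/v1beta/models/{model}:generateContent",
  model: "gemini-1.5-flash"
)
```

The gateway case shows why the override matters: with the default `/v1/chat/completions` path, the same base URL would produce `/openai/v1/v1/chat/completions`.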
Tool calling
Register tools on the global PromptBuilder.tool_registry:
PromptBuilder.tool_registry.register(
"weather",
description: "Get the current weather for a location",
parameters: {
type: "object",
properties: {
location: {type: "string", description: "City name"}
},
required: ["location"]
}
) do |args|
WeatherService.lookup(args["location"])
end
Then add tools to the session and ask normally:
session = PromptBuilder::Session.new(model: "gpt-4o")
session.register_tool("weather",
description: "Get the current weather for a location",
parameters: {type: "object", properties: {location: {type: "string"}}, required: ["location"]}
)
session.user("What's the weather in NYC?")
PatientLLM.ask(session, provider: :openai, callback: LLMCallback)
When the model responds with tool calls, the gem automatically:
- Appends the assistant tool-call response to the session.
- Invokes the matching tool handler from the registry with the LLM-provided arguments.
- Appends a tool-response item to the session.
- Re-issues the request asynchronously.
- Repeats until the model returns a plain text response (or a tool raises HaltError).
Your on_complete callback only fires for the final text response.
If you define an optional on_tool_use method on your callback, it is invoked once per tool-execution round (after the tools run, before the next request is issued) so you can observe intermediate progress.
The loop is capped at PatientLLM::Callback::MAX_TOOL_ITERATIONS (10) iterations per conversation to prevent runaway calls. When the cap is exceeded, your on_error callback is invoked with a PatientHttp::RequestError whose error_type is :max_tool_iterations and whose error_class is PatientLLM::MaxToolIterationsError, so you can handle it alongside transport and HTTP errors.
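Because the iteration cap arrives through on_error like any other failure, a callback can branch on error_type. The sketch below shows one possible policy. `DemoError` is a hypothetical stand-in for the error object, and the mapping from error types to actions is an illustrative choice, not something the gem prescribes.

```ruby
# Hypothetical stand-in for the error object; only error_type and message are used.
DemoError = Struct.new(:error_type, :message)

# Illustrative policy: decide what to do with each documented error type.
def classify_error(error)
  case error.error_type
  when :timeout, :connection then :retry_later
  when :max_tool_iterations then :give_up
  when :http_error then :inspect_response
  else :log_only
  end
end

timeout_action = classify_error(DemoError.new(:timeout, "request timed out"))
cap_action = classify_error(DemoError.new(:max_tool_iterations, "tool loop exceeded cap"))
```

Treating the cap as non-retryable is deliberate in this sketch: re-sending the same session would most likely run into the same loop again.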
[!NOTE] Tool handlers execute synchronously inside the callback worker (e.g. a Sidekiq job). Keep handlers fast to avoid blocking the worker pool. If a tool needs to do slow work (external API calls, heavy queries), consider offloading that work and using HaltError to stop the auto-loop.
Halting the loop
Raise PatientLLM::HaltError from a tool handler to stop the auto-loop and surface custom content as the final assistant message:
PromptBuilder.tool_registry.register("auth", description: "Authenticate", parameters: {...}) do |args|
unless AuthService.valid?(args["token"])
raise PatientLLM::HaltError.new(content: "Authentication failed.")
end
AuthService.session_info(args["token"])
end
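Taken together, the auto-loop, the iteration cap, and the halt mechanism can be pictured with a synchronous simulation. Everything in this sketch is hypothetical (`DemoHaltError`, `run_tool_loop`, the fake model, the message hashes); the real gem re-issues requests asynchronously and uses PromptBuilder sessions rather than plain arrays.

```ruby
# Synchronous simulation of the tool loop; all names here are hypothetical.
DEMO_MAX_TOOL_ITERATIONS = 10

class DemoHaltError < StandardError
  attr_reader :content

  def initialize(content:)
    @content = content
    super(content)
  end
end

# model is any callable that takes the message list and returns either
# {text: "..."} or {tool_calls: [{name: "...", args: {...}}]}.
def run_tool_loop(messages, tools, model)
  DEMO_MAX_TOOL_ITERATIONS.times do
    reply = model.call(messages)
    return [:complete, reply[:text]] unless reply[:tool_calls]

    messages << {role: "assistant", tool_calls: reply[:tool_calls]}
    reply[:tool_calls].each do |call|
      result = tools.fetch(call[:name]).call(call[:args])
      messages << {role: "tool", name: call[:name], content: result}
    end
  end
  [:error, :max_tool_iterations]
rescue DemoHaltError => e
  # A halting tool supplies the final assistant content directly.
  [:complete, e.content]
end

tools = {"weather" => ->(args) { "Sunny in #{args["location"]}" }}

# Fake model: asks for the weather once, then answers in plain text.
model = lambda do |messages|
  tool_message = messages.find { |m| m[:role] == "tool" }
  if tool_message
    {text: "Forecast: #{tool_message[:content]}"}
  else
    {tool_calls: [{name: "weather", args: {"location" => "NYC"}}]}
  end
end

outcome = run_tool_loop([{role: "user", content: "What's the weather in NYC?"}], tools, model)
```

In this simulation a model that keeps requesting tools ends with `[:error, :max_tool_iterations]`, and a tool that raises `DemoHaltError` ends the loop with its own content, mirroring the on_error and on_complete outcomes described above.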
Serializing Conversations
Sessions can be serialized to JSON for storage and later restored:
# Initial request
session = PromptBuilder::Session.new(model: "gpt-4o")
session.instructions = "You are a helpful assistant."
session.user("Hello!")
PatientLLM.ask(session, provider: :openai, callback: LLMCallback,
callback_args: {conversation_id: conversation.id})
# In your callback, save the state (response is already in the session):
def on_complete(session:, callback_args:, **)
save_to_database(callback_args[:conversation_id], session.to_h)
end
# Later, restore and continue:
session_data = load_from_database(conversation_id)
session = PromptBuilder::Session.from_h(session_data)
session.user("Tell me more about that.")
PatientLLM.ask(session, provider: :openai, callback: LLMCallback,
callback_args: {conversation_id: conversation_id})
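The save_to_database and load_from_database helpers above are left to the application. A minimal in-memory sketch of them follows, assuming session.to_h returns a JSON-compatible Hash. The store and the sample hash are placeholders; the hash does not reflect the actual shape of a serialized session.

```ruby
require "json"

# Hypothetical in-memory stand-in for a database table keyed by conversation id.
CONVERSATION_STORE = {}

def save_to_database(conversation_id, session_hash)
  CONVERSATION_STORE[conversation_id] = JSON.generate(session_hash)
end

def load_from_database(conversation_id)
  JSON.parse(CONVERSATION_STORE.fetch(conversation_id))
end

# Placeholder data; not the real structure produced by session.to_h.
save_to_database(42, {model: "gpt-4o", items: [{role: "user", content: "Hello!"}]})
restored = load_from_database(42)
```

Note that a JSON round trip turns symbol keys into strings, so `restored` has keys such as `"model"` rather than `:model`. If the restoring code expects symbol keys, `JSON.parse(json, symbolize_names: true)` is one option; which form `from_h` accepts is worth verifying against the PromptBuilder documentation.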
Installation
This gem is not yet published to RubyGems. Add it from GitHub:
gem "patient_llm", github: "bdurand/patient_llm"
Then execute:
$ bundle
Contributing
Open a pull request on GitHub.
Please use the standardrb syntax and lint your code with standardrb --fix before submitting.
License
The gem is available as open source under the terms of the MIT License.