Class: Rubino::LLM::RubyLLMAdapter
- Inherits: Object (hierarchy: Object, Rubino::LLM::RubyLLMAdapter)
- Defined in: lib/rubino/llm/ruby_llm_adapter.rb
Overview
Adapter wrapping ruby_llm to isolate all LLM integration details. The rest of the application never calls ruby_llm directly.
Instance Attribute Summary
- #model_id ⇒ Object (readonly)
  Returns the value of attribute model_id.
- #provider ⇒ Object (readonly)
  Returns the value of attribute provider.
Instance Method Summary
- #call(request) ⇒ Object
  The single LLM boundary entry: take one LLM::Request, dispatch to the streaming vs non-streaming transport based on request.stream, and return a normalized AdapterResponse.
- #chat(messages:, tools: nil, response_format: nil, image_paths: [], prefill: nil) ⇒ Object
  Sends a chat completion request (non-streaming).
- #context_window ⇒ Object
  Returns the context window size for the current model.
- #initialize(model_id: nil, provider: nil, config: nil, ui: nil, event_bus: nil, tool_executor: nil, cancel_token: nil, isolate_config: false) ⇒ RubyLLMAdapter (constructor)
  A new instance of RubyLLMAdapter.
- #model_info ⇒ Object
  Returns model information (context window, etc.).
- #stream(messages:, tools: nil, response_format: nil, image_paths: [], prefill: nil) ⇒ Object
  Sends a streaming chat request, yielding chunks.
Constructor Details
#initialize(model_id: nil, provider: nil, config: nil, ui: nil, event_bus: nil, tool_executor: nil, cancel_token: nil, isolate_config: false) ⇒ RubyLLMAdapter
Returns a new instance of RubyLLMAdapter.
    # File 'lib/rubino/llm/ruby_llm_adapter.rb', line 31
    def initialize(model_id: nil, provider: nil, config: nil, ui: nil, event_bus: nil,
                   tool_executor: nil, cancel_token: nil, isolate_config: false)
      @config = config || Rubino.configuration
      @model_id = model_id || @config.model_default
      @provider = provider || resolve_provider
      @temperature = @config.model_temperature
      @ui = ui || Rubino.ui
      @event_bus = event_bus || Rubino.event_bus
      @tool_executor = tool_executor # nil = ToolBridge falls back to direct tool.call
      @cancel_token = cancel_token
      # SLICE-7: when built as a FallbackChain entry, scope provider config
      # (api keys / base_url / timeout) into a per-adapter RubyLLM::Context
      # instead of the process-global RubyLLM.configure. This is the heart of
      # the global-config hazard fix: switching providers
      # for a fallback must NOT mutate the global, or concurrent sessions on the
      # API/server path corrupt each other's provider config. The primary
      # adapter (isolate_config: false) keeps writing the global exactly as
      # before, so existing single-provider setups are byte-identical.
      if isolate_config
        @context = RubyLLM.context { |c| apply_provider_config!(c) }
      else
        configure_ruby_llm!
      end
    end
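The comment in the constructor describes why a fallback adapter scopes its provider config instead of writing the process-global one. The following is a minimal, self-contained sketch of that hazard and its fix. ToyLLM, ToyConfig and ToyAdapter are hypothetical stand-ins invented for illustration; they are not ruby_llm's API and not the adapter's actual implementation.

```ruby
# Hypothetical stand-in for a library with process-global configuration.
# Nothing here is ruby_llm's real API; it only mirrors the shape of the problem.
ToyConfig = Struct.new(:api_key, :base_url)

module ToyLLM
  @global = ToyConfig.new("primary-key", "https://primary.example")

  class << self
    attr_reader :global

    # Mutates the shared, process-wide config (the hazardous path).
    def configure
      yield @global
    end

    # Yields a private copy of the config and returns it, leaving the
    # global untouched.
    def context
      scoped = @global.dup
      yield scoped
      scoped
    end
  end
end

class ToyAdapter
  attr_reader :config

  def initialize(api_key:, isolate_config: false)
    if isolate_config
      @config = ToyLLM.context { |c| c.api_key = api_key }
    else
      ToyLLM.configure { |c| c.api_key = api_key }
      @config = ToyLLM.global
    end
  end
end

primary  = ToyAdapter.new(api_key: "primary-key")
fallback = ToyAdapter.new(api_key: "fallback-key", isolate_config: true)

puts primary.config.api_key   # the global still holds the primary key
puts fallback.config.api_key  # the fallback sees only its own scoped copy
```

Had the fallback been built without isolate_config: true, its key would have overwritten the global that the primary adapter (and any concurrent session) reads from.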
Instance Attribute Details
#model_id ⇒ Object (readonly)
Returns the value of attribute model_id.
    # File 'lib/rubino/llm/ruby_llm_adapter.rb', line 29
    def model_id
      @model_id
    end
#provider ⇒ Object (readonly)
Returns the value of attribute provider.
    # File 'lib/rubino/llm/ruby_llm_adapter.rb', line 29
    def provider
      @provider
    end
Instance Method Details
#call(request) ⇒ Object
The single LLM boundary entry: take one LLM::Request, dispatch to the streaming vs non-streaming transport based on request.stream, and return a normalized AdapterResponse. The streaming variant yields chunks to the block then returns the same Response. This is the front door the conversation loop depends on; #chat / #stream remain as the underlying transports and stay valid for existing callers.
Graceful thinking degradation (#75): a provider on the anthropic-compatible path that rejects the thinking budget used to hard-error the user's very first prompt (the default effort is medium). When the rejection is recognised, the adapter remembers it for the session, tells the user once, and retries this same request WITHOUT the budget. Re-issuing is safe: the rejection is a pre-stream 400, so no token reached the UI.
    # File 'lib/rubino/llm/ruby_llm_adapter.rb', line 70
    def call(request, &)
      dispatch(request, &)
    rescue StandardError => e
      raise unless thinking_budget_rejected?(e)

      ThinkingSupport.mark_unsupported!(@provider, notify: @ui)
      dispatch(request, &)
    end
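The retry-once shape of #call can be sketched in isolation. This is an illustrative toy, not the adapter's code: BudgetRejected and ToyProvider are hypothetical names, and the real check lives in thinking_budget_rejected?, whose logic is not shown on this page.

```ruby
# Hypothetical error type; the real adapter recognises the rejection via
# thinking_budget_rejected?, which is not shown here.
class BudgetRejected < StandardError; end

class ToyProvider
  attr_reader :attempts

  def initialize
    @attempts = 0
    @send_budget = true
  end

  # Remember for the rest of the session that the budget must be omitted.
  def mark_unsupported!
    @send_budget = false
  end

  # Rejects any request that still carries a thinking budget.
  def dispatch(prompt)
    @attempts += 1
    raise BudgetRejected, "thinking budget not accepted" if @send_budget

    "answer to #{prompt}"
  end

  # Same shape as #call: try, and on a recognised rejection, remember it
  # and re-issue the request exactly once. Any other error is re-raised.
  def call(prompt)
    dispatch(prompt)
  rescue StandardError => e
    raise unless e.is_a?(BudgetRejected)

    mark_unsupported!
    dispatch(prompt)
  end
end

provider = ToyProvider.new
result = provider.call("hello") # first attempt is rejected, second succeeds
second = provider.call("again") # budget already dropped, succeeds first time
```

Because the second dispatch runs inside the rescue clause, an error raised by the retry is not rescued again, so the request is re-issued at most once.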
#chat(messages:, tools: nil, response_format: nil, image_paths: [], prefill: nil) ⇒ Object
Sends a chat completion request (non-streaming). image_paths, if any, are forwarded to ruby_llm's `with:` slot so the primary model ingests the bytes natively (no `vision` tool round-trip). This is only meaningful on the first model call of a turn; Loop strips it for follow-ups.
    # File 'lib/rubino/llm/ruby_llm_adapter.rb', line 83
    def chat(messages:, tools: nil, response_format: nil, image_paths: [], prefill: nil)
      if bedrock_bearer_mode?
        bedrock_bearer_client.chat(messages: messages, tools: tools)
      else
        chat_instance = build_chat(tools: tools, response_format: response_format)
        load_history(chat_instance, messages)
        apply_prefill(chat_instance, prefill)
        response = chat_instance.ask(last_user_content(messages), with: presence(image_paths))
        build_response(response)
      end
    end
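The default image_paths: [] is passed through presence before reaching `with:`. The helper itself is private and not shown on this page, so the sketch below is an assumption about its behaviour (empty or nil becomes nil, anything else passes through), not the actual implementation.

```ruby
# Hypothetical equivalent of the private #presence helper. The real
# implementation is not shown in this page; this is an assumed behaviour.
def presence(value)
  return nil if value.nil?
  return nil if value.respond_to?(:empty?) && value.empty?

  value
end

no_images  = presence([])                   # nothing to attach
two_images = presence(["a.png", "b.png"])   # forwarded unchanged
```

Under that assumption, a turn without images sends `with: nil` rather than an empty attachment list.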
#context_window ⇒ Object
Returns the context window size for the current model.
    # File 'lib/rubino/llm/ruby_llm_adapter.rb', line 130
    def context_window
      info = model_info
      return @config.model_context_length if @config.model_context_length

      info&.context_window || 128_000
    end
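The lookup order in #context_window is: an explicit config value, then the model registry entry, then a 128_000 default. A small sketch of that precedence follows; ToySettings, ToyModelInfo and resolve_context_window are hypothetical names used only to make the example self-contained.

```ruby
# Hypothetical stand-ins for the config object and the model registry entry.
ToySettings  = Struct.new(:model_context_length)
ToyModelInfo = Struct.new(:context_window)

DEFAULT_CONTEXT_WINDOW = 128_000

# Same precedence as #context_window: explicit config, then registry, then default.
def resolve_context_window(settings, info)
  return settings.model_context_length if settings.model_context_length

  info&.context_window || DEFAULT_CONTEXT_WINDOW
end

from_config   = resolve_context_window(ToySettings.new(32_000), ToyModelInfo.new(200_000))
from_registry = resolve_context_window(ToySettings.new(nil), ToyModelInfo.new(200_000))
from_default  = resolve_context_window(ToySettings.new(nil), nil)
```

The safe-navigation call (`info&.context_window`) matters because #model_info returns nil when the registry lookup fails.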
#model_info ⇒ Object
Returns model information (context window, etc.).
    # File 'lib/rubino/llm/ruby_llm_adapter.rb', line 123
    def model_info
      RubyLLM.models.find(@model_id)
    rescue StandardError
      nil
    end
#stream(messages:, tools: nil, response_format: nil, image_paths: [], prefill: nil) ⇒ Object
Sends a streaming chat request, yielding chunks. Inline <think>…</think> sentinels are routed to the :thinking channel. Buffered partial content is preserved across mid-stream parse errors so downstream code can show whatever the model produced before the failure.
    # File 'lib/rubino/llm/ruby_llm_adapter.rb', line 99
    def stream(messages:, tools: nil, response_format: nil, image_paths: [], prefill: nil, &)
      if bedrock_bearer_mode?
        # BedrockBearerClient#stream buffers the whole /converse response before
        # its first emit, so a transport error can only fire pre-first-chunk:
        # no token reached the UI. It raises straight through to the runner,
        # which re-issues a fresh request (safe, no double output).
        return bedrock_bearer_client.stream(messages: messages, tools: tools, &)
      end

      # No retry wrapper here: retry ownership moved to Agent::ModelCallRunner
      # (Slice 4) to avoid double-retrying the same failure. The streaming
      # transport-drop PROTECTION still lives inside #stream_once: it RAISES a
      # transport drop only when NOTHING was emitted to the UI yet
      # (chunks_seen.zero?), so the runner can re-issue a fresh request before
      # any token reached the user (no double output). Once a chunk has flowed
      # it RETURNS the buffered partial instead of raising, so the drop can
      # never be retried mid-stream. The raise-vs-return decision (the only
      # streaming-specific safety) stays here; the actual retrying is the
      # runner's job.
      stream_once(messages: messages, tools: tools, response_format: response_format,
                  image_paths: image_paths, prefill: prefill, &)
    end
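The raise-vs-return rule described in the comments can be illustrated with a toy stream. #stream_once itself is private and not shown on this page, so the sketch below is only an assumed shape of that decision; TransportDrop and stream_once_sketch are hypothetical names.

```ruby
# Hypothetical transport error; the real adapter's error classes are not shown.
class TransportDrop < StandardError; end

# Streams chunks from +source+ (an Enumerator that may raise part-way).
# Raises only if nothing was emitted yet; otherwise returns the partial text.
def stream_once_sketch(source)
  buffer = +""
  chunks_seen = 0

  begin
    source.each do |chunk|
      buffer << chunk
      chunks_seen += 1
      yield chunk
    end
  rescue TransportDrop
    raise if chunks_seen.zero? # safe to retry: the UI saw nothing
    # Otherwise fall through and return what was buffered.
  end

  buffer
end

drops_immediately = Enumerator.new { |_y| raise TransportDrop, "connection reset" }
drops_midway = Enumerator.new do |y|
  y << "Hel"
  y << "lo"
  raise TransportDrop, "connection reset"
end

shown = []
partial = stream_once_sketch(drops_midway) { |c| shown << c }

retryable =
  begin
    stream_once_sketch(drops_immediately) { |c| shown << c }
    false
  rescue TransportDrop
    true
  end
```

A caller that owns retrying (the runner, in this document) only ever sees the exception when no chunk was shown, so re-issuing the request cannot duplicate output.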