Class: Rubino::LLM::RubyLLMAdapter

Inherits:
Object
Defined in:
lib/rubino/llm/ruby_llm_adapter.rb

Overview

Adapter wrapping ruby_llm to isolate all LLM integration details. The rest of the application never calls ruby_llm directly.


Constructor Details

#initialize(model_id: nil, provider: nil, config: nil, ui: nil, event_bus: nil, tool_executor: nil, cancel_token: nil, isolate_config: false) ⇒ RubyLLMAdapter

Returns a new instance of RubyLLMAdapter.

# File 'lib/rubino/llm/ruby_llm_adapter.rb', line 31

def initialize(model_id: nil, provider: nil, config: nil, ui: nil, event_bus: nil,
               tool_executor: nil, cancel_token: nil, isolate_config: false)
  @config        = config || Rubino.configuration
  @model_id      = model_id || @config.model_default
  @provider      = provider || resolve_provider
  @temperature   = @config.model_temperature
  @ui            = ui || Rubino.ui
  @event_bus     = event_bus || Rubino.event_bus
  @tool_executor = tool_executor # nil = ToolBridge falls back to direct tool.call
  @cancel_token  = cancel_token

  # SLICE-7: when built as a FallbackChain entry, scope provider config
  # (api keys / base_url / timeout) into a per-adapter RubyLLM::Context
  # instead of the process-global RubyLLM.configure. This is the heart of
  # the global-config hazard fix: switching providers
  # for a fallback must NOT mutate the global, or concurrent sessions on the
  # API/server path corrupt each other's provider config. The primary
  # adapter (isolate_config: false) keeps writing the global exactly as
  # before, so existing single-provider setups are byte-identical.
  if isolate_config
    @context = RubyLLM.context { |c| apply_provider_config!(c) }
  else
    configure_ruby_llm!
  end
end
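A minimal sketch of the isolate_config split, assuming a plain Hash can stand in for provider configuration. GlobalSettings and IsolatingAdapter are hypothetical names: GlobalSettings plays the role of the process-global RubyLLM.configure, and its context method plays the role of RubyLLM.context by handing back a private copy. The point it illustrates is that a fallback entry built with isolate_config: true never writes the shared settings the primary adapter relies on.

```ruby
# GlobalSettings and IsolatingAdapter are hypothetical stand-ins,
# not part of ruby_llm or Rubino.
module GlobalSettings
  @config = { provider: nil, api_key: nil }

  class << self
    attr_reader :config

    # Plays the role of RubyLLM.configure: mutates the shared settings.
    def configure
      yield @config
    end

    # Plays the role of RubyLLM.context: yields and returns a private copy.
    def context
      copy = @config.dup
      yield copy
      copy
    end
  end
end

class IsolatingAdapter
  attr_reader :settings

  def initialize(provider:, api_key:, isolate_config: false)
    @provider = provider
    @api_key  = api_key
    if isolate_config
      # Fallback-chain entry: scoped copy, global left untouched.
      @settings = GlobalSettings.context { |c| apply_provider_config!(c) }
    else
      # Primary adapter: writes the global, as before.
      GlobalSettings.configure { |c| apply_provider_config!(c) }
      @settings = GlobalSettings.config
    end
  end

  private

  def apply_provider_config!(c)
    c[:provider] = @provider
    c[:api_key]  = @api_key
  end
end

primary  = IsolatingAdapter.new(provider: :anthropic, api_key: "key-a")
fallback = IsolatingAdapter.new(provider: :openai, api_key: "key-b", isolate_config: true)

puts GlobalSettings.config[:provider] # prints "anthropic"
puts fallback.settings[:provider]     # prints "openai"
```

In this sketch each fallback adapter owns its settings object, so swapping providers for a fallback cannot leak into another session that shares the global.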

Instance Attribute Details

#model_id ⇒ Object (readonly)

Returns the value of attribute model_id.

# File 'lib/rubino/llm/ruby_llm_adapter.rb', line 29

def model_id
  @model_id
end

#provider ⇒ Object (readonly)

Returns the value of attribute provider.

# File 'lib/rubino/llm/ruby_llm_adapter.rb', line 29

def provider
  @provider
end

Instance Method Details

#call(request) ⇒ Object

The single LLM boundary entry point. It takes one LLM::Request, dispatches to the streaming or non-streaming transport based on request.stream, and returns a normalized AdapterResponse. In streaming mode it yields chunks to the block, then returns the same response. This is the front door the conversation loop depends on; #chat and #stream remain as the underlying transports and stay valid for existing callers.

Graceful thinking degradation (#75): a provider on the anthropic-compatible path that rejects the thinking budget used to hard-error the user's very first prompt (the default effort is medium). When the rejection is recognised, the adapter remembers it for the session, tells the user once, and retries the same request WITHOUT the budget. Re-issuing is safe: the rejection is a pre-stream 400, so no token has reached the UI.

# File 'lib/rubino/llm/ruby_llm_adapter.rb', line 70

def call(request, &)
  dispatch(request, &)
rescue StandardError => e
  raise unless thinking_budget_rejected?(e)

  ThinkingSupport.mark_unsupported!(@provider, notify: @ui)
  dispatch(request, &)
end
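The dispatch-then-degrade flow can be sketched as follows. This is an illustration, not the adapter's code: DegradingAdapter, ThinkingBudgetRejected, and the simulated transports are hypothetical, and a single instance variable stands in for ThinkingSupport.mark_unsupported!. The retry is safe in the sketch for the same reason given above: the simulated rejection fires before any chunk is yielded.

```ruby
# Hypothetical names throughout; not Rubino's real classes.
Request = Struct.new(:prompt, :stream, keyword_init: true)

class ThinkingBudgetRejected < StandardError; end

class DegradingAdapter
  attr_reader :notices, :attempts

  def initialize(provider_supports_thinking:)
    @supports_thinking = provider_supports_thinking
    @thinking_disabled = false # remembered for the rest of the session
    @notices  = []
    @attempts = 0
  end

  # Dispatch once; on a recognised budget rejection, degrade and retry once.
  def call(request, &block)
    dispatch(request, &block)
  rescue ThinkingBudgetRejected
    raise if @thinking_disabled # already degraded: surface the error

    @thinking_disabled = true
    @notices << "Thinking budget not supported; continuing without it."
    dispatch(request, &block) # an error here propagates to the caller
  end

  private

  # Streaming vs non-streaming is chosen from the request itself.
  def dispatch(request, &block)
    request.stream ? stream_transport(request, &block) : chat_transport(request)
  end

  # Simulated provider: rejects the thinking budget before streaming starts.
  def send_to_provider(request)
    @attempts += 1
    raise ThinkingBudgetRejected if !@thinking_disabled && !@supports_thinking

    "answer to #{request.prompt}"
  end

  def chat_transport(request)
    send_to_provider(request)
  end

  def stream_transport(request)
    text = send_to_provider(request)
    text.each_char.each_slice(4) { |chars| yield chars.join }
    text
  end
end

adapter = DegradingAdapter.new(provider_supports_thinking: false)
chunks = []
result = adapter.call(Request.new(prompt: "hi", stream: true)) { |c| chunks << c }
puts result # prints "answer to hi"
```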

#chat(messages:, tools: nil, response_format: nil, image_paths: [], prefill: nil) ⇒ Object

Sends a chat completion request (non-streaming). Any image_paths are forwarded to ruby_llm's `with:` slot so the primary model ingests the bytes natively (no `vision` tool round-trip). They are only meaningful on the first model call of a turn; Loop strips them for follow-ups. In Bedrock bearer mode the request is delegated to the Bedrock bearer client with only messages and tools.

# File 'lib/rubino/llm/ruby_llm_adapter.rb', line 83

def chat(messages:, tools: nil, response_format: nil, image_paths: [], prefill: nil)
  if bedrock_bearer_mode?
    bedrock_bearer_client.chat(messages: messages, tools: tools)
  else
    chat_instance = build_chat(tools: tools, response_format: response_format)
    load_history(chat_instance, messages)
    apply_prefill(chat_instance, prefill)
    response = chat_instance.ask(last_user_content(messages), with: presence(image_paths))
    build_response(response)
  end
end
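A sketch of the image_paths contract, assuming the adapter's private presence helper behaves like ActiveSupport's Object#presence for arrays (an empty array becomes nil, so no attachment is sent). FakeChat, run_turn, and presence here are hypothetical stand-ins; FakeChat#ask only records its arguments in place of a ruby_llm chat. It shows images travelling with the first model call of a turn and being stripped for follow-ups, as described above for Loop.

```ruby
# Hypothetical stand-in for a ruby_llm chat: records what it was asked.
class FakeChat
  attr_reader :calls

  def initialize
    @calls = []
  end

  def ask(content, with: nil)
    @calls << { content: content, with: with }
    "ok"
  end
end

# Assumed behaviour of the adapter's helper: nil or empty becomes nil.
def presence(value)
  value.nil? || value.empty? ? nil : value
end

# Images ride along only on the first model call of the turn;
# follow-up calls are issued with image_paths stripped.
def run_turn(chat, prompts, image_paths)
  prompts.each_with_index do |prompt, i|
    paths = i.zero? ? image_paths : []
    chat.ask(prompt, with: presence(paths))
  end
end

chat = FakeChat.new
run_turn(chat, ["describe this", "continue"], ["shot.png"])
p chat.calls.map { |c| c[:with] } # prints [["shot.png"], nil]
```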

#context_window ⇒ Object

Returns the context window size for the current model: the configured model_context_length when set, otherwise the value from the model registry, otherwise 128,000 tokens.

# File 'lib/rubino/llm/ruby_llm_adapter.rb', line 130

def context_window
  info = model_info
  return @config.model_context_length if @config.model_context_length

  info&.context_window || 128_000
end
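The precedence in context_window can be written out as a standalone function. This sketch assumes a registry object whose find returns an entry, returns nil, or raises for an unknown id; HashRegistry, ModelInfo, and resolve_context_window are hypothetical names. Unlike the method above, it checks the configured override before the lookup, which yields the same result while skipping an unnecessary registry call.

```ruby
# Hypothetical registry entry; only context_window matters here.
ModelInfo = Struct.new(:context_window)

DEFAULT_CONTEXT_WINDOW = 128_000

# Hypothetical registry whose #find raises for unknown ids.
class HashRegistry
  def initialize(models)
    @models = models
  end

  def find(id)
    @models.fetch(id) { raise KeyError, "unknown model #{id}" }
  end
end

# Override first, then the registry value, then the fixed default.
def resolve_context_window(model_id, registry:, override: nil)
  return override if override

  info = begin
    registry.find(model_id)
  rescue StandardError
    nil # mirrors model_info: a failed lookup counts as "unknown"
  end
  info&.context_window || DEFAULT_CONTEXT_WINDOW
end

registry = HashRegistry.new(
  "big-model"  => ModelInfo.new(200_000),
  "bare-model" => ModelInfo.new(nil)
)
puts resolve_context_window("big-model", registry: registry) # prints 200000
```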

#model_info ⇒ Object

Returns the RubyLLM model registry entry for the current model (context window, etc.), or nil if the lookup raises.

# File 'lib/rubino/llm/ruby_llm_adapter.rb', line 123

def model_info
  RubyLLM.models.find(@model_id)
rescue StandardError
  nil
end

#stream(messages:, tools: nil, response_format: nil, image_paths: [], prefill: nil) ⇒ Object

Sends a streaming chat request, yielding chunks. Inline <think>…</think> sentinels are routed to the :thinking channel. Buffered partial content is preserved across mid-stream parse errors so downstream code can show whatever the model produced before the failure.
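One way to route inline <think>…</think> sentinels to a separate channel while streaming is sketched below. The class ThinkRouter and its [channel, text] event shape are hypothetical, and holding back a possibly partial tag at a chunk boundary is an assumption about what such routing needs when a sentinel is split across chunks, not a description of the adapter's internals.

```ruby
# Hypothetical router: emits (:content, text) and (:thinking, text) events.
class ThinkRouter
  OPEN  = "<think>"
  CLOSE = "</think>"

  def initialize(&emit)
    @emit = emit
    @buffer = +""
    @thinking = false
  end

  def feed(chunk)
    @buffer << chunk
    loop do
      tag = @thinking ? CLOSE : OPEN
      idx = @buffer.index(tag)
      if idx
        # Text before the tag belongs to the current channel; then flip.
        emit_text(@buffer[0, idx])
        @buffer = @buffer[(idx + tag.length)..]
        @thinking = !@thinking
      else
        # Hold back a suffix that could be the start of a split tag.
        keep = partial_tag_length(tag)
        emit_text(@buffer[0, @buffer.length - keep])
        @buffer = @buffer[(@buffer.length - keep)..]
        break
      end
    end
  end

  # Call at end of stream to release anything still held back.
  def finish
    emit_text(@buffer)
    @buffer = +""
  end

  private

  def emit_text(text)
    @emit.call(@thinking ? :thinking : :content, text) unless text.empty?
  end

  # Length of the longest buffer suffix that is a proper prefix of tag.
  def partial_tag_length(tag)
    (tag.length - 1).downto(1) do |n|
      return n if @buffer.end_with?(tag[0, n])
    end
    0
  end
end

events = []
router = ThinkRouter.new { |channel, text| events << [channel, text] }
["Hi <thi", "nk>plan", " it</th", "ink> done"].each { |c| router.feed(c) }
router.finish
p events
```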



# File 'lib/rubino/llm/ruby_llm_adapter.rb', line 99

def stream(messages:, tools: nil, response_format: nil, image_paths: [], prefill: nil, &)
  if bedrock_bearer_mode?
    # BedrockBearerClient#stream buffers the whole /converse response before
    # its first emit, so a transport error can only fire pre-first-chunk —
    # no token reached the UI. It raises straight through to the runner,
    # which re-issues a fresh request (safe, no double output).
    return bedrock_bearer_client.stream(messages: messages, tools: tools, &)
  end

  # No retry wrapper here — retry ownership moved to Agent::ModelCallRunner
  # (Slice 4) to avoid double-retrying the same failure. The streaming
  # transport-drop PROTECTION still lives inside #stream_once: it RAISES a
  # transport drop only when NOTHING was emitted to the UI yet
  # (chunks_seen.zero?), so the runner can re-issue a fresh request before
  # any token reached the user — no double output. Once a chunk has flowed
  # it RETURNS the buffered partial instead of raising, so the drop can
  # never be retried mid-stream. The raise-vs-return decision (the only
  # streaming-specific safety) stays here; the actual retrying is the
  # runner's job.
  stream_once(messages: messages, tools: tools, response_format: response_format,
              image_paths: image_paths, prefill: prefill, &)
end
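The raise-vs-return rule described in the comments above can be sketched with a simulated stream. TransportDrop, PartialResponse, stream_once_sketch, and run_with_retry are hypothetical names; run_with_retry stands in for the retry ownership that the comments assign to Agent::ModelCallRunner. A drop before the first chunk raises so the runner can re-issue the request; a drop after output has been yielded returns the buffered partial, so it is never retried mid-stream.

```ruby
# Hypothetical names; not the adapter's real private API.
class TransportDrop < StandardError; end

PartialResponse = Struct.new(:content, :interrupted, keyword_init: true)

# chunks: strings, or a TransportDrop instance simulating a dead connection.
def stream_once_sketch(chunks)
  buffer = +""
  chunks_seen = 0
  chunks.each do |chunk|
    raise chunk if chunk.is_a?(Exception) # simulated transport drop
    buffer << chunk
    chunks_seen += 1
    yield chunk # this chunk has now reached the UI
  end
  PartialResponse.new(content: buffer, interrupted: false)
rescue TransportDrop
  # Nothing reached the UI yet: raise so the caller can safely retry.
  raise if chunks_seen.zero?

  # Output already shown: return the partial so it is never retried.
  PartialResponse.new(content: buffer, interrupted: true)
end

# Stand-in for the runner: each element of attempts is one request's chunks.
def run_with_retry(attempts, &on_chunk)
  attempts.each_with_index do |chunks, i|
    begin
      return stream_once_sketch(chunks, &on_chunk)
    rescue TransportDrop
      raise if i == attempts.length - 1 # out of attempts
    end
  end
end

shown = []
response = run_with_retry([[TransportDrop.new], ["Hel", "lo"]]) { |c| shown << c }
puts response.content # prints "Hello"
```

Keeping the retry loop outside the transport means exactly one layer decides whether to retry, which is how the comments above avoid double-retrying the same failure.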