Class: Rubino::LLM::BedrockBearerClient

Inherits:
Object
Defined in:
lib/rubino/llm/bedrock_bearer_client.rb

Overview

Direct Bedrock runtime client using Bearer token authentication. Used when BEDROCK_API_KEY is set without BEDROCK_SECRET_KEY. Calls the Bedrock Converse API with an Authorization: Bearer header. Supports tool calls via the native Bedrock Converse toolConfig format.
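As a rough illustration of the request described above, the sketch below builds (but does not send) a Converse POST with a Bearer header using Net::HTTP. The helper name `build_converse_request`, the constant `RUNTIME_HOST_TEMPLATE`, and the sample key and model ID are assumptions made for this example. The class's private `post` and `build_body` methods are not shown on this page and may differ.

```ruby
require "net/http"
require "json"
require "uri"

# Same template string as the BEDROCK_RUNTIME_HOST constant documented below.
RUNTIME_HOST_TEMPLATE = "bedrock-runtime.%s.amazonaws.com"

# Hypothetical helper (not part of the class): returns the host and an
# unsent Net::HTTP::Post for the Converse endpoint.
def build_converse_request(api_key:, region:, model_id:, body:)
  host = format(RUNTIME_HOST_TEMPLATE, region)
  # Model IDs can contain ":" or "/", so the path segment is percent-encoded.
  # Assumption: encode_www_form_component is used here so the sketch also runs
  # on Rubies older than 3.2; the class itself calls URI.encode_uri_component.
  path = "/model/#{URI.encode_www_form_component(model_id)}/converse"

  request = Net::HTTP::Post.new(path)
  request["Authorization"] = "Bearer #{api_key}"
  request["Content-Type"]  = "application/json"
  request.body = JSON.generate(body)
  [host, request]
end

host, request = build_converse_request(
  api_key:  "example-key",           # placeholder, not a real key
  region:   "us-east-1",
  model_id: "example.model-v1:0",    # placeholder model ID
  body:     { messages: [{ role: "user", content: [{ text: "Hello" }] }] }
)
puts "https://#{host}#{request.path}"
```

Sending the request would additionally need a `Net::HTTP` connection to `host` on port 443 with `use_ssl` enabled; that part is left out so the sketch runs without network access.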

Constant Summary

BEDROCK_RUNTIME_HOST =
"bedrock-runtime.%s.amazonaws.com"

Constructor Details

#initialize(api_key:, region:, model_id:, show_reasoning: false, event_bus: nil) ⇒ BedrockBearerClient

Returns a new instance of BedrockBearerClient.



# File 'lib/rubino/llm/bedrock_bearer_client.rb', line 17

def initialize(api_key:, region:, model_id:, show_reasoning: false, event_bus: nil)
  @api_key        = api_key
  @region         = region
  @model_id       = model_id
  @host           = BEDROCK_RUNTIME_HOST % region
  @show_reasoning = show_reasoning
  @event_bus      = event_bus
end

Instance Method Details

#chat(messages:, tools: nil) ⇒ Object

Sends a non-streaming chat request and returns an AdapterResponse.



# File 'lib/rubino/llm/bedrock_bearer_client.rb', line 27

def chat(messages:, tools: nil)
  body     = build_body(messages, tools: tools)
  response = post("/model/#{URI.encode_uri_component(@model_id)}/converse", body)
  parse_response(response)
end

#stream(messages:, tools: nil, &block) ⇒ Object

Sends a “streaming” chat request and returns an AdapterResponse, yielding chunk HASHES shaped exactly like those of every other adapter:

{ type: :content | :thinking, text: String, message_id: Integer }

Real Bedrock ConverseStream (binary eventstream) is out of scope: bearer-token auth isn’t supported by ruby_llm’s SigV4 Bedrock provider, and this is a plain Net::HTTP transport. We buffer the non-streaming /converse response FULLY, then replay it through InlineThinkFilter in slices so the SHAPE matches the streaming contract (typed deltas, :thinking channel, a single content block id, an explicit MESSAGE_COMPLETED boundary). Only the token cadence is synthetic.

INVARIANT: we buffer the entire response BEFORE the first emit. That is what makes retrying this call (now in Agent::ModelCallRunner) safe — a transport error can only fire during post() (before any chunk reached the UI), never mid-replay, so a retry can’t double output.
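The invariant can be illustrated with a small stand-alone sketch. The names `buffered_stream` and `with_retry` are hypothetical, the fetch step is simulated with a lambda, and InlineThinkFilter is omitted. This is not the retry logic in Agent::ModelCallRunner, only a demonstration of why a failure before the first emit cannot duplicate output.

```ruby
# Hypothetical stand-in for #stream: fetch everything, then replay in slices.
def buffered_stream(fetch, slice_size: 5)
  text = fetch.call # the only step that can raise a transport error
  text.chars.each_slice(slice_size) do |slice|
    yield({ type: :content, text: slice.join, message_id: 0 })
  end
  text
end

# Hypothetical retry wrapper: re-runs the whole call on IOError.
def with_retry(attempts: 2)
  tries = 0
  begin
    tries += 1
    yield
  rescue IOError
    retry if tries < attempts
    raise
  end
end

calls = 0
flaky_fetch = lambda do
  calls += 1
  raise IOError, "simulated transport error" if calls == 1

  "Hello, world!"
end

chunks = []
with_retry { buffered_stream(flaky_fetch) { |chunk| chunks << chunk } }

# The first attempt failed before any chunk was yielded, so the retry
# produces each slice exactly once.
puts chunks.map { |chunk| chunk[:text] }.inspect
```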



# File 'lib/rubino/llm/bedrock_bearer_client.rb', line 49

def stream(messages:, tools: nil, &block)
  body = build_body(messages, tools: tools)
  data = post("/model/#{URI.encode_uri_component(@model_id)}/converse", body)

  # Single buffered content block ⇒ message_id is always 0. Mirrors the
  # 2-arg emit lambda RubyLLMAdapter feeds into InlineThinkFilter.feed/flush.
  emit = lambda do |type, text|
    return if text.nil? || text.empty?
    return if type == :thinking && !@show_reasoning

    block&.call({ type: type, text: text, message_id: 0 })
  end

  think_filter = InlineThinkFilter.new
  extract_text(data).chars.each_slice(5) do |slice|
    think_filter.feed(slice.join, &emit)
  end
  think_filter.flush(&emit)

  @event_bus&.emit(Interaction::Events::MESSAGE_COMPLETED, message_id: 0)

  parse_response(data)
end
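
On the caller's side, the block passed to #stream receives the chunk hashes described above. The sketch below shows one possible way to consume them, using a hypothetical `ChunkCollector` class that is not part of Rubino. The chunks are simulated here because constructing a real client needs credentials and network access.

```ruby
# Hypothetical collector: separates visible content from reasoning text.
class ChunkCollector
  attr_reader :content, :thinking

  def initialize
    @content  = +""
    @thinking = +""
  end

  # Receives one chunk hash: { type:, text:, message_id: }.
  def call(chunk)
    case chunk[:type]
    when :content  then @content << chunk[:text]
    when :thinking then @thinking << chunk[:text]
    end
  end

  # Lets the collector be passed with & wherever a block is expected.
  def to_proc
    method(:call).to_proc
  end
end

collector = ChunkCollector.new

# With a real client this would be (assumed usage, not run here):
#   client.stream(messages: messages, &collector)
simulated_chunks = [
  { type: :thinking, text: "plan", message_id: 0 },
  { type: :content,  text: "Hello", message_id: 0 },
  { type: :content,  text: ", wor", message_id: 0 },
  { type: :content,  text: "ld!",   message_id: 0 }
]
simulated_chunks.each(&collector)

puts collector.content
```

Note that :thinking chunks only reach the block when the client was constructed with show_reasoning: true, since the emit lambda drops them otherwise.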