Class: Raix::MultimodalContentAdapter

Inherits:
Object
Defined in:
lib/raix/multimodal_content_adapter.rb

Overview

Translates OpenAI-style multimodal content arrays (a `text` part plus one or more `image_url` parts) into a RubyLLM::Content so that images survive the trip to the provider.

RubyLLM's `add_message`/`ask` treat a raw array of OpenAI content hashes as plain text, so an `{ type: "image_url", image_url: { url: … } }` part is silently dropped and a vision model receives only the text. See github.com/OlympiaAI/raix/issues/51.
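
For reference, this is the shape of such an OpenAI-style content array as a Ruby literal. The snippet also shows that the same payload arrives with String keys after a JSON round trip; the adapter's `with_indifferent_access` call lets it read either form. The URL is a placeholder, and the snippet uses only the Ruby standard library.

```ruby
require "json"

# OpenAI-style multimodal content: one text part plus one image_url part.
# The URL is a placeholder.
content = [
  { type: "text", text: "What is in this picture?" },
  { type: "image_url", image_url: { url: "https://example.com/cat.png" } }
]

# After a JSON round trip every key is a String, not a Symbol.
from_json = JSON.parse(JSON.generate(content))
from_json.last["image_url"]["url"] # => "https://example.com/cat.png"
from_json.last[:image_url]         # => nil (a Symbol lookup misses String keys)
```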

Anything that is not an array of hashes containing at least one `image_url` part is returned untouched, so existing text completions are unaffected.
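
The pass-through rule can be approximated by a predicate like the one below. This is a hypothetical sketch: `translatable_multimodal?` is not part of Raix. The real check is split between a private `translatable?` method (not shown on this page) and the `attachments.empty?` test in `#translate`, which also falls back when an `image_url` part carries no usable URL.

```ruby
# Hypothetical predicate approximating the documented pass-through rule:
# only an Array made entirely of Hashes, with at least one image_url part,
# is a candidate for translation. Accepts Symbol or String keys.
def translatable_multimodal?(content)
  content.is_a?(Array) &&
    content.all? { |part| part.is_a?(Hash) } &&
    content.any? { |part| (part[:type] || part["type"]).to_s == "image_url" }
end

translatable_multimodal?("Describe this image")          # => false
translatable_multimodal?([{ type: "text", text: "hi" }]) # => false
translatable_multimodal?([{ "type" => "image_url",
                            "image_url" => { "url" => "https://example.com/a.png" } }]) # => true
```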

Class Method Summary

Instance Method Summary

Constructor Details

#initialize(content) ⇒ MultimodalContentAdapter

Returns a new instance of MultimodalContentAdapter.



# File 'lib/raix/multimodal_content_adapter.rb', line 24

def initialize(content)
  @content = content
end

Class Method Details

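.translate(content) ⇒ Object

Convenience wrapper that builds a new adapter for content and returns the result of its #translate.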



# File 'lib/raix/multimodal_content_adapter.rb', line 20

def self.translate(content)
  new(content).translate
end

Instance Method Details

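#translate ⇒ Object

Returns a RubyLLM::Content when the content carries at least one usable image; otherwise returns the original content unchanged.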



# File 'lib/raix/multimodal_content_adapter.rb', line 28

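def translate
  # Non-multimodal input (e.g. a plain String) is returned as-is.
  return @content unless translatable?

  # Normalize keys so "type" and :type are read the same way (ActiveSupport).
  parts = @content.map(&:with_indifferent_access)
  # One attachment per image_url part; filter_map drops any part for which
  # attachment_source returns nil.
  attachments = parts.select { |part| part[:type].to_s == "image_url" }
                     .filter_map { |part| attachment_source(part.dig(:image_url, :url)) }
  # No usable image: hand back the original content unchanged.
  return @content if attachments.empty?

  # Join all text parts with newlines; pass nil rather than "" when there is none.
  text = parts.select { |part| part[:type].to_s == "text" }.filter_map { |part| part[:text] }.join("\n")
  RubyLLM::Content.new(text.empty? ? nil : text, attachments)
end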
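
The method above can be approximated without Raix, RubyLLM, or ActiveSupport, which makes its fallback paths easy to exercise. This is a plain-Ruby sketch under stated assumptions, not Raix's implementation. `SketchContent` is a hypothetical stand-in for RubyLLM::Content, and `indifferent` replaces `with_indifferent_access`. The private `translatable?` and `attachment_source` helpers (not shown on this page) are approximated by an array-of-hashes check and by using the non-empty URL string itself as the attachment source.

```ruby
# Hypothetical stand-in for RubyLLM::Content: optional text plus attachments.
SketchContent = Struct.new(:text, :attachments)

# Read a key that may be stored as a Symbol or as a String.
def indifferent(hash, key)
  hash[key] || hash[key.to_s]
end

# Plain-Ruby approximation of MultimodalContentAdapter#translate.
# Assumption: an attachment source is simply the non-empty URL string.
def sketch_translate(content)
  return content unless content.is_a?(Array) && content.all? { |part| part.is_a?(Hash) }

  attachments = content
    .select { |part| indifferent(part, :type).to_s == "image_url" }
    .filter_map do |part|
      image = indifferent(part, :image_url)
      url = image.is_a?(Hash) ? indifferent(image, :url) : nil
      url unless url.to_s.empty? # nil is dropped by filter_map
    end
  return content if attachments.empty?

  text = content
    .select { |part| indifferent(part, :type).to_s == "text" }
    .filter_map { |part| indifferent(part, :text) }
    .join("\n")
  SketchContent.new(text.empty? ? nil : text, attachments)
end

result = sketch_translate([
  { type: "text", text: "What is in this picture?" },
  { type: "image_url", image_url: { url: "https://example.com/cat.png" } }
])
result.text                      # => "What is in this picture?"
result.attachments               # => ["https://example.com/cat.png"]
sketch_translate("plain prompt") # => "plain prompt"
```

Every fallback path returns the original object rather than a wrapped copy, which is what keeps text-only completions unaffected.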