Module: SmartPrompt::MultimodalMessages

Included in:
SenseNovaAdapter, SiliconFlowAdapter, ZhipuAIAdapter
Defined in:
lib/smart_prompt/concerns/multimodal_messages.rb

Overview

Shared multimodal-message normalization for Net::HTTP adapters (ZhipuAI, SenseNova, SiliconFlow). Turns an OpenAI-style content array into the shape the provider expects, inlining local image/audio/video files as base64 data URLs and passing http(s)/data URLs through. Each adapter previously carried a near-identical copy of this logic.

SiliconFlow’s variant is the superset (image_url + video_url + audio_url, preserving detail/max_frames/fps); Zhipu/SenseNova only ever send image_url, which is a subset.

Constant Summary

SUPPORTED_IMAGE_FORMATS =
%w[jpg jpeg png gif bmp webp].freeze

Instance Method Details

#normalize_content_item(item) ⇒ Object

Normalizes one content item: a non-Hash value becomes a text part, image_url/video_url/audio_url parts go through normalize_media_part, and any other Hash is returned with stringified keys.



# File 'lib/smart_prompt/concerns/multimodal_messages.rb', line 25

def normalize_content_item(item)
  return { "type" => "text", "text" => item.to_s } unless item.is_a?(Hash)

  case item[:type] || item["type"]
  when "image_url"
    normalize_media_part(item, "image_url", :image)
  when "video_url"
    normalize_media_part(item, "video_url", :video)
  when "audio_url"
    normalize_media_part(item, "audio_url", :audio)
  else
    stringify_hash(item)
  end
end

#normalize_image_url(url) ⇒ Object

Single-arg image-only shim (call sites like generate_video pass a plain image URL).



# File 'lib/smart_prompt/concerns/multimodal_messages.rb', line 76

def normalize_image_url(url)
  normalize_media_url(url, :image)
end

#normalize_input_image(image) ⇒ Object

Accepts a local path, a base64 data URL, or an http(s) URL for image-edit / image-to-video `image` fields.

Raises:

(Error) if a local file does not exist or its extension is not in SUPPORTED_IMAGE_FORMATS



# File 'lib/smart_prompt/concerns/multimodal_messages.rb', line 82

def normalize_input_image(image)
  return image if image.nil?

  if image.is_a?(String)
    return image if image.start_with?("data:")
    return image if image.start_with?("http://", "https://")
  end

  raise Error, "Image file not found: #{image}" unless File.exist?(image)
  ext = File.extname(image).downcase.delete(".")
  raise Error, "Unsupported image format: #{ext}" unless SUPPORTED_IMAGE_FORMATS.include?(ext)
  mime = ext == "jpg" ? "jpeg" : ext
  "data:image/#{mime};base64,#{Base64.strict_encode64(File.binread(image))}"
end
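
As a rough illustration of the three accepted inputs, the sketch below copies the method body from the listing above into a stand-in module so that it runs without the gem. `InputImageDemo` and the locally defined `Error` class are assumptions made for this example; in the library, `Error` presumably resolves to one of SmartPrompt's own error classes.

```ruby
require "base64"
require "tempfile"

# Hypothetical stand-in for the real concern, so the snippet runs on its own.
module InputImageDemo
  Error = Class.new(StandardError) # assumption: stands in for the library's Error
  SUPPORTED_IMAGE_FORMATS = %w[jpg jpeg png gif bmp webp].freeze

  module_function

  # Body copied from the listing above.
  def normalize_input_image(image)
    return image if image.nil?

    if image.is_a?(String)
      return image if image.start_with?("data:")
      return image if image.start_with?("http://", "https://")
    end

    raise Error, "Image file not found: #{image}" unless File.exist?(image)
    ext = File.extname(image).downcase.delete(".")
    raise Error, "Unsupported image format: #{ext}" unless SUPPORTED_IMAGE_FORMATS.include?(ext)
    mime = ext == "jpg" ? "jpeg" : ext
    "data:image/#{mime};base64,#{Base64.strict_encode64(File.binread(image))}"
  end
end

# Remote URLs are returned untouched.
remote = InputImageDemo.normalize_input_image("https://example.com/cat.png")

# A local file is inlined; only the extension is checked, not the file contents.
local = nil
Tempfile.create(["pixel", ".jpg"]) do |f|
  f.binmode
  f.write("not really a jpeg")
  f.flush
  local = InputImageDemo.normalize_input_image(f.path)
end

# A missing path raises Error with the path in the message.
missing_message =
  begin
    InputImageDemo.normalize_input_image("/no/such/file.png")
  rescue InputImageDemo::Error => e
    e.message
  end
```

Note that a `.jpg` extension is mapped to the `image/jpeg` MIME type, while every other supported extension is used as the subtype directly.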

#normalize_media_part(item, type, media_kind) ⇒ Object

Build an image_url/video_url/audio_url part, inlining local files as data URLs and preserving any extra keys (detail, max_frames, fps) on the media hash.



# File 'lib/smart_prompt/concerns/multimodal_messages.rb', line 42

def normalize_media_part(item, type, media_kind)
  iu = item[type.to_sym] || item[type]
  if iu.is_a?(Hash)
    url = iu[:url] || iu["url"]
    part = { "type" => type, type => { "url" => normalize_media_url(url, media_kind) } }
    iu.each { |k, v| part[type][k.to_s] = stringify_hash(v) unless k.to_s == "url" }
    part
  else
    { "type" => type, type => { "url" => normalize_media_url(iu, media_kind) } }
  end
end
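
The two input shapes this method accepts (a media hash with extra keys, or a bare URL string) can be sketched as follows. `MediaPartDemo` is a hypothetical stand-in module, and its `normalize_media_url` is deliberately reduced to the pass-through branch, which is all a remote URL exercises; the other two methods are copied from the listings on this page.

```ruby
# Hypothetical stand-in module so the snippet runs without the gem.
module MediaPartDemo
  module_function

  # Assumption: simplified to the pass-through branch only (remote URLs).
  def normalize_media_url(url, _kind = :image)
    url
  end

  # Copied from the stringify_hash listing.
  def stringify_hash(hash)
    case hash
    when Hash
      hash.each_with_object({}) { |(k, v), memo| memo[k.to_s] = stringify_hash(v) }
    when Array
      hash.map { |v| stringify_hash(v) }
    else
      hash
    end
  end

  # Copied from the listing above.
  def normalize_media_part(item, type, media_kind)
    iu = item[type.to_sym] || item[type]
    if iu.is_a?(Hash)
      url = iu[:url] || iu["url"]
      part = { "type" => type, type => { "url" => normalize_media_url(url, media_kind) } }
      iu.each { |k, v| part[type][k.to_s] = stringify_hash(v) unless k.to_s == "url" }
      part
    else
      { "type" => type, type => { "url" => normalize_media_url(iu, media_kind) } }
    end
  end
end

# Media hash with symbol keys and extra options: the extras are kept, keys become strings.
video_part = MediaPartDemo.normalize_media_part(
  { type: "video_url", video_url: { url: "https://example.com/clip.mp4", max_frames: 8, fps: 2 } },
  "video_url",
  :video
)

# Bare URL string instead of a hash: it is wrapped in { "url" => ... }.
image_part = MediaPartDemo.normalize_media_part(
  { "type" => "image_url", "image_url" => "https://example.com/a.png" },
  "image_url",
  :image
)
```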

#normalize_media_url(url, kind = :image) ⇒ Object

Resolve a media URL embedded in a message: http(s)/data pass through; a local path is base64-encoded as a data URL.

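Raises:

(Error) if a local file does not exist, or if an image's extension is not in SUPPORTED_IMAGE_FORMATS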



# File 'lib/smart_prompt/concerns/multimodal_messages.rb', line 56

def normalize_media_url(url, kind = :image)
  return url if url.nil?
  return url if url.start_with?("http://", "https://", "data:")

  label = kind == :image ? "Image" : kind.to_s.capitalize
  raise Error, "#{label} file not found: #{url}" unless File.exist?(url)
  ext = File.extname(url).downcase.delete(".")
  case kind
  when :image
    raise Error, "Unsupported image format: #{ext}" unless SUPPORTED_IMAGE_FORMATS.include?(ext)
    mime = ext == "jpg" ? "jpeg" : ext
    "data:image/#{mime};base64,#{Base64.strict_encode64(File.binread(url))}"
  when :audio
    "data:audio/#{ext.empty? ? 'wav' : ext};base64,#{Base64.strict_encode64(File.binread(url))}"
  when :video
    "data:video/#{ext.empty? ? 'mp4' : ext};base64,#{Base64.strict_encode64(File.binread(url))}"
  end
end
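
A minimal sketch of the pass-through, inlining, and error paths, using a copy of the method above inside a hypothetical `MediaUrlDemo` module. The local `Error` class is an assumption that stands in for whatever `Error` resolves to inside the library.

```ruby
require "base64"
require "tempfile"

# Hypothetical stand-in module so the snippet runs without the gem.
module MediaUrlDemo
  Error = Class.new(StandardError) # assumption: stands in for the library's Error
  SUPPORTED_IMAGE_FORMATS = %w[jpg jpeg png gif bmp webp].freeze

  module_function

  # Body copied from the listing above.
  def normalize_media_url(url, kind = :image)
    return url if url.nil?
    return url if url.start_with?("http://", "https://", "data:")

    label = kind == :image ? "Image" : kind.to_s.capitalize
    raise Error, "#{label} file not found: #{url}" unless File.exist?(url)
    ext = File.extname(url).downcase.delete(".")
    case kind
    when :image
      raise Error, "Unsupported image format: #{ext}" unless SUPPORTED_IMAGE_FORMATS.include?(ext)
      mime = ext == "jpg" ? "jpeg" : ext
      "data:image/#{mime};base64,#{Base64.strict_encode64(File.binread(url))}"
    when :audio
      "data:audio/#{ext.empty? ? 'wav' : ext};base64,#{Base64.strict_encode64(File.binread(url))}"
    when :video
      "data:video/#{ext.empty? ? 'mp4' : ext};base64,#{Base64.strict_encode64(File.binread(url))}"
    end
  end
end

# Existing data URLs (and http/https URLs) are returned as-is.
passthrough = MediaUrlDemo.normalize_media_url("data:image/png;base64,AAAA")

# A local audio file is inlined, with the extension used as the MIME subtype.
inlined_audio = nil
Tempfile.create(["clip", ".wav"]) do |f|
  f.binmode
  f.write("fake audio bytes")
  f.flush
  inlined_audio = MediaUrlDemo.normalize_media_url(f.path, :audio)
end

# A missing file raises Error; the label in the message follows the media kind.
missing_message =
  begin
    MediaUrlDemo.normalize_media_url("/no/such/clip.mp4", :video)
  rescue MediaUrlDemo::Error => e
    e.message
  end
```

Two details are visible in the code: only the :image branch validates the extension, and the audio and video branches use the file extension verbatim as the MIME subtype, falling back to wav or mp4 when the file has no extension.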

#process_multimodal_messages(messages) ⇒ Object

Rebuilds each message with string keys ("role", "content"). Array content is mapped through normalize_content_item; any other content is passed through unchanged.



# File 'lib/smart_prompt/concerns/multimodal_messages.rb', line 16

def process_multimodal_messages(messages)
  messages.map do |msg|
    role = msg[:role] || msg["role"]
    content = msg[:content] || msg["content"]
    content = content.map { |item| normalize_content_item(item) } if content.is_a?(Array)
    { "role" => role, "content" => content }
  end
end
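
An end-to-end sketch of what this method produces for a mixed message list. `MessagesDemo` is a hypothetical stand-in module; its `normalize_media_url` is reduced to the pass-through branch (an assumption that holds only for remote URLs), and the remaining methods are copied from the listings on this page.

```ruby
# Hypothetical stand-in module so the snippet runs without the gem.
module MessagesDemo
  module_function

  # Assumption: pass-through only, which is sufficient for http(s) URLs.
  def normalize_media_url(url, _kind = :image)
    url
  end

  def stringify_hash(hash)
    case hash
    when Hash then hash.each_with_object({}) { |(k, v), memo| memo[k.to_s] = stringify_hash(v) }
    when Array then hash.map { |v| stringify_hash(v) }
    else hash
    end
  end

  def normalize_media_part(item, type, media_kind)
    iu = item[type.to_sym] || item[type]
    if iu.is_a?(Hash)
      url = iu[:url] || iu["url"]
      part = { "type" => type, type => { "url" => normalize_media_url(url, media_kind) } }
      iu.each { |k, v| part[type][k.to_s] = stringify_hash(v) unless k.to_s == "url" }
      part
    else
      { "type" => type, type => { "url" => normalize_media_url(iu, media_kind) } }
    end
  end

  def normalize_content_item(item)
    return { "type" => "text", "text" => item.to_s } unless item.is_a?(Hash)

    case item[:type] || item["type"]
    when "image_url" then normalize_media_part(item, "image_url", :image)
    when "video_url" then normalize_media_part(item, "video_url", :video)
    when "audio_url" then normalize_media_part(item, "audio_url", :audio)
    else stringify_hash(item)
    end
  end

  def process_multimodal_messages(messages)
    messages.map do |msg|
      role = msg[:role] || msg["role"]
      content = msg[:content] || msg["content"]
      content = content.map { |item| normalize_content_item(item) } if content.is_a?(Array)
      { "role" => role, "content" => content }
    end
  end
end

messages = [
  { role: "system", content: "You are terse." },
  { role: "user", content: [
    "Describe this image.", # bare string becomes a text part
    { type: "image_url", image_url: { url: "https://example.com/cat.png", detail: "low" } },
    { type: "text", text: "One sentence only." }
  ] }
]

normalized = MessagesDemo.process_multimodal_messages(messages)
```

Because each output message is rebuilt from only "role" and "content", any other keys on the input message are not carried over.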

#stringify_hash(hash) ⇒ Object

Recursively converts Hash keys to strings, descending into nested Hashes and Arrays. Values that are neither are returned unchanged.



# File 'lib/smart_prompt/concerns/multimodal_messages.rb', line 97

def stringify_hash(hash)
  case hash
  when Hash
    hash.each_with_object({}) { |(k, v), memo| memo[k.to_s] = stringify_hash(v) }
  when Array
    hash.map { |v| stringify_hash(v) }
  else
    hash
  end
end
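
A small sketch of the recursion, with the method copied into a hypothetical `StringifyDemo` module so it runs standalone. It shows that only keys are converted; values such as symbols or nil are left as they are.

```ruby
# Hypothetical stand-in module so the snippet runs without the gem.
module StringifyDemo
  module_function

  # Body copied from the listing above.
  def stringify_hash(hash)
    case hash
    when Hash
      hash.each_with_object({}) { |(k, v), memo| memo[k.to_s] = stringify_hash(v) }
    when Array
      hash.map { |v| stringify_hash(v) }
    else
      hash
    end
  end
end

input = { type: :custom, meta: { tags: [{ id: 1 }, "x"], "kept" => nil } }
output = StringifyDemo.stringify_hash(input)
# output has string keys at every depth; the :custom value is still a Symbol.
```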