Module: SmartPrompt::MultimodalMessages
- Included in: SenseNovaAdapter, SiliconFlowAdapter, ZhipuAIAdapter
- Defined in: lib/smart_prompt/concerns/multimodal_messages.rb
Overview
Shared multimodal-message normalization for Net::HTTP adapters (ZhipuAI, SenseNova, SiliconFlow). Turns an OpenAI-style content array into the shape the provider expects, inlining local image/audio/video files as base64 data URLs and passing http(s)/data URLs through. Each adapter previously carried a near-identical copy of this logic.
SiliconFlow’s variant is the superset (image_url + video_url + audio_url, preserving detail/max_frames/fps); Zhipu/SenseNova only ever send image_url, which is a subset.
Constant Summary
- SUPPORTED_IMAGE_FORMATS = %w[jpg jpeg png gif bmp webp].freeze
Instance Method Summary
- #normalize_content_item(item) ⇒ Object
- #normalize_image_url(url) ⇒ Object
  Single-arg image-only shim (call sites like generate_video pass a plain image URL).
- #normalize_input_image(image) ⇒ Object
  Accept a local path, a base64 data URL, or an http(s) URL for image-edit / image-to-video `image` fields.
- #normalize_media_part(item, type, media_kind) ⇒ Object
  Build an image_url/video_url/audio_url part, inlining local files as data URLs and preserving any extra keys (detail, max_frames, fps) on the media hash.
- #normalize_media_url(url, kind = :image) ⇒ Object
  Resolve a media URL embedded in a message: http(s)/data pass through; a local path is base64-encoded as a data URL.
- #process_multimodal_messages(messages) ⇒ Object
- #stringify_hash(hash) ⇒ Object
Instance Method Details
#normalize_content_item(item) ⇒ Object
    # File 'lib/smart_prompt/concerns/multimodal_messages.rb', line 25

    def normalize_content_item(item)
      return { "type" => "text", "text" => item.to_s } unless item.is_a?(Hash)

      case item[:type] || item["type"]
      when "image_url"
        normalize_media_part(item, "image_url", :image)
      when "video_url"
        normalize_media_part(item, "video_url", :video)
      when "audio_url"
        normalize_media_part(item, "audio_url", :audio)
      else
        stringify_hash(item)
      end
    end
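The dispatch above can be illustrated with a small standalone sketch. It is not the library code: `classify_content_item` and `MEDIA_KINDS` are hypothetical names introduced here, and the sketch only reports which branch an item would take, without touching the filesystem.

```ruby
# Standalone sketch, not part of SmartPrompt. MEDIA_KINDS and
# classify_content_item are hypothetical names used for illustration.
MEDIA_KINDS = {
  "image_url" => :image,
  "video_url" => :video,
  "audio_url" => :audio
}.freeze

def classify_content_item(item)
  return :text unless item.is_a?(Hash)   # non-hash items are wrapped as text parts

  type = item[:type] || item["type"]     # symbol and string keys are both accepted
  MEDIA_KINDS.fetch(type, :passthrough)  # any other type only has its keys stringified
end

p classify_content_item("hello")
p classify_content_item({ type: "image_url", image_url: { url: "https://example.com/a.png" } })
p classify_content_item({ "type" => "audio_url", "audio_url" => "clip.wav" })
p classify_content_item({ type: "text", text: "hi" })
```

The four calls cover the text fallback, two media branches (one per key style), and the pass-through branch.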
#normalize_image_url(url) ⇒ Object
Single-arg image-only shim (call sites like generate_video pass a plain image URL).
    # File 'lib/smart_prompt/concerns/multimodal_messages.rb', line 76

    def normalize_image_url(url)
      normalize_media_url(url, :image)
    end
#normalize_input_image(image) ⇒ Object
Accept a local path, a base64 data URL, or an http(s) URL for image-edit / image-to-video `image` fields.
    # File 'lib/smart_prompt/concerns/multimodal_messages.rb', line 82

    def normalize_input_image(image)
      return image if image.nil?

      if image.is_a?(String)
        return image if image.start_with?("data:")
        return image if image.start_with?("http://", "https://")
      end

      raise Error, "Image file not found: #{image}" unless File.exist?(image)

      ext = File.extname(image).downcase.delete(".")
      raise Error, "Unsupported image format: #{ext}" unless SUPPORTED_IMAGE_FORMATS.include?(ext)

      mime = ext == "jpg" ? "jpeg" : ext
      "data:image/#{mime};base64,#{Base64.strict_encode64(File.binread(image))}"
    end
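To make the local-file branch concrete, here is a minimal standalone sketch of inlining a file as a base64 data URL. `image_data_url` is a hypothetical helper, not part of the module; it skips the existence and format checks shown above, and the temporary file holds placeholder bytes rather than a real image.

```ruby
require "base64"
require "tempfile"

# Hypothetical helper mirroring only the encoding step of the method above.
def image_data_url(path)
  ext = File.extname(path).downcase.delete(".")
  mime = ext == "jpg" ? "jpeg" : ext # "jpg" maps to the image/jpeg subtype
  "data:image/#{mime};base64,#{Base64.strict_encode64(File.binread(path))}"
end

Tempfile.create(["pixel", ".jpg"]) do |file|
  file.binmode
  file.write("\xFF\xD8\xFF".b) # placeholder bytes, not a complete JPEG
  file.flush

  url = image_data_url(file.path)
  puts url.split(",", 2).first # header part of the data URL, before the payload
end
```

`Base64.strict_encode64` is used, as in the source, because it emits no line breaks, so the payload stays on a single line inside the URL.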
#normalize_media_part(item, type, media_kind) ⇒ Object
Build an image_url/video_url/audio_url part, inlining local files as data URLs and preserving any extra keys (detail, max_frames, fps) on the media hash.
    # File 'lib/smart_prompt/concerns/multimodal_messages.rb', line 42

    def normalize_media_part(item, type, media_kind)
      iu = item[type.to_sym] || item[type]
      if iu.is_a?(Hash)
        url = iu[:url] || iu["url"]
        part = { "type" => type, type => { "url" => normalize_media_url(url, media_kind) } }
        iu.each { |k, v| part[type][k.to_s] = stringify_hash(v) unless k.to_s == "url" }
        part
      else
        { "type" => type, type => { "url" => normalize_media_url(iu, media_kind) } }
      end
    end
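The key-preservation behaviour is easiest to see on a remote URL, where no file is read. The following is a simplified standalone sketch under that assumption: `rebuild_media_part` is a hypothetical stand-in that copies the URL unchanged and handles only the hash form of the media value.

```ruby
# Hypothetical, simplified stand-in for the hash branch above. It assumes the
# URL is remote (copied as-is) and that the media value is a Hash.
def rebuild_media_part(item, type)
  media = item[type.to_sym] || item[type]
  part = { "type" => type, type => { "url" => media[:url] || media["url"] } }
  # Carry every other key over under a string name (detail, max_frames, fps, ...).
  media.each { |key, value| part[type][key.to_s] = value unless key.to_s == "url" }
  part
end

item = {
  type: "video_url",
  video_url: { url: "https://example.com/clip.mp4", max_frames: 8, fps: 2 }
}
p rebuild_media_part(item, "video_url")
```

The result uses string keys throughout, with `max_frames` and `fps` kept next to `url` inside the `"video_url"` hash.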
#normalize_media_url(url, kind = :image) ⇒ Object
Resolve a media URL embedded in a message: http(s)/data pass through; a local path is base64-encoded as a data URL.
    # File 'lib/smart_prompt/concerns/multimodal_messages.rb', line 56

    def normalize_media_url(url, kind = :image)
      return url if url.nil?
      return url if url.start_with?("http://", "https://", "data:")

      label = kind == :image ? "Image" : kind.to_s.capitalize
      raise Error, "#{label} file not found: #{url}" unless File.exist?(url)
      ext = File.extname(url).downcase.delete(".")
      case kind
      when :image
        raise Error, "Unsupported image format: #{ext}" unless SUPPORTED_IMAGE_FORMATS.include?(ext)
        mime = ext == "jpg" ? "jpeg" : ext
        "data:image/#{mime};base64,#{Base64.strict_encode64(File.binread(url))}"
      when :audio
        "data:audio/#{ext.empty? ? 'wav' : ext};base64,#{Base64.strict_encode64(File.binread(url))}"
      when :video
        "data:video/#{ext.empty? ? 'mp4' : ext};base64,#{Base64.strict_encode64(File.binread(url))}"
      end
    end
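Two decisions in this method can be isolated in a short sketch: whether a value passes through untouched, and which MIME type a local path would receive. `passthrough_url?` and `media_mime` are hypothetical helpers written for illustration; they follow the rules in the listing above but read no files and raise no errors.

```ruby
# Hypothetical helpers, not part of the module.

# True for values the method returns unchanged.
def passthrough_url?(url)
  url.start_with?("http://", "https://", "data:")
end

# MIME type a local path would be labelled with, derived from its extension.
def media_mime(path, kind)
  ext = File.extname(path).downcase.delete(".")
  case kind
  when :image then "image/#{ext == 'jpg' ? 'jpeg' : ext}"
  when :audio then "audio/#{ext.empty? ? 'wav' : ext}" # no extension: assume wav
  when :video then "video/#{ext.empty? ? 'mp4' : ext}" # no extension: assume mp4
  end
end

puts passthrough_url?("https://example.com/a.png") # true
puts passthrough_url?("./assets/a.png")            # false
puts media_mime("photo.JPG", :image)               # image/jpeg
puts media_mime("recording", :audio)               # audio/wav
puts media_mime("clip.webm", :video)               # video/webm
```

For audio and video the extension is used verbatim as the subtype, so a `.mp3` file is labelled `audio/mp3` rather than the registered `audio/mpeg`. Whether a given provider accepts that label is not stated in this documentation.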
#process_multimodal_messages(messages) ⇒ Object
    # File 'lib/smart_prompt/concerns/multimodal_messages.rb', line 16

    def process_multimodal_messages(messages)
      messages.map do |msg|
        role = msg[:role] || msg["role"]
        content = msg[:content] || msg["content"]
        content = content.map { |item| normalize_content_item(item) } if content.is_a?(Array)
        { "role" => role, "content" => content }
      end
    end
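At the message level, only array content is mapped; a plain string is left as it is. The sketch below shows that shape with a hypothetical `normalize_messages` helper that receives the per-item normalizer as a block instead of calling `normalize_content_item`, so it runs without the rest of the module.

```ruby
# Standalone sketch; normalize_messages is a hypothetical stand-in that takes
# the per-item normalizer as a block.
def normalize_messages(messages)
  messages.map do |msg|
    content = msg[:content] || msg["content"]
    # Only array content is normalized item by item; strings pass through.
    content = content.map { |item| yield(item) } if content.is_a?(Array)
    { "role" => msg[:role] || msg["role"], "content" => content }
  end
end

messages = [
  { role: "system", content: "You are terse." },
  { "role" => "user", "content" => ["Describe this", { type: "text", text: "please" }] }
]

# Text-only normalizer for the demo: wrap bare values, stringify hash keys.
result = normalize_messages(messages) do |item|
  item.is_a?(Hash) ? item.transform_keys(&:to_s) : { "type" => "text", "text" => item.to_s }
end
p result
```

Both messages come back with string keys. The system message keeps its string content, while each element of the user message becomes a `"type" => "text"` part.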
#stringify_hash(hash) ⇒ Object
    # File 'lib/smart_prompt/concerns/multimodal_messages.rb', line 97

    def stringify_hash(hash)
      case hash
      when Hash
        hash.each_with_object({}) { |(k, v), memo| memo[k.to_s] = stringify_hash(v) }
      when Array
        hash.map { |v| stringify_hash(v) }
      else
        hash
      end
    end
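A short usage sketch shows the recursion on nested input. The method body is copied from the listing above so the example runs on its own; the `nested` sample data is made up for illustration.

```ruby
# Same logic as the listing above, repeated so the example is self-contained.
def stringify_hash(hash)
  case hash
  when Hash
    hash.each_with_object({}) { |(k, v), memo| memo[k.to_s] = stringify_hash(v) }
  when Array
    hash.map { |v| stringify_hash(v) }
  else
    hash
  end
end

# Illustrative input: symbol keys at several depths, including inside an array.
nested = { image_url: { url: "https://example.com/a.png", tags: [{ kind: :thumbnail }] } }
p stringify_hash(nested)
```

Only keys are converted. A symbol value such as `:thumbnail` is returned unchanged, as is any other non-Hash, non-Array value.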