Class: SmartPrompt::ImageGenerationAdapter

Inherits:
LLMAdapter
  • Object
Defined in:
lib/smart_prompt/image_generation_adapter.rb

Overview

Adapter for SiliconFlow’s image generation API.

SiliconFlow exposes image generation through a single endpoint:

POST {url}/images/generations

Unlike OpenAI’s image API, SiliconFlow uses its own parameter names (`image_size`, `batch_size`, `negative_prompt`, `num_inference_steps`, `guidance_scale`, `cfg`, …) and returns an `images` array instead of a `data` array. The OpenAI gem’s `images.generate` helper therefore does not fit, so, like the TTS/Video adapters, this adapter talks to the endpoint directly with Net::HTTP.
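
As a rough sketch of what talking to the endpoint directly can look like, the snippet below builds and sends the POST with Net::HTTP. The helper names, the Bearer authorization header, and the JSON request body are assumptions for illustration; the adapter's own private request method is not shown on this page.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: builds (but does not send) the POST request for
# {base_url}/images/generations. Bearer auth and a JSON body are assumed.
def build_image_request(base_url, api_key, payload)
  uri = URI.parse("#{base_url.to_s.chomp('/')}/images/generations")
  request = Net::HTTP::Post.new(uri)
  request["Authorization"] = "Bearer #{api_key}"
  request["Content-Type"]  = "application/json"
  request.body = JSON.generate(payload)
  [uri, request]
end

# Hypothetical helper: sends the request and returns the parsed JSON body.
# The 240-second timeout mirrors the request_timeout used in the constructor.
def post_image_request(base_url, api_key, payload, timeout: 240)
  uri, request = build_image_request(base_url, api_key, payload)
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == "https",
                             read_timeout: timeout) do |http|
    http.request(request)
  end
  raise "HTTP #{response.code}: #{response.body}" unless response.is_a?(Net::HTTPSuccess)

  JSON.parse(response.body)
end
```

Splitting request construction from sending keeps the first half free of network access, which makes it easy to inspect the URL, headers, and body in isolation.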

Constant Summary

SUPPORTED_IMAGE_FORMATS =
%w[jpg jpeg png gif bmp webp].freeze
DEFAULT_IMAGE_SIZE =

Default resolution for text-to-image generation (“widthxheight”). Edit models (Qwen/Qwen-Image-Edit*) ignore this field, so it is only sent for text-to-image calls.

"1024x1024"
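
The fallback to this default might be sketched as follows. `resolve_size` is a hypothetical stand-in for the adapter's private size resolution, which is not shown on this page; the precedence of `image_size:` over `size:` and the format check are assumptions.

```ruby
DEFAULT_IMAGE_SIZE = "1024x1024"

# Hypothetical helper: picks the caller's size (native key first, then the
# size: alias) and falls back to the default when neither is given.
def resolve_size(params)
  size = (params[:image_size] || params[:size] || DEFAULT_IMAGE_SIZE).to_s
  # Assumed validation: the API expects a "widthxheight" string.
  unless size.match?(/\A\d+x\d+\z/)
    raise ArgumentError, "expected \"widthxheight\", got #{size.inspect}"
  end

  size
end
```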

Instance Attribute Summary

Attributes inherited from LLMAdapter

#last_response

Instance Method Summary

Constructor Details

#initialize(config) ⇒ ImageGenerationAdapter

Returns a new instance of ImageGenerationAdapter.



# File 'lib/smart_prompt/image_generation_adapter.rb', line 29

def initialize(config)
  super
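  api_key = @config["api_key"]
  # Resolve values of the form ENV["NAME"] with a plain lookup instead of
  # eval, which would execute arbitrary code taken from the config file.
  if api_key.is_a?(String) && (match = api_key.match(/\AENV\[["']([^"']+)["']\]\z/))
    api_key = ENV[match[1]]
  end
  @api_key  = api_key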
  @base_url = @config["url"].to_s.chomp("/")
  @model    = @config["model"]

  begin
    # Created for parity with the other non-chat adapters; the actual image
    # requests are issued directly below via Net::HTTP.
    @client = OpenAI::Client.new(
      access_token: @api_key,
      uri_base: @config["url"],
      request_timeout: 240,
    )
  rescue OpenAI::ConfigurationError => e
    SmartPrompt.logger.error "Failed to initialize ImageGeneration client: #{e.message}"
    raise LLMAPIError, "Invalid ImageGeneration configuration: #{e.message}"
  rescue OpenAI::Error => e
    SmartPrompt.logger.error "Failed to initialize ImageGeneration client: #{e.message}"
    raise LLMAPIError, "ImageGeneration authentication failed: #{e.message}"
  rescue SocketError => e
    SmartPrompt.logger.error "Failed to initialize ImageGeneration client: #{e.message}"
    raise LLMAPIError, "Network error: Unable to connect to ImageGeneration API"
  rescue => e
    SmartPrompt.logger.error "Failed to initialize ImageGeneration client: #{e.message}"
    raise Error, "Unexpected error initializing ImageGeneration client: #{e.message}"
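  else
    # Log success only when no exception was raised; an ensure clause would
    # also run after a failed initialization.
    SmartPrompt.logger.info "Successfully created an ImageGeneration client (model=#{@model})."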
  end
end
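
For orientation, a configuration hash with the three keys the constructor reads could look like the fragment below. All values are placeholders: the environment variable name, URL, and model name are assumptions, not values taken from the library.

```ruby
# Illustrative values only; none of these come from the library itself.
config = {
  "api_key" => 'ENV["SILICONFLOW_API_KEY"]', # resolved from the environment
  "url"     => "https://api.example.com/v1/",
  "model"   => "example/image-model",
}

# The constructor strips a trailing slash before appending endpoint paths.
base_url = config["url"].to_s.chomp("/")

# With the gem loaded, the adapter would then be created as:
#   adapter = SmartPrompt::ImageGenerationAdapter.new(config)
```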

Instance Method Details

#edit_image(prompt, params = {}) ⇒ Object

Image editing / image-to-image generation (for Qwen/Qwen-Image-Edit-* and Kolors composable models). `image` (and optionally `image2`/`image3`) may be a local file path, a base64 data URL, or a public http(s) URL.

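Raises:

  • Error: the prompt is empty, no input image is given, or an unexpected error occurs
  • LLMAPIError: the response cannot be parsed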



# File 'lib/smart_prompt/image_generation_adapter.rb', line 108

def edit_image(prompt, params = {})
  SmartPrompt.logger.info "ImageGenerationAdapter: Editing image"

  raise Error, "Prompt cannot be empty" if prompt.nil? || prompt.to_s.strip.empty?
  raise Error, "An input image is required for image editing" if params[:image].nil? && params[:image_file].nil?

  normalized = params.dup
  normalized[:image] = normalize_input_image(normalized[:image] || normalized[:image_file])
  normalized[:image2] = normalize_input_image(normalized[:image2]) if normalized[:image2]
  normalized[:image3] = normalize_input_image(normalized[:image3]) if normalized[:image3]

  # Edit models reject image_size, so we deliberately omit it here.
  parameters = build_parameters(prompt, normalized)
  parameters[:image]  = normalized[:image]
  parameters[:image2] = normalized[:image2] if normalized[:image2]
  parameters[:image3] = normalized[:image3] if normalized[:image3]

  SmartPrompt.logger.info "Image edit parameters: #{parameters.except(:prompt, :image, :image2, :image3).inspect}"

  begin
    response = submit_image_request("/images/generations", parameters)
    @last_response = response
    images = parse_images(response)
    SmartPrompt.logger.info "Successfully edited image, generated #{images.size} result(s)"
    images
  rescue LLMAPIError, Error
    raise
  rescue JSON::ParserError => e
    SmartPrompt.logger.error "Failed to parse image edit response: #{e.message}"
    raise LLMAPIError, "Failed to parse image edit response"
  rescue => e
    SmartPrompt.logger.error "Unexpected error during image editing: #{e.message}"
    raise Error, "Unexpected error during image editing: #{e.message}"
  end
end
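
The three accepted input forms (local path, base64 data URL, public URL) could be normalized roughly as below. `to_image_input` is a hypothetical stand-in for the private `normalize_input_image` helper, whose source is not shown here; the MIME mapping and the choice to pass URLs through unchanged are assumptions.

```ruby
# MIME types for the extensions listed in SUPPORTED_IMAGE_FORMATS.
MIME_BY_EXTENSION = {
  "jpg" => "image/jpeg", "jpeg" => "image/jpeg", "png" => "image/png",
  "gif" => "image/gif",  "bmp"  => "image/bmp",  "webp" => "image/webp",
}.freeze

# Hypothetical helper: returns data URLs and http(s) URLs unchanged and
# converts a local file path into a base64 data URL.
def to_image_input(image)
  value = image.to_s
  return value if value.start_with?("data:", "http://", "https://")

  ext  = File.extname(value).delete_prefix(".").downcase
  mime = MIME_BY_EXTENSION.fetch(ext) do
    raise ArgumentError, "Unsupported image format: #{ext.inspect}"
  end
  # pack("m0") is core Ruby's strict base64 encoding (no line breaks).
  "data:#{mime};base64,#{[File.binread(value)].pack('m0')}"
end
```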

#generate_image(prompt, params = {}) ⇒ Object

Text-to-image generation.

params accepts SiliconFlow-native keys plus a couple of friendly aliases:

model:, negative_prompt:,
image_size: (alias: size:),
batch_size: (alias: n:),
seed:, num_inference_steps:, guidance_scale:, cfg:

Returns an Array of hashes, e.g. [{ url: "…", b64_json: nil, seed: 123 }].

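Raises:

  • Error: the prompt is empty or an unexpected error occurs
  • LLMAPIError: the response cannot be parsed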



# File 'lib/smart_prompt/image_generation_adapter.rb', line 74

def generate_image(prompt, params = {})
  SmartPrompt.logger.info "ImageGenerationAdapter: Generating image from text"

  raise Error, "Prompt cannot be empty" if prompt.nil? || prompt.to_s.strip.empty?

  parameters = build_parameters(prompt, params)
  parameters[:image_size] = resolve_image_size(params)
  # batch_size only applies to a subset of models (e.g. Kolors); send it
  # only when the caller explicitly asks for it.
  batch = params[:batch_size] || params[:n]
  parameters[:batch_size] = batch if batch

  SmartPrompt.logger.info "Image generation parameters: #{parameters.except(:prompt).inspect}"

  begin
    response = submit_image_request("/images/generations", parameters)
    @last_response = response
    images = parse_images(response)
    SmartPrompt.logger.info "Successfully generated #{images.size} image(s)"
    images
  rescue LLMAPIError, Error
    raise
  rescue JSON::ParserError => e
    SmartPrompt.logger.error "Failed to parse image generation response: #{e.message}"
    raise LLMAPIError, "Failed to parse image generation response"
  rescue => e
    SmartPrompt.logger.error "Unexpected error during image generation: #{e.message}"
    raise Error, "Unexpected error during image generation: #{e.message}"
  end
end
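
To illustrate the documented return shape, the mapping from the response's `images` array to an Array of hashes might look like this. `extract_images` is a hypothetical stand-in for the private `parse_images` helper; the response field names other than `images`, and the fallback to a top-level `seed`, are assumptions.

```ruby
require "json"

# Hypothetical helper: converts a parsed response body into the
# [{ url:, b64_json:, seed: }] shape described above.
def extract_images(response)
  shared_seed = response["seed"] # assumed to sit at the top level
  Array(response["images"]).map do |item|
    {
      url:      item["url"],
      b64_json: item["b64_json"],
      seed:     item["seed"] || shared_seed,
    }
  end
end
```

A response without an `images` key yields an empty Array here rather than an error, which keeps the caller's `images.size` logging safe.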

#save_image(image_data, output_dir = "./output", filename_prefix = "generated_image") ⇒ Object

Save one or many generated images to disk. Accepts the Array returned by #generate_image/#edit_image or a single image hash. Returns the list of written file paths.



# File 'lib/smart_prompt/image_generation_adapter.rb', line 147

def save_image(image_data, output_dir = "./output", filename_prefix = "generated_image")
  SmartPrompt.logger.info "ImageGenerationAdapter: Saving image to file"

  begin
    FileUtils.mkdir_p(output_dir)
    images = image_data.is_a?(Array) ? image_data : [image_data]

    saved_files = images.each_with_index.map do |img, index|
      save_single_image(img, output_dir, "#{filename_prefix}_#{index + 1}")
    end

    SmartPrompt.logger.info "Successfully saved #{saved_files.size} image(s) to #{output_dir}"
    saved_files
  rescue => e
    SmartPrompt.logger.error "Error saving image: #{e.message}"
    raise Error, "Error saving image: #{e.message}"
  end
end
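
Writing a single image hash to disk could be sketched as below. `write_image` is a hypothetical stand-in for the private `save_single_image` helper, which is not shown on this page; the ".png" fallback extension and the plain GET without redirect handling are assumptions.

```ruby
require "fileutils"
require "net/http"
require "uri"

# Hypothetical helper: writes one image hash to output_dir and returns the
# file path. Inline base64 data is preferred over downloading the URL.
def write_image(img, output_dir, basename)
  FileUtils.mkdir_p(output_dir)

  if img[:b64_json]
    bytes  = img[:b64_json].unpack1("m") # core Ruby base64 decoding
    suffix = "png"                       # assumed; the payload has no name
  elsif img[:url]
    uri    = URI.parse(img[:url])
    bytes  = Net::HTTP.get(uri)          # does not follow redirects
    suffix = File.extname(uri.path).delete_prefix(".")
    suffix = "png" if suffix.empty?
  else
    raise ArgumentError, "image hash has neither :b64_json nor :url"
  end

  path = File.join(output_dir, "#{basename}.#{suffix}")
  File.binwrite(path, bytes)
  path
end
```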