Class: SmartPrompt::ZhipuAIAdapter

Inherits:
LLMAdapter
Defined in:
lib/smart_prompt/zhipu_adapter.rb

Overview

Adapter for Zhipu AI (智谱 AI, BigModel / GLM), covering all REST model categories behind one provider domain. One adapter owns the whole provider: every category shares the same base URL `open.bigmodel.cn/api/paas/v4` and Bearer-token auth, so a single config block serves them all just by changing `model`.

1. Chat: POST {base}/chat/completions (OpenAI-compatible; reasoning
   models return message.reasoning_content, the exact field the engine
   already reads, so no remap is needed)
2. Vision (image + text): same endpoint, OpenAI Vision content array
3. Embeddings: POST {base}/embeddings (embedding-3, custom dimensions)
4. Text-to-image: POST {base}/images/generations (response is NESTED: data.images[].url)
5. Text-to-video: POST {base}/videos/generations -> task id; GET {base}/async-result/{task_id},
   poll until SUCCESS -> video_result.url (async)
6. Speech synthesis (TTS): POST {base}/audio/speech (glm-tts)
7. Speech recognition (ASR): POST {base}/audio/transcriptions (glm-asr-2512, multipart)
8. Rerank: POST {base}/rerank
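
The category list above amounts to a small path table. A minimal sketch, assuming a hypothetical `ZHIPU_ENDPOINTS` constant and `endpoint_for` helper (neither exists in the adapter, which builds these URLs inline):

```ruby
# Hypothetical lookup table for illustration only.
ZHIPU_BASE = "https://open.bigmodel.cn/api/paas/v4"

ZHIPU_ENDPOINTS = {
  chat:       "/chat/completions",
  vision:     "/chat/completions", # same endpoint as chat
  embeddings: "/embeddings",
  image:      "/images/generations",
  video:      "/videos/generations",
  tts:        "/audio/speech",
  asr:        "/audio/transcriptions",
  rerank:     "/rerank"
}.freeze

# Join a base URL and a category path, tolerating a trailing slash on the base.
def endpoint_for(category, base: ZHIPU_BASE)
  path = ZHIPU_ENDPOINTS.fetch(category) do
    raise ArgumentError, "unknown category: #{category}"
  end
  "#{base.chomp('/')}#{path}"
end
```

For example, `endpoint_for(:image)` yields the image-generation URL under the default base, and passing `base:` points the same table at a proxy.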

We talk to the endpoints with Net::HTTP directly (like the SenseNova / image / tts / stt / video adapters) so we can control SSE streaming, the nested image shape, and the async video flow. No new gem deps.

Constant Summary

DEFAULT_BASE_URL =
"https://open.bigmodel.cn/api/paas/v4".freeze
DEFAULT_CODING_BASE_URL =

CodeGeeX-4 / coding models use a separate base.

"https://open.bigmodel.cn/api/coding/paas/v4".freeze
SUPPORTED_IMAGE_FORMATS =
%w[jpg jpeg png gif bmp webp].freeze
CHAT_OPTIONAL_KEYS =

Zhipu chat sampling parameters forwarded from config when present.

%w[
  top_p max_tokens do_sample stop presence_penalty frequency_penalty thinking
].freeze

Instance Attribute Summary

Attributes inherited from LLMAdapter

#last_response

Instance Method Summary

Constructor Details

#initialize(config) ⇒ ZhipuAIAdapter

Returns a new instance of ZhipuAIAdapter.



# File 'lib/smart_prompt/zhipu_adapter.rb', line 39

def initialize(config)
  super
  SmartPrompt.logger.info "Start create the SmartPrompt ZhipuAIAdapter."

  api_key = @config["api_key"]
  if api_key.is_a?(String) && (env_ref = api_key.match(/\AENV\[["']([^"']+)["']\]\z/))
    # Read the named variable directly rather than eval-ing text from the config file.
    api_key = ENV[env_ref[1]]
  end
  # Match the other adapters: tolerate a missing key at construction so examples/config
  # can load without a live key; the first request fails with a clear auth error.
  SmartPrompt.logger.warn "Zhipu api_key is empty — API calls will fail until it is set." if api_key.nil? || api_key.to_s.strip.empty?

  @api_key     = api_key
  @base_url    = (@config["url"] || DEFAULT_BASE_URL).to_s.chomp("/")
  @coding_base = (@config["coding_url"] || DEFAULT_CODING_BASE_URL).to_s.chomp("/")
  # Optional per-method URL overrides (default to the standard paths off @base_url).
  @image_url  = (@config["image_url"]  || "#{@base_url}/images/generations").to_s
  @video_url  = (@config["video_url"]  || "#{@base_url}/videos/generations").to_s
  @query_url  = (@config["query_url"]  || "#{@base_url}/async-result").to_s
  SmartPrompt.logger.info "Zhipu base_url=#{@base_url}"
end
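
The URL defaulting in the constructor may be easier to follow in isolation. The sketch below mirrors that logic in a hypothetical `resolve_zhipu_urls` helper (not part of the adapter); the config keys are the ones read above.

```ruby
ZHIPU_DEFAULT_BASE = "https://open.bigmodel.cn/api/paas/v4"

# Hypothetical helper: per-method URLs fall back to standard paths off the base URL.
def resolve_zhipu_urls(config)
  base = (config["url"] || ZHIPU_DEFAULT_BASE).to_s.chomp("/")
  {
    base:  base,
    image: (config["image_url"] || "#{base}/images/generations").to_s,
    video: (config["video_url"] || "#{base}/videos/generations").to_s,
    query: (config["query_url"] || "#{base}/async-result").to_s
  }
end
```

Setting only `url` moves every derived endpoint at once, while a key such as `image_url` overrides a single endpoint and leaves the rest on the base.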

Instance Method Details

#check_video_status(task_id) ⇒ Object

Poll an async task. Returns the raw status hash (task_status etc.).



# File 'lib/smart_prompt/zhipu_adapter.rb', line 193

def check_video_status(task_id)
  SmartPrompt.logger.info "ZhipuAIAdapter: polling video task #{task_id}"
  http_get_json("#{@query_url}/#{URI.encode_www_form_component(task_id)}")
rescue LLMAPIError, Error
  raise
rescue => e
  raise LLMAPIError, "Failed to query Zhipu video task: #{e.message}"
end

#download_video(video_url, output_path) ⇒ Object

Download a finished video from video_url to output_path, creating the parent directory if needed. Returns output_path.



# File 'lib/smart_prompt/zhipu_adapter.rb', line 225

def download_video(video_url, output_path)
  uri = URI.parse(video_url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  response = http.request(Net::HTTP::Get.new(uri.request_uri))
  raise Error, "Failed to download video: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  FileUtils.mkdir_p(File.dirname(output_path))
  File.binwrite(output_path, response.body)
  SmartPrompt.logger.info "Zhipu video saved to #{output_path}"
  output_path
rescue Error
  # Already a SmartPrompt error (e.g. the non-2xx case above): pass it through unchanged.
  raise
rescue => e
  raise Error, "Error downloading Zhipu video: #{e.message}"
end

#embeddings(text, model) ⇒ Object

embedding-3 (default 2048 dims); supports a custom `dimensions` (256/512/1024/2048) via config. Returns the first embedding vector.



# File 'lib/smart_prompt/zhipu_adapter.rb', line 96

def embeddings(text, model)
  model_name = model || @config["embedding_model"] || @config["model"]
  SmartPrompt.logger.info "ZhipuAIAdapter: embeddings model=#{model_name}"

  body = { "model" => model_name, "input" => text.is_a?(Array) ? text : [text.to_s] }
  body["dimensions"] = @config["dimensions"] if @config["dimensions"]
  body["encoding_format"] = @config["encoding_format"] if @config["encoding_format"]

  response =
    begin
      http_post_json("#{@base_url}/embeddings", body)
    rescue LLMAPIError, Error
      raise
    rescue => e
      raise LLMAPIError, "Failed to call Zhipu embeddings: #{e.message}"
    end

  items = response["data"]
  unless items.is_a?(Array) && items.any? && items[0]["embedding"]
    raise LLMAPIError, "No embedding vector in Zhipu response: #{response.inspect}"
  end
  items[0]["embedding"]
end
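
To see how the `dimensions` config key reaches the request, here is a minimal sketch of the body construction, using a hypothetical stand-alone `embedding_request_body` helper that mirrors the method above. The config values in the usage note are examples only.

```ruby
# Hypothetical helper: build the JSON body sent to {base}/embeddings.
def embedding_request_body(text, model, config = {})
  body = { "model" => model, "input" => text.is_a?(Array) ? text : [text.to_s] }
  # Optional keys are only sent when the config sets them.
  body["dimensions"] = config["dimensions"] if config["dimensions"]
  body["encoding_format"] = config["encoding_format"] if config["encoding_format"]
  body
end
```

With `{ "dimensions" => 512 }` in the config the body gains a `"dimensions"` entry; with an empty config the key is omitted and the service default applies.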

#generate_image(prompt, params = {}) ⇒ Object

Text-to-image. The Zhipu response is NESTED: data.images[].url (not OpenAI's data[]), so we parse defensively. Returns an Array of hashes, each carrying a b64_json: key.

Raises:

Error: when the prompt is empty or no image model is configured



# File 'lib/smart_prompt/zhipu_adapter.rb', line 124

def generate_image(prompt, params = {})
  SmartPrompt.logger.info "ZhipuAIAdapter: generating image"
  raise Error, "Prompt cannot be empty" if prompt.nil? || prompt.to_s.strip.empty?

  model_name = params[:model] || @config["image_model"] || @config["model"]
  raise Error, "No model configured for image generation" if model_name.nil? || model_name.to_s.strip.empty?

  body = { "model" => model_name, "prompt" => prompt.to_s }
  body["size"]            = params[:size]            if params[:size]
  body["user"]            = params[:user]            if params[:user]
  body["response_format"] = params[:response_format] if params[:response_format]

  SmartPrompt.logger.info "Zhipu image params: #{body.except('prompt').inspect}"
  response =
    begin
      http_post_json(@image_url, body)
    rescue LLMAPIError, Error
      raise
    rescue => e
      raise Error, "Failed to call Zhipu image generation: #{e.message}"
    end

  images = parse_image_response(response)
  SmartPrompt.logger.info "ZhipuAIAdapter: generated #{images.size} image(s)"
  images
end
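
The adapter's private `parse_image_response` is not shown on this page. As a rough illustration of what "parse defensively" could mean, the sketch below accepts both the nested shape described above and the flat OpenAI shape. The helper name `extract_image_entries` and the output keys are assumptions, not the adapter's actual implementation.

```ruby
# Hypothetical parser: tolerate data.images[] (nested) and data[] (flat).
def extract_image_entries(response)
  data = response["data"]
  entries =
    if data.is_a?(Hash) && data["images"].is_a?(Array)
      data["images"] # nested shape: data.images[]
    elsif data.is_a?(Array)
      data           # flat OpenAI shape: data[]
    else
      []             # unknown shape: return nothing rather than crash
    end
  entries
    .select { |entry| entry.is_a?(Hash) }
    .map { |entry| { url: entry["url"], b64_json: entry["b64_json"] } }
end
```

Returning an empty Array for an unrecognized payload lets the caller decide whether that is an error.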

#generate_video(prompt, params = {}) ⇒ Object

Submit a text-to-video (or image-to-video) job. Returns a Hash with task_id:, model:, and raw: (the raw submit response).

Raises:

Error: when no video model is configured



# File 'lib/smart_prompt/zhipu_adapter.rb', line 165

def generate_video(prompt, params = {})
  SmartPrompt.logger.info "ZhipuAIAdapter: submitting video job"
  model_name = params[:model] || @config["video_model"] || @config["model"]
  raise Error, "No model configured for video generation" if model_name.nil? || model_name.to_s.strip.empty?

  body = { "model" => model_name, "prompt" => prompt.to_s }
  %i[quality fps duration with_audio resolution request_id seed].each do |k|
    body[k.to_s] = params[k] unless params[k].nil?
  end
  body["image_url"] = normalize_image_url(params[:image_url]) if params[:image_url]

  SmartPrompt.logger.info "Zhipu video params: #{body.except('prompt').inspect}"
  response =
    begin
      http_post_json(@video_url, body)
    rescue LLMAPIError, Error
      raise
    rescue => e
      raise Error, "Failed to submit Zhipu video job: #{e.message}"
    end

  task_id = response["id"] || response["task_id"]
  raise LLMAPIError, "No task id in Zhipu video response: #{response.inspect}" unless task_id
  SmartPrompt.logger.info "ZhipuAIAdapter: video task #{task_id} submitted"
  { task_id: task_id, model: model_name, raw: response }
end

#rerank(query, documents, model: nil) ⇒ Object

Rerank (bonus endpoint). Scores documents against query and returns an Array of hashes with index: and relevance_score:.



# File 'lib/smart_prompt/zhipu_adapter.rb', line 293

def rerank(query, documents, model: nil)
  model_name = model || @config["rerank_model"] || @config["model"]
  body = { "model" => model_name, "query" => query, "documents" => documents }
  response = http_post_json("#{@base_url}/rerank", body)
  (response["results"] || []).map { |r| { index: r["index"], relevance_score: r["relevance_score"] || r["score"] } }
rescue LLMAPIError, Error
  raise
rescue => e
  raise LLMAPIError, "Failed to call Zhipu rerank: #{e.message}"
end
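
A typical next step is to reorder the original documents by score. The sketch below assumes that each `index:` refers to a position in the `documents` array that was submitted, which this page does not state. `order_by_relevance` is a hypothetical helper and the scores in the usage note are made up.

```ruby
# Hypothetical helper: sort documents by descending relevance score.
# `results` has the shape returned by #rerank: [{ index:, relevance_score: }, ...].
def order_by_relevance(documents, results)
  results
    .sort_by { |result| -result[:relevance_score].to_f }
    .map { |result| documents[result[:index]] }
end
```

Given documents `["a", "b", "c"]` and scores 0.2, 0.5 and 0.9 for indexes 0, 1 and 2, the helper returns `["c", "b", "a"]`.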

#save_image(image_data, output_dir = "./output", filename_prefix = "zhipu_image") ⇒ Object

Save one or many generated images to disk (Array from #generate_image or a single hash).



# File 'lib/smart_prompt/zhipu_adapter.rb', line 152

def save_image(image_data, output_dir = "./output", filename_prefix = "zhipu_image")
  FileUtils.mkdir_p(output_dir)
  images = image_data.is_a?(Array) ? image_data : [image_data]
  saved = images.each_with_index.map do |img, index|
    save_single_image(img, output_dir, "#{filename_prefix}_#{index + 1}")
  end
  SmartPrompt.logger.info "Saved #{saved.size} Zhipu image(s) to #{output_dir}"
  saved
end

#send_request(messages, model = nil, temperature = nil, tools = nil, proc = nil) ⇒ Object

Chat / multimodal. Non-streaming returns a full OpenAI-format hash (so last_response carries usage + reasoning_content); streaming calls proc with each OpenAI-shaped chunk and returns nil.



# File 'lib/smart_prompt/zhipu_adapter.rb', line 67

def send_request(messages, model = nil, temperature = nil, tools = nil, proc = nil)
  model_name = model || @config["model"]
  body = build_chat_body(messages, model_name, temperature, tools)
  SmartPrompt.logger.info "ZhipuAIAdapter: chat request model=#{model_name} stream=#{!proc.nil?}"

  url = chat_url_for(model_name)
  if proc
    body["stream"] = true
    stream_chat(url, body) { |data| proc.call(build_stream_chunk(data), 0) }
    SmartPrompt.logger.info "ZhipuAIAdapter: streaming request finished"
    nil
  else
    raw = http_post_json(url, body)
    response = build_completion_response(raw)
    @last_response = response
    SmartPrompt.logger.info "ZhipuAIAdapter: received chat response"
    response
  end
rescue LLMAPIError, Error
  raise
rescue => e
  SmartPrompt.logger.error "Zhipu chat error: #{e.message}"
  raise LLMAPIError, "Failed to call Zhipu chat: #{e.message}"
end
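
The private `stream_chat` helper is not shown on this page. As an illustration of the streaming side, the sketch below parses server-sent-event lines under the usual OpenAI-compatible convention: `data: <json>` per chunk and a `data: [DONE]` sentinel. That convention, and the names `parse_sse_lines` and `stream_text`, are assumptions rather than the adapter's actual code.

```ruby
require "json"

# Hypothetical SSE parser: collect JSON payloads until the [DONE] sentinel.
def parse_sse_lines(lines)
  events = []
  lines.each do |line|
    line = line.strip
    next unless line.start_with?("data:") # skip blank lines and comments
    payload = line.sub(/\Adata:\s*/, "")
    break if payload == "[DONE]"
    events << JSON.parse(payload)
  end
  events
end

# Concatenate the incremental text carried in choices[0].delta.content.
def stream_text(events)
  events.map { |event| event.dig("choices", 0, "delta", "content") }.compact.join
end
```

Feeding the parsed events to a callback one at a time is what a streaming `proc` would receive; joining the deltas reconstructs the full reply.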

#synthesize_speech(text, voice: nil, model: nil, response_format: "wav", **opts) ⇒ Object

Returns a base64 data URL for the synthesized audio. GLM-TTS accepts wav/pcm only (mp3/flac are rejected), so default to wav.



# File 'lib/smart_prompt/zhipu_adapter.rb', line 242

def synthesize_speech(text, voice: nil, model: nil, response_format: "wav", **opts)
  SmartPrompt.logger.info "ZhipuAIAdapter: TTS"
  raise Error, "Text cannot be empty" if text.nil? || text.to_s.strip.empty?

  model_name = model || @config["tts_model"] || "glm-tts"
  body = { "model" => model_name, "input" => text.to_s }
  body["voice"] = voice if voice
  body["response_format"] = response_format
  body["speed"] = opts[:speed] if opts[:speed]
  body["emotion"] = opts[:emotion] if opts[:emotion]

  audio = http_post_binary("#{@base_url}/audio/speech", body)
  "data:audio/#{response_format};base64,#{Base64.strict_encode64(audio)}"
rescue LLMAPIError, Error
  raise
rescue => e
  raise Error, "Failed to call Zhipu TTS: #{e.message}"
end

#synthesize_to_file(text, output_path, voice: nil, model: nil, response_format: "wav", **opts) ⇒ Object

Synthesize speech and write the decoded audio to output_path, creating the parent directory if needed. Returns a Hash with file_path: and format:.



# File 'lib/smart_prompt/zhipu_adapter.rb', line 261

def synthesize_to_file(text, output_path, voice: nil, model: nil, response_format: "wav", **opts)
  data_url = synthesize_speech(text, voice: voice, model: model, response_format: response_format, **opts)
  FileUtils.mkdir_p(File.dirname(output_path))
  audio_bytes = Base64.decode64(data_url.sub(/\Adata:audio\/\w+;base64,/, ""))
  File.binwrite(output_path, audio_bytes)
  SmartPrompt.logger.info "Zhipu audio saved to #{output_path}"
  { file_path: output_path, format: response_format }
end
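
The two TTS methods hand audio around as a base64 data URL. A minimal round-trip sketch of that encoding, using hypothetical helper names (`audio_to_data_url`, `audio_from_data_url`) and the same prefix pattern as the code above:

```ruby
require "base64"

# Hypothetical helper: wrap raw audio bytes in a data URL.
def audio_to_data_url(bytes, format = "wav")
  "data:audio/#{format};base64,#{Base64.strict_encode64(bytes)}"
end

# Hypothetical helper: strip the data URL prefix and decode back to bytes.
def audio_from_data_url(data_url)
  Base64.decode64(data_url.sub(/\Adata:audio\/\w+;base64,/, ""))
end
```

Decoding returns a binary string, so it should be written with `File.binwrite` rather than a text-mode write.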

#transcribe_audio(audio_file, model: nil, language: nil, **opts) ⇒ Object

Transcribe an audio file (local path). Returns a Hash with a single text: key.



# File 'lib/smart_prompt/zhipu_adapter.rb', line 273

def transcribe_audio(audio_file, model: nil, language: nil, **opts)
  SmartPrompt.logger.info "ZhipuAIAdapter: ASR #{File.basename(audio_file)}"
  raise Error, "Audio file not found: #{audio_file}" unless File.exist?(audio_file)

  model_name = model || @config["asr_model"] || "glm-asr-2512"
  form = { "model" => model_name }
  form["language"] = language if language
  form["prompt"] = opts[:prompt] if opts[:prompt]
  form["response_format"] = opts[:response_format] if opts[:response_format]

  response = http_post_multipart("#{@base_url}/audio/transcriptions", form, audio_file)
  { text: response["text"] }
rescue LLMAPIError, Error
  raise
rescue => e
  raise Error, "Failed to call Zhipu ASR: #{e.message}"
end
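
The private `http_post_multipart` helper is not shown on this page. One way to build such a request with the standard library is `Net::HTTP::Post#set_form`, sketched below. The part name `"file"` and the helper name `build_transcription_request` are assumptions; the request is only constructed here, not sent.

```ruby
require "net/http"
require "stringio"
require "uri"

# Hypothetical builder: a multipart POST carrying form fields plus one audio part.
def build_transcription_request(url, api_key, form, audio_io, filename)
  request = Net::HTTP::Post.new(URI.parse(url))
  request["Authorization"] = "Bearer #{api_key}"
  parts = form.map { |key, value| [key.to_s, value.to_s] }
  parts << ["file", audio_io, { filename: filename }] # part name is an assumption
  request.set_form(parts, "multipart/form-data")
  request
end
```

In real use `audio_io` would be `File.open(path, "rb")`, and the request would be sent through a `Net::HTTP` connection with `use_ssl` enabled.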

#wait_for_video_completion(task_id, check_interval: 10, timeout: 600) ⇒ Object

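Block until the task finishes (or times out), then return a Hash with task_id:, status:, video_url:, cover_image_url:, and raw:. Raises LLMAPIError if the task fails or the timeout elapses.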



# File 'lib/smart_prompt/zhipu_adapter.rb', line 203

def wait_for_video_completion(task_id, check_interval: 10, timeout: 600)
  start = Time.now
  loop do
    status = check_video_status(task_id)
    case task_status_of(status)
    when "SUCCESS"
      url = video_url_of(status)
      raise LLMAPIError, "Video succeeded but no url in: #{status.inspect}" unless url
      SmartPrompt.logger.info "ZhipuAIAdapter: video ready #{url}"
      return { task_id: task_id, status: "SUCCESS", video_url: url, cover_image_url: cover_url_of(status), raw: status }
    when "FAIL", "FAILED"
      raise LLMAPIError, "Zhipu video generation failed: #{status.inspect}"
    else
      if Time.now - start > timeout
        raise LLMAPIError, "Zhipu video generation timeout after #{timeout}s"
      end
      SmartPrompt.logger.info "Zhipu video task #{task_id} still processing..."
      sleep(check_interval)
    end
  end
end
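
The polling loop can be exercised without the network by injecting the status fetcher. The sketch below is a simplified stand-alone variant, not the adapter's code: `poll_until_done` is a hypothetical name, it reads `task_status` directly instead of going through `task_status_of`, and it swaps `Time.now` for a monotonic clock so wall-clock adjustments cannot distort the timeout.

```ruby
# Hypothetical poller: `fetch` is any callable returning a status hash.
def poll_until_done(fetch, check_interval: 10, timeout: 600)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  loop do
    status = fetch.call
    case status["task_status"]
    when "SUCCESS"
      return status
    when "FAIL", "FAILED"
      raise "task failed: #{status.inspect}"
    end
    # Any other status counts as still processing.
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    raise "timed out after #{timeout}s" if elapsed > timeout
    sleep(check_interval)
  end
end
```

In a test, `fetch` can be a lambda that returns canned statuses in sequence; against the real adapter it would wrap `check_video_status(task_id)`.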