Class: SmartPrompt::ZhipuAIAdapter

Inherits: LLMAdapter

Object
LLMAdapter
SmartPrompt::ZhipuAIAdapter

Defined in: lib/smart_prompt/zhipu_adapter.rb

Overview

Adapter for Zhipu AI (智谱 AI, BigModel / GLM), covering all REST model categories behind one provider domain. One adapter owns the whole provider: every category shares the same base URL `https://open.bigmodel.cn/api/paas/v4` and Bearer-token auth, so a single config block serves them all and only `model` changes. The one exception is the coding models, which use a separate base (see DEFAULT_CODING_BASE_URL).

1. Chat:                     POST {base}/chat/completions      (OpenAI-compatible; reasoning
                             models return message.reasoning_content, the exact field the engine
                             already reads, so no remap is needed)
2. Vision (image + text):    same endpoint, OpenAI Vision content array
3. Embeddings:               POST {base}/embeddings            (embedding-3, custom dimensions)
4. Text-to-image:            POST {base}/images/generations    (response is NESTED: data.images[].url)
5. Text-to-video:            POST {base}/videos/generations -> task id; GET {base}/async-result/{task_id},
                             poll until SUCCESS -> video_result.url  (async)
6. Speech synthesis (TTS):   POST {base}/audio/speech          (glm-tts)
7. Speech recognition (ASR): POST {base}/audio/transcriptions  (glm-asr-2512, multipart)
8. Rerank:                   POST {base}/rerank

We talk to the endpoints with Net::HTTP directly (like the SenseNova / image / tts / stt / video adapters) so we can control SSE streaming, the nested image shape, and the async video flow. No new gem deps.
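As a rough illustration of the "one base URL, many categories" layout described above, the sketch below maps each category to its path. The helper name `zhipu_endpoint` and the category symbols are hypothetical and are not part of the adapter; only the paths and the default base come from this page.

```ruby
# Hypothetical helper: builds an endpoint URL from the shared base.
# The paths are the ones listed in the overview; nothing here calls the API.
def zhipu_endpoint(category, base = "https://open.bigmodel.cn/api/paas/v4")
  paths = {
    chat:       "/chat/completions",   # also used for vision input
    embeddings: "/embeddings",
    image:      "/images/generations",
    video:      "/videos/generations",
    tts:        "/audio/speech",
    asr:        "/audio/transcriptions",
    rerank:     "/rerank"
  }
  path = paths.fetch(category) { raise ArgumentError, "unknown category: #{category.inspect}" }
  "#{base.chomp('/')}#{path}"
end

puts zhipu_endpoint(:embeddings)
```

Because every path hangs off the same base, pointing the adapter at a proxy or gateway should only require changing the base, not each endpoint.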

Constant Summary

DEFAULT_BASE_URL =

"https://open.bigmodel.cn/api/paas/v4".freeze

DEFAULT_CODING_BASE_URL =

CodeGeeX-4 / coding models use a separate base.

"https://open.bigmodel.cn/api/coding/paas/v4".freeze

SUPPORTED_IMAGE_FORMATS =

%w[jpg jpeg png gif bmp webp].freeze

CHAT_OPTIONAL_KEYS =

Zhipu chat sampling parameters forwarded from config when present.

%w[
  top_p max_tokens do_sample stop presence_penalty frequency_penalty thinking
].freeze
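The private method that builds the chat body is not shown on this page, so the following is only a sketch of what "forwarded from config when present" could look like. The helper name `forward_optional_keys` is hypothetical. It tests for `nil` rather than truthiness so that an explicit `do_sample: false` would still be forwarded; whether the adapter does the same is an assumption.

```ruby
# Hypothetical sketch: copy the whitelisted sampling keys from config into the request body.
def forward_optional_keys(body, config)
  optional_keys = %w[
    top_p max_tokens do_sample stop presence_penalty frequency_penalty thinking
  ]
  optional_keys.each do |key|
    # nil check (not truthiness) keeps explicit false values such as do_sample: false
    body[key] = config[key] unless config[key].nil?
  end
  body
end

config = { "api_key" => "secret", "top_p" => 0.9, "do_sample" => false }
p forward_optional_keys({ "model" => "example-model" }, config)
```

Keys outside the whitelist, such as `api_key`, never reach the request body.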

Instance Attribute Summary

Attributes inherited from LLMAdapter

#last_response

Instance Method Summary

#check_video_status(task_id) ⇒ Object

Poll an async task.
#download_video(video_url, output_path) ⇒ Object
#embeddings(text, model) ⇒ Object

embedding-3 (default 2048 dims); supports a custom `dimensions` (256/512/1024/2048) via config.
#generate_image(prompt, params = {}) ⇒ Object

Text-to-image.
#generate_video(prompt, params = {}) ⇒ Object

Submit a text-to-video (or image-to-video) job.
#initialize(config) ⇒ ZhipuAIAdapter constructor

A new instance of ZhipuAIAdapter.
#rerank(query, documents, model: nil) ⇒ Object

Rerank documents against a query (bonus category).
#save_image(image_data, output_dir = "./output", filename_prefix = "zhipu_image") ⇒ Object

Save one or many generated images to disk (Array from #generate_image or a single hash).
#send_request(messages, model = nil, temperature = nil, tools = nil, proc = nil) ⇒ Object

Chat / multimodal.
#synthesize_speech(text, voice: nil, model: nil, response_format: "wav", **opts) ⇒ Object

Returns a base64 data URL for the synthesized audio.
#synthesize_to_file(text, output_path, voice: nil, model: nil, response_format: "wav", **opts) ⇒ Object
#transcribe_audio(audio_file, model: nil, language: nil, **opts) ⇒ Object

Transcribe an audio file (local path).
#wait_for_video_completion(task_id, check_interval: 10, timeout: 600) ⇒ Object

Block until the task finishes (or times out), then return a hash containing the video URL.

Constructor Details

#initialize(config) ⇒ `ZhipuAIAdapter`

Returns a new instance of ZhipuAIAdapter.

# File 'lib/smart_prompt/zhipu_adapter.rb', line 39

def initialize(config)
  super
  SmartPrompt.logger.info "Start create the SmartPrompt ZhipuAIAdapter."

  api_key = @config["api_key"]
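  if api_key.is_a?(String) && (env_ref = api_key.match(/\AENV\[["']?([^"'\]]+)["']?\]\z/))
    # Resolve ENV["NAME"] by direct lookup rather than eval, so config text is never executed.
    api_key = ENV[env_ref[1]]
  end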
  # Match the other adapters: tolerate a missing key at construction so examples/config
  # can load without a live key; the first request fails with a clear auth error.
  SmartPrompt.logger.warn "Zhipu api_key is empty — API calls will fail until it is set." if api_key.nil? || api_key.to_s.strip.empty?

  @api_key     = api_key
  @base_url    = (@config["url"] || DEFAULT_BASE_URL).to_s.chomp("/")
  @coding_base = (@config["coding_url"] || DEFAULT_CODING_BASE_URL).to_s.chomp("/")
  # Optional per-method URL overrides (default to the standard paths off @base_url).
  @image_url  = (@config["image_url"]  || "#{@base_url}/images/generations").to_s
  @video_url  = (@config["video_url"]  || "#{@base_url}/videos/generations").to_s
  @query_url  = (@config["query_url"]  || "#{@base_url}/async-result").to_s
  SmartPrompt.logger.info "Zhipu base_url=#{@base_url}"
end
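To make the URL defaulting in the constructor easier to follow, here is a standalone sketch of the same derivation. The function name `resolve_zhipu_urls` and the returned hash layout are hypothetical; the config keys (`url`, `image_url`, `video_url`, `query_url`) and the default paths are the ones read above.

```ruby
# Hypothetical sketch of the constructor's URL resolution, outside the class.
def resolve_zhipu_urls(config)
  base = (config["url"] || "https://open.bigmodel.cn/api/paas/v4").to_s.chomp("/")
  {
    base:  base,
    # Each per-method URL falls back to a standard path under the base.
    image: (config["image_url"] || "#{base}/images/generations").to_s,
    video: (config["video_url"] || "#{base}/videos/generations").to_s,
    query: (config["query_url"] || "#{base}/async-result").to_s
  }
end

p resolve_zhipu_urls("url" => "https://proxy.example/v4/")
p resolve_zhipu_urls("query_url" => "https://other.example/tasks")
```

Note that the trailing slash is stripped only from the base; an explicit per-method override is used exactly as given.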

Instance Method Details

#check_video_status(task_id) ⇒ `Object`

Poll an async task. Returns the raw status hash (task_status etc.).

# File 'lib/smart_prompt/zhipu_adapter.rb', line 193

def check_video_status(task_id)
  SmartPrompt.logger.info "ZhipuAIAdapter: polling video task #{task_id}"
  http_get_json("#{@query_url}/#{URI.encode_www_form_component(task_id)}")
rescue LLMAPIError, Error
  raise
rescue => e
  raise LLMAPIError, "Failed to query Zhipu video task: #{e.message}"
end
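The status URL is the query URL with the escaped task id appended as a path segment. A small sketch of that construction follows; the helper name `task_status_url` and the sample ids are made up for illustration.

```ruby
require "uri"

# Hypothetical helper mirroring the URL built in check_video_status.
def task_status_url(query_url, task_id)
  "#{query_url}/#{URI.encode_www_form_component(task_id)}"
end

base = "https://open.bigmodel.cn/api/paas/v4/async-result"
puts task_status_url(base, "1234567890")
puts task_status_url(base, "a/b")
```

`URI.encode_www_form_component` uses form encoding, so a space would become `+` rather than `%20`. That should not matter as long as task ids contain no spaces, which is assumed here.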

#download_video(video_url, output_path) ⇒ `Object`

# File 'lib/smart_prompt/zhipu_adapter.rb', line 225

def download_video(video_url, output_path)
  uri = URI.parse(video_url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  response = http.request(Net::HTTP::Get.new(uri.request_uri))
  raise Error, "Failed to download video: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  FileUtils.mkdir_p(File.dirname(output_path))
  File.binwrite(output_path, response.body)
  SmartPrompt.logger.info "Zhipu video saved to #{output_path}"
  output_path
rescue => e
  raise e.is_a?(SmartPrompt::Error) ? e : Error, "Error downloading Zhipu video: #{e.message}"
end

#embeddings(text, model) ⇒ `Object`

embedding-3 (default 2048 dims); supports a custom `dimensions` (256/512/1024/2048) via config. Returns the first embedding vector.

# File 'lib/smart_prompt/zhipu_adapter.rb', line 96

def embeddings(text, model)
  model_name = model || @config["embedding_model"] || @config["model"]
  SmartPrompt.logger.info "ZhipuAIAdapter: embeddings model=#{model_name}"

  body = { "model" => model_name, "input" => text.is_a?(Array) ? text : [text.to_s] }
  body["dimensions"] = @config["dimensions"] if @config["dimensions"]
  body["encoding_format"] = @config["encoding_format"] if @config["encoding_format"]

  response =
    begin
      http_post_json("#{@base_url}/embeddings", body)
    rescue LLMAPIError, Error
      raise
    rescue => e
      raise LLMAPIError, "Failed to call Zhipu embeddings: #{e.message}"
    end

  items = response["data"]
  unless items.is_a?(Array) && items.any? && items[0]["embedding"]
    raise LLMAPIError, "No embedding vector in Zhipu response: #{response.inspect}"
  end
  items[0]["embedding"]
end
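One consequence of the listing above is worth spelling out: an Array input is sent as a batch, but only the first vector comes back. The sketch below reproduces the input wrapping and the extraction on hand-written sample data. The helper names are hypothetical and the sample response is invented, not real API output.

```ruby
# Hypothetical helpers mirroring the input wrapping and result extraction above.
def embedding_input(text)
  text.is_a?(Array) ? text : [text.to_s]
end

def first_embedding(response)
  items = response["data"]
  unless items.is_a?(Array) && items.any? && items[0]["embedding"]
    raise ArgumentError, "no embedding vector in response"
  end
  items[0]["embedding"]
end

# Invented sample shaped like an OpenAI-style embeddings response.
sample = { "data" => [
  { "index" => 0, "embedding" => [0.1, 0.2] },
  { "index" => 1, "embedding" => [0.3, 0.4] }
] }

p embedding_input("hello")       # a single string is wrapped in an Array
p first_embedding(sample)        # only the first vector is returned
```

Callers that need a vector per input would have to call the method once per text, or read the remaining vectors from the raw response themselves.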

#generate_image(prompt, params = {}) ⇒ `Object`

Text-to-image. The Zhipu response is NESTED: data.images[].url (not OpenAI's data[]), so we parse defensively. Returns an Array of image hashes (for example with a b64_json: key).

Raises:

(Error)

# File 'lib/smart_prompt/zhipu_adapter.rb', line 124

def generate_image(prompt, params = {})
  SmartPrompt.logger.info "ZhipuAIAdapter: generating image"
  raise Error, "Prompt cannot be empty" if prompt.nil? || prompt.to_s.strip.empty?

  model_name = params[:model] || @config["image_model"] || @config["model"]
  raise Error, "No model configured for image generation" if model_name.nil? || model_name.to_s.strip.empty?

  body = { "model" => model_name, "prompt" => prompt.to_s }
  body["size"]            = params[:size]            if params[:size]
  body["user"]            = params[:user]            if params[:user]
  body["response_format"] = params[:response_format] if params[:response_format]

  SmartPrompt.logger.info "Zhipu image params: #{body.except('prompt').inspect}"
  response =
    begin
      http_post_json(@image_url, body)
    rescue LLMAPIError, Error
      raise
    rescue => e
      raise Error, "Failed to call Zhipu image generation: #{e.message}"
    end

  images = parse_image_response(response)
  SmartPrompt.logger.info "ZhipuAIAdapter: generated #{images.size} image(s)"
  images
end
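The private `parse_image_response` helper is not shown on this page. As a sketch of what "parse defensively" might mean, the function below accepts both the nested shape (data.images[]) and a flat OpenAI-style data[] array. The name `extract_images` and the output keys `url:` and `b64_json:` are assumptions, and the sample responses are invented.

```ruby
# Hypothetical defensive parser: tolerates nested and flat response shapes.
def extract_images(response)
  data = response["data"]
  entries =
    case data
    when Hash  then Array(data["images"])  # nested shape: data.images[]
    when Array then data                   # flat OpenAI-style shape: data[]
    else []
    end

  entries.filter_map do |entry|
    next unless entry.is_a?(Hash)
    image = { url: entry["url"], b64_json: entry["b64_json"] }.compact
    image unless image.empty?              # skip entries with neither field
  end
end

nested = { "data" => { "images" => [{ "url" => "https://example.com/a.png" }] } }
flat   = { "data" => [{ "b64_json" => "aGVsbG8=" }] }
p extract_images(nested)
p extract_images(flat)
p extract_images({})
```

Returning an empty Array for an unrecognized shape is one possible choice; raising an error instead would surface API changes sooner.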

#generate_video(prompt, params = {}) ⇒ `Object`

Submit a text-to-video (or image-to-video) job. Returns a hash with task_id:, model:, and raw: (the unmodified API response).

Raises:

(Error)

# File 'lib/smart_prompt/zhipu_adapter.rb', line 165

def generate_video(prompt, params = {})
  SmartPrompt.logger.info "ZhipuAIAdapter: submitting video job"
  model_name = params[:model] || @config["video_model"] || @config["model"]
  raise Error, "No model configured for video generation" if model_name.nil? || model_name.to_s.strip.empty?

  body = { "model" => model_name, "prompt" => prompt.to_s }
  %i[quality fps duration with_audio resolution request_id seed].each do |k|
    body[k.to_s] = params[k] unless params[k].nil?
  end
  body["image_url"] = normalize_image_url(params[:image_url]) if params[:image_url]

  SmartPrompt.logger.info "Zhipu video params: #{body.except('prompt').inspect}"
  response =
    begin
      http_post_json(@video_url, body)
    rescue LLMAPIError, Error
      raise
    rescue => e
      raise Error, "Failed to submit Zhipu video job: #{e.message}"
    end

  task_id = response["id"] || response["task_id"]
  raise LLMAPIError, "No task id in Zhipu video response: #{response.inspect}" unless task_id
  SmartPrompt.logger.info "ZhipuAIAdapter: video task #{task_id} submitted"
  { task_id: task_id, model: model_name, raw: response }
end

#rerank(query, documents, model: nil) ⇒ `Object`

Rerank documents against a query (bonus category). Returns an Array of hashes with index: and relevance_score:.

# File 'lib/smart_prompt/zhipu_adapter.rb', line 293

def rerank(query, documents, model: nil)
  model_name = model || @config["rerank_model"] || @config["model"]
  body = { "model" => model_name, "query" => query, "documents" => documents }
  response = http_post_json("#{@base_url}/rerank", body)
  (response["results"] || []).map { |r| { index: r["index"], relevance_score: r["relevance_score"] || r["score"] } }
rescue LLMAPIError, Error
  raise
rescue => e
  raise LLMAPIError, "Failed to call Zhipu rerank: #{e.message}"
end
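Since the method returns only indexes and scores, callers usually need to map them back to the original documents. A minimal sketch, assuming the result hashes shown above; the helper name `rank_documents` and the sample scores are invented.

```ruby
# Hypothetical helper: join rerank results back to the documents, best match first.
def rank_documents(documents, results)
  results
    .sort_by { |r| -r[:relevance_score].to_f }
    .map { |r| { document: documents[r[:index]], score: r[:relevance_score] } }
end

docs    = ["apples are red", "the sky is blue", "bananas are yellow"]
results = [
  { index: 0, relevance_score: 0.12 },
  { index: 2, relevance_score: 0.87 },
  { index: 1, relevance_score: 0.05 }
]
rank_documents(docs, results).each { |row| puts "#{row[:score]}  #{row[:document]}" }
```

The explicit sort is there because this page does not say whether the API returns results already ordered by score.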

#save_image(image_data, output_dir = "./output", filename_prefix = "zhipu_image") ⇒ `Object`

Save one or many generated images to disk (Array from #generate_image or a single hash).

# File 'lib/smart_prompt/zhipu_adapter.rb', line 152

def save_image(image_data, output_dir = "./output", filename_prefix = "zhipu_image")
  FileUtils.mkdir_p(output_dir)
  images = image_data.is_a?(Array) ? image_data : [image_data]
  saved = images.each_with_index.map do |img, index|
    save_single_image(img, output_dir, "#{filename_prefix}_#{index + 1}")
  end
  SmartPrompt.logger.info "Saved #{saved.size} Zhipu image(s) to #{output_dir}"
  saved
end

#send_request(messages, model = nil, temperature = nil, tools = nil, proc = nil) ⇒ `Object`

Chat / multimodal. Non-streaming returns a full OpenAI-format hash (so last_response carries usage + reasoning_content). When a proc is given, the request streams: proc is called with each OpenAI-shaped chunk and the method returns nil.

# File 'lib/smart_prompt/zhipu_adapter.rb', line 67

def send_request(messages, model = nil, temperature = nil, tools = nil, proc = nil)
  model_name = model || @config["model"]
  body = build_chat_body(messages, model_name, temperature, tools)
  SmartPrompt.logger.info "ZhipuAIAdapter: chat request model=#{model_name} stream=#{!proc.nil?}"

  url = chat_url_for(model_name)
  if proc
    body["stream"] = true
    stream_chat(url, body) { |data| proc.call(build_stream_chunk(data), 0) }
    SmartPrompt.logger.info "ZhipuAIAdapter: streaming request finished"
    nil
  else
    raw = http_post_json(url, body)
    response = build_completion_response(raw)
    @last_response = response
    SmartPrompt.logger.info "ZhipuAIAdapter: received chat response"
    response
  end
rescue LLMAPIError, Error
  raise
rescue => e
  SmartPrompt.logger.error "Zhipu chat error: #{e.message}"
  raise LLMAPIError, "Failed to call Zhipu chat: #{e.message}"
end
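The private `stream_chat` helper is not shown here. The sketch below illustrates the general idea of reading an OpenAI-style SSE body: each event arrives on a `data:` line, and a `[DONE]` sentinel ends the stream. That the Zhipu endpoint uses this exact framing is an assumption based on the OpenAI compatibility noted in the overview; the function name `each_sse_event` and the sample payload are invented.

```ruby
require "json"

# Hypothetical SSE reader. Assumes the input contains complete lines;
# a real network reader would also need to buffer partial chunks.
def each_sse_event(raw)
  raw.each_line do |line|
    line = line.strip
    next unless line.start_with?("data:")   # skip blank lines and comments
    payload = line.sub(/\Adata:\s*/, "")
    break if payload == "[DONE]"            # end-of-stream sentinel
    yield JSON.parse(payload)
  end
end

raw = <<~SSE
  data: {"choices":[{"delta":{"content":"Hel"}}]}

  data: {"choices":[{"delta":{"content":"lo"}}]}

  data: [DONE]
SSE

text = +""
each_sse_event(raw) { |chunk| text << chunk.dig("choices", 0, "delta", "content").to_s }
puts text
```

In the adapter, each parsed event is reshaped by `build_stream_chunk` before it reaches the caller's proc.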

#synthesize_speech(text, voice: nil, model: nil, response_format: "wav", **opts) ⇒ `Object`

Returns a base64 data URL for the synthesized audio. GLM-TTS accepts wav/pcm only (mp3/flac are rejected), so default to wav.

# File 'lib/smart_prompt/zhipu_adapter.rb', line 242

def synthesize_speech(text, voice: nil, model: nil, response_format: "wav", **opts)
  SmartPrompt.logger.info "ZhipuAIAdapter: TTS"
  raise Error, "Text cannot be empty" if text.nil? || text.to_s.strip.empty?

  model_name = model || @config["tts_model"] || "glm-tts"
  body = { "model" => model_name, "input" => text.to_s }
  body["voice"] = voice if voice
  body["response_format"] = response_format
  body["speed"] = opts[:speed] if opts[:speed]
  body["emotion"] = opts[:emotion] if opts[:emotion]

  audio = http_post_binary("#{@base_url}/audio/speech", body)
  "data:audio/#{response_format};base64,#{Base64.strict_encode64(audio)}"
rescue LLMAPIError, Error
  raise
rescue => e
  raise Error, "Failed to call Zhipu TTS: #{e.message}"
end

#synthesize_to_file(text, output_path, voice: nil, model: nil, response_format: "wav", **opts) ⇒ `Object`

# File 'lib/smart_prompt/zhipu_adapter.rb', line 261

def synthesize_to_file(text, output_path, voice: nil, model: nil, response_format: "wav", **opts)
  data_url = synthesize_speech(text, voice: voice, model: model, response_format: response_format, **opts)
  FileUtils.mkdir_p(File.dirname(output_path))
  audio_bytes = Base64.decode64(data_url.sub(/\Adata:audio\/\w+;base64,/, ""))
  File.binwrite(output_path, audio_bytes)
  SmartPrompt.logger.info "Zhipu audio saved to #{output_path}"
  { file_path: output_path, format: response_format }
end
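The two speech methods share a data-URL round trip: `synthesize_speech` wraps the audio bytes, and `synthesize_to_file` strips the prefix and decodes them again. The sketch below isolates that round trip. The helper names are hypothetical and the bytes are a made-up stand-in for real audio.

```ruby
require "base64"

# Hypothetical helpers mirroring the encode/decode steps in the two methods above.
def to_audio_data_url(bytes, format = "wav")
  "data:audio/#{format};base64,#{Base64.strict_encode64(bytes)}"
end

def from_audio_data_url(data_url)
  Base64.decode64(data_url.sub(/\Adata:audio\/\w+;base64,/, ""))
end

bytes    = "RIFF\x00\x01\x02".b            # stand-in for real audio bytes
data_url = to_audio_data_url(bytes)
puts data_url
puts from_audio_data_url(data_url) == bytes
```

Returning a data URL keeps the return value a plain String, at the cost of roughly a third more memory than the raw bytes.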

#transcribe_audio(audio_file, model: nil, language: nil, **opts) ⇒ `Object`

Transcribe an audio file (local path). Returns a hash of the form { text: "..." }.

# File 'lib/smart_prompt/zhipu_adapter.rb', line 273

def transcribe_audio(audio_file, model: nil, language: nil, **opts)
  SmartPrompt.logger.info "ZhipuAIAdapter: ASR #{File.basename(audio_file)}"
  raise Error, "Audio file not found: #{audio_file}" unless File.exist?(audio_file)

  model_name = model || @config["asr_model"] || "glm-asr-2512"
  form = { "model" => model_name }
  form["language"] = language if language
  form["prompt"] = opts[:prompt] if opts[:prompt]
  form["response_format"] = opts[:response_format] if opts[:response_format]

  response = http_post_multipart("#{@base_url}/audio/transcriptions", form, audio_file)
  { text: response["text"] }
rescue LLMAPIError, Error
  raise
rescue => e
  raise e.is_a?(SmartPrompt::Error) ? e : Error, "Failed to call Zhipu ASR: #{e.message}"
end

#wait_for_video_completion(task_id, check_interval: 10, timeout: 600) ⇒ `Object`

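Block until the task finishes (or times out), then return a hash with task_id:, status:, video_url:, cover_image_url:, and raw:. Raises LLMAPIError if the task fails or the timeout elapses.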

# File 'lib/smart_prompt/zhipu_adapter.rb', line 203

def wait_for_video_completion(task_id, check_interval: 10, timeout: 600)
  start = Time.now
  loop do
    status = check_video_status(task_id)
    case task_status_of(status)
    when "SUCCESS"
      url = video_url_of(status)
      raise LLMAPIError, "Video succeeded but no url in: #{status.inspect}" unless url
      SmartPrompt.logger.info "ZhipuAIAdapter: video ready #{url}"
      return { task_id: task_id, status: "SUCCESS", video_url: url, cover_image_url: cover_url_of(status), raw: status }
    when "FAIL", "FAILED"
      raise LLMAPIError, "Zhipu video generation failed: #{status.inspect}"
    else
      if Time.now - start > timeout
        raise LLMAPIError, "Zhipu video generation timeout after #{timeout}s"
      end
      SmartPrompt.logger.info "Zhipu video task #{task_id} still processing..."
      sleep(check_interval)
    end
  end
end
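The polling loop can be tried without the network by injecting the status source and the sleep function. This is a sketch rather than the adapter's code: the name `poll_until_done`, the `sleeper:` parameter, and the use of a monotonic clock in place of `Time.now` are all choices made for the example. It reads the `task_status` key directly, where the adapter goes through a private helper.

```ruby
# Hypothetical polling loop with injectable status source and sleeper.
def poll_until_done(fetch_status, check_interval: 10, timeout: 600, sleeper: method(:sleep))
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  loop do
    status = fetch_status.call
    case status["task_status"]
    when "SUCCESS"        then return status
    when "FAIL", "FAILED" then raise "task failed: #{status.inspect}"
    end
    # Still processing: give up once the deadline has passed, otherwise wait.
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    raise "timed out after #{timeout}s" if elapsed > timeout
    sleeper.call(check_interval)
  end
end

# Canned statuses stand in for real API responses.
responses = [{ "task_status" => "PROCESSING" }, { "task_status" => "SUCCESS" }]
result = poll_until_done(-> { responses.shift }, check_interval: 0)
puts result["task_status"]
```

A monotonic clock is unaffected by wall-clock adjustments during a long wait, which is why it is swapped in here.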
