Class: RubyLLM::Agents::Audio::SpeechClient

Inherits: Object
Defined in:
lib/ruby_llm/agents/audio/speech_client.rb

Overview

Direct HTTP client for text-to-speech APIs.

Supports the OpenAI and ElevenLabs providers over direct HTTP, since the base RubyLLM gem does not provide a speak() method.

Examples:

OpenAI

client = SpeechClient.new(provider: :openai)
response = client.speak("Hello", model: "tts-1", voice: "nova")
response.audio  # => binary audio data

ElevenLabs

client = SpeechClient.new(provider: :elevenlabs)
response = client.speak("Hello",
  model: "eleven_v3",
  voice: "Rachel",
  voice_id: "21m00Tcm4TlvDq8ikWAM",
  voice_settings: { stability: 0.5, similarity_boost: 0.75 }
)

Defined Under Namespace

Classes: Response, StreamChunk
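The Response class is defined inside SpeechClient and its exact shape is not shown on this page. Based on the overview example (response.audio returns binary data), a minimal stand-in could look like the following sketch; the content_type field is purely illustrative and not confirmed by the source:

```ruby
# Hypothetical stand-in for SpeechClient::Response; the real class is defined
# in speech_client.rb and may carry additional fields.
Response = Struct.new(:audio, :content_type, keyword_init: true)

# Binary audio bytes, as returned by the TTS API.
response = Response.new(audio: "ID3\x04".b, content_type: "audio/mpeg")
response.audio.bytesize  # => 4
```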

Constant Summary collapse

SUPPORTED_PROVIDERS =
%i[openai elevenlabs].freeze

Instance Method Summary collapse

Constructor Details

#initialize(provider:) ⇒ SpeechClient

Returns a new instance of SpeechClient.

Parameters:

  • provider (Symbol)

    :openai or :elevenlabs

Raises:

  • (ArgumentError) if provider is not one of SUPPORTED_PROVIDERS

# File 'lib/ruby_llm/agents/audio/speech_client.rb', line 46

def initialize(provider:)
  validate_provider!(provider)
  @provider = provider
end
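The body of validate_provider! is not shown on this page. A minimal sketch of what it plausibly does, checking against SUPPORTED_PROVIDERS; the ArgumentError class and message are assumptions, not confirmed by the source:

```ruby
# Hypothetical sketch of validate_provider!; the real implementation lives in
# speech_client.rb and may differ.
SUPPORTED_PROVIDERS = %i[openai elevenlabs].freeze

def validate_provider!(provider)
  return if SUPPORTED_PROVIDERS.include?(provider)

  raise ArgumentError,
        "Unsupported provider: #{provider.inspect} " \
        "(expected one of #{SUPPORTED_PROVIDERS.join(', ')})"
end
```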

Instance Method Details

#speak(text, model:, voice:, voice_id: nil, speed: nil, response_format: "mp3", voice_settings: nil) ⇒ Response

Synthesize speech (non-streaming)

Parameters:

  • text (String)

    text to convert

  • model (String)

    model identifier

  • voice (String)

    voice name

  • voice_id (String, nil) (defaults to: nil)

    ElevenLabs voice ID (falls back to voice when nil)

  • speed (Float, nil) (defaults to: nil)

    speed multiplier

  • response_format (String) (defaults to: "mp3")

    output format

  • voice_settings (Hash, nil) (defaults to: nil)

    ElevenLabs voice settings

Returns:

  • (Response)

# File 'lib/ruby_llm/agents/audio/speech_client.rb', line 61

def speak(text, model:, voice:, voice_id: nil, speed: nil,
  response_format: "mp3", voice_settings: nil)
  case @provider
  when :openai
    openai_speak(text, model: model, voice: voice_id || voice,
      speed: speed, response_format: response_format)
  when :elevenlabs
    elevenlabs_speak(text, model: model, voice_id: voice_id || voice,
      speed: speed, response_format: response_format,
      voice_settings: voice_settings)
  end
end
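For both providers, #speak resolves the voice with voice_id || voice, so an explicit voice_id always wins. A self-contained illustration of that fallback (resolve_voice is a hypothetical helper for this page, not part of the gem):

```ruby
# Hypothetical helper mirroring the `voice_id || voice` fallback in #speak.
def resolve_voice(voice:, voice_id: nil)
  voice_id || voice
end

resolve_voice(voice: "nova")                                     # => "nova"
resolve_voice(voice: "Rachel", voice_id: "21m00Tcm4TlvDq8ikWAM") # => "21m00Tcm4TlvDq8ikWAM"
```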

#speak_streaming(text, model:, voice:, voice_id: nil, speed: nil, response_format: "mp3", voice_settings: nil) {|StreamChunk| ... } ⇒ Response

Synthesize speech with streaming

Parameters:

  • text (String)

    text to convert

  • model (String)

    model identifier

  • voice (String)

    voice name

  • voice_id (String, nil) (defaults to: nil)

    voice ID

  • speed (Float, nil) (defaults to: nil)

    speed multiplier

  • response_format (String) (defaults to: "mp3")

    output format

  • voice_settings (Hash, nil) (defaults to: nil)

    ElevenLabs voice settings

Yields:

  • (StreamChunk) each audio chunk as it arrives

Returns:

  • (Response)

# File 'lib/ruby_llm/agents/audio/speech_client.rb', line 85

def speak_streaming(text, model:, voice:, voice_id: nil, speed: nil,
  response_format: "mp3", voice_settings: nil, &block)
  case @provider
  when :openai
    openai_speak_streaming(text, model: model, voice: voice_id || voice,
      speed: speed, response_format: response_format, &block)
  when :elevenlabs
    elevenlabs_speak_streaming(text, model: model, voice_id: voice_id || voice,
      speed: speed, response_format: response_format,
      voice_settings: voice_settings, &block)
  end
end
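A caller typically accumulates the yielded chunks into a single buffer. The sketch below stubs the HTTP layer entirely to show the block contract; fake_speak_streaming and the StreamChunk shape (a single #audio accessor) are assumptions, not the gem's real API:

```ruby
# Stand-in for the gem's StreamChunk; assumes chunks expose #audio.
StreamChunk = Struct.new(:audio)

# Stub that yields fixed-size "audio" chunks in place of a real TTS API call.
def fake_speak_streaming(data, chunk_size: 4)
  data.bytes.each_slice(chunk_size) do |slice|
    yield StreamChunk.new(slice.pack("C*"))
  end
end

buffer = +""
fake_speak_streaming("binary-audio-data") { |chunk| buffer << chunk.audio }
buffer  # => "binary-audio-data"
```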