Class: RubyLLM::Agents::Speaker

Inherits:

BaseAgent

Inheritance chain: RubyLLM::Agents::Speaker < BaseAgent < Object

Defined in: lib/ruby_llm/agents/audio/speaker.rb,
lib/ruby_llm/agents/audio/speaker/active_storage_support.rb

Overview

Base class for creating text-to-speech speakers using the middleware pipeline

Speaker provides a DSL for configuring text-to-audio operations with built-in execution tracking, budget controls, and multi-tenancy support through the middleware pipeline.

Examples:

Basic usage

class ArticleNarrator < RubyLLM::Agents::Speaker
  provider :openai
  model 'tts-1-hd'
  voice 'nova'
end

result = ArticleNarrator.call(text: "Hello world")
result.audio       # => Binary audio data
result.save_to("output.mp3")

With voice settings

class PremiumNarrator < RubyLLM::Agents::Speaker
  provider :elevenlabs
  model 'eleven_multilingual_v2'
  voice 'Rachel'

  voice_settings do
    stability 0.5
    similarity_boost 0.75
  end
end

Defined Under Namespace

Modules: ActiveStorageSupport
Classes: Lexicon, VoiceSettings

Constant Summary

Instance Attribute Summary

#text ⇒ Object readonly

Attributes inherited from BaseAgent

#client, #model, #temperature, #tracked_tool_calls

Speaker-specific DSL

.model(value = nil) ⇒ String

Sets or returns the TTS model.
.output_format(value = nil) ⇒ Symbol

Sets or returns the output format.
.provider(value = nil) ⇒ Symbol

Sets or returns the TTS provider.
.speed(value = nil) ⇒ Float

Sets or returns the speech speed.
.streaming(value = nil) ⇒ Boolean

Sets or returns streaming mode.
.streaming? ⇒ Boolean

Returns whether streaming mode is enabled.
.voice(value = nil) ⇒ String

Sets or returns the voice name.
.voice_id(value = nil) ⇒ String?

Sets or returns the voice ID (for custom/cloned voices).

Voice Settings DSL

.voice_settings { ... } ⇒ VoiceSettings

Configures voice settings (ElevenLabs specific).
.voice_settings_config ⇒ Object

Returns the configured voice settings, or the inherited configuration.

Lexicon DSL

.lexicon { ... } ⇒ Lexicon

Configures pronunciation lexicon.
.lexicon_config ⇒ Object

Returns the configured lexicon, or the inherited configuration.

Class Method Summary

.agent_type ⇒ Symbol

Returns the agent type for speakers.
.call(text:, **options) {|audio_chunk| ... } ⇒ SpeechResult

Factory method to instantiate and execute speaker.
.stream(text:, **options) {|audio_chunk| ... } ⇒ SpeechResult

Streams the speaker output.

Instance Method Summary

#agent_cache_key ⇒ String

Generates the cache key for this speech.
#call {|audio_chunk| ... } ⇒ SpeechResult

Executes the speech through the middleware pipeline.
#execute(context) ⇒ void

Core speech execution.
#initialize(text:, **options) ⇒ Speaker constructor

Creates a new Speaker instance.
#user_prompt ⇒ String

The input for this speech operation.

Methods inherited from BaseAgent

agent_middleware, aliases, all_agent_names, ask, #assistant_prompt, #cache_key_data, #cache_key_hash, config_summary, #messages, param, params, #process_response, #resolved_thinking, #schema, #system_prompt, temperature, thinking, thinking_config, tools, use_middleware

Methods included from DSL::Base

#active_overrides, #assistant, #assistant_config, #cache_prompts, #clear_override_cache!, #description, #model, #overridable?, #overridable_fields, #prompt, #returns, #schema, #system, #system_config, #timeout, #user, #user_config

Methods included from DSL::Reliability

#circuit_breaker, #circuit_breaker_config, #fallback_models, #fallback_provider, #fallback_providers, #non_fallback_errors, #on_failure, #reliability, #reliability_config, #reliability_configured?, #retries, #retries_config, #retryable_patterns, #total_timeout

Methods included from DSL::Caching

#cache, #cache_enabled?, #cache_for, #cache_key_excludes, #cache_key_includes, #cache_ttl, #caching_config

Methods included from DSL::Queryable

#cost_by_model, #executions, #failures, #last_run, #stats, #total_spent, #with_params

Methods included from DSL::Knowledge

#knowledge_entries, #knowledge_path, #knows

Methods included from CacheHelper

#cache_delete, #cache_exist?, #cache_increment, #cache_key, #cache_read, #cache_store, #cache_write

Methods included from DSL::Knowledge::InstanceMethods

#compiled_knowledge

Constructor Details

#initialize(text:, **options) ⇒ `Speaker`

Creates a new Speaker instance

Parameters:

text (String) —

Text to convert to speech
options (Hash) —

Configuration options

# File 'lib/ruby_llm/agents/audio/speaker.rb', line 287

def initialize(text:, **options)
  @text = text
  @streaming_block = nil
  @runtime_streaming = options.delete(:streaming)

  # Set model to TTS model if not specified
  options[:model] ||= self.class.model

  super(**options)
end
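
The constructor's handling of `options` can be seen in isolation with a stand-alone sketch. `DemoBase` and `DemoNarrator` are hypothetical stand-ins for BaseAgent and a Speaker subclass, and the model names are only illustrative; the point is that `:streaming` is removed before the remaining options reach the parent, and that a per-call `:model` takes precedence over the class-level one.

```ruby
# Hypothetical parent that simply records the options it receives.
class DemoBase
  attr_reader :options

  def initialize(**options)
    @options = options
  end
end

# Hypothetical speaker mirroring the constructor shown above.
class DemoNarrator < DemoBase
  attr_reader :text, :runtime_streaming

  def self.model
    "tts-1-hd" # illustrative class-level setting
  end

  def initialize(text:, **options)
    @text = text
    # Removed here, so the parent never sees the streaming flag.
    @runtime_streaming = options.delete(:streaming)
    # A per-call model wins; otherwise the class-level model is used.
    options[:model] ||= self.class.model
    super(**options)
  end
end

narrator = DemoNarrator.new(text: "Hi", streaming: true, model: "tts-1")
default_narrator = DemoNarrator.new(text: "Hi")

p narrator.options
p default_narrator.options
```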

Instance Attribute Details

#text ⇒ `Object` (readonly)

Returns the text to convert to speech, as passed to the constructor.

# File 'lib/ruby_llm/agents/audio/speaker.rb', line 281

def text
  @text
end

Class Method Details

.agent_type ⇒ `Symbol`

Returns the agent type for speakers

Returns:

(Symbol) —

:audio




# File 'lib/ruby_llm/agents/audio/speaker.rb', line 46

def agent_type
  :audio
end

.call(text:, **options) {|audio_chunk| ... } ⇒ `SpeechResult`

Factory method to instantiate and execute speaker

Parameters:

text (String) —

Text to convert to speech
options (Hash) —

Additional options

Yields:

(audio_chunk) —

Called for each audio chunk when streaming

Returns:

(SpeechResult) —

The speech result




# File 'lib/ruby_llm/agents/audio/speaker.rb', line 175

def call(text:, **options, &block)
  new(text: text, **options).call(&block)
end

.lexicon { ... } ⇒ `Lexicon`

Configures pronunciation lexicon

Yields:

Block for configuring pronunciations

Returns:

(Lexicon) —

The lexicon configuration

# File 'lib/ruby_llm/agents/audio/speaker.rb', line 157

def lexicon(&block)
  @lexicon ||= Lexicon.new
  @lexicon.instance_eval(&block) if block_given?
  @lexicon
end
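
This page does not show the methods available inside the `lexicon` block, so the following is only a sketch of how a pronunciation lexicon of this kind might work. `DemoLexicon`, its `pronounce` and `apply` methods, and `LexiconDemoSpeaker` are all hypothetical names chosen for illustration; only the `instance_eval` pattern is taken from the source above.

```ruby
# Hypothetical lexicon: maps written terms to spoken replacements.
class DemoLexicon
  def initialize
    @entries = {}
  end

  # `pronounce` is an assumed method name, not the library's API.
  def pronounce(term, spoken)
    @entries[term] = spoken
  end

  # Replaces each whole-word occurrence of a term with its spoken form.
  def apply(text)
    @entries.reduce(text) do |result, (term, spoken)|
      result.gsub(/\b#{Regexp.escape(term)}\b/, spoken)
    end
  end
end

class LexiconDemoSpeaker
  def self.lexicon(&block)
    @lexicon ||= DemoLexicon.new
    # The block runs with the lexicon as self, so bare calls reach it.
    @lexicon.instance_eval(&block) if block_given?
    @lexicon
  end
end

LexiconDemoSpeaker.lexicon do
  pronounce "SQL", "sequel"
  pronounce "nginx", "engine x"
end

spoken_text = LexiconDemoSpeaker.lexicon.apply("Put nginx in front of the SQL server")
puts spoken_text
```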

.lexicon_config ⇒ `Object`

Returns the lexicon configured on this class, or the inherited configuration (nil when none is set).

# File 'lib/ruby_llm/agents/audio/speaker.rb', line 163

def lexicon_config
  @lexicon || inherited_or_default(:lexicon_config, nil)
end

.model(value = nil) ⇒ `String`

Sets or returns the TTS model

Parameters:

value (String, nil) (defaults to: nil) —

The model identifier

Returns:

(String) —

The current model setting

# File 'lib/ruby_llm/agents/audio/speaker.rb', line 71

def model(value = nil)
  @model = value if value
  return @model if defined?(@model) && @model

  if superclass.respond_to?(:agent_type) && superclass.agent_type == :audio
    superclass.model
  else
    default_tts_model
  end
end
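
The lookup order above (own value, then an audio superclass, then a default) can be tried out with a stand-alone sketch. `ModelDemo` and its subclasses are hypothetical, and `"tts-1"` stands in for whatever `default_tts_model` returns, which this page does not show.

```ruby
# Stand-alone sketch of the setter/getter pattern; not the library's code.
class ModelDemo
  class << self
    def agent_type
      :audio
    end

    # Sets the model when a value is given; otherwise resolves it.
    def model(value = nil)
      @model = value if value
      return @model if defined?(@model) && @model

      if superclass.respond_to?(:agent_type) && superclass.agent_type == :audio
        superclass.model # walk up the speaker hierarchy
      else
        "tts-1" # assumed default, for illustration only
      end
    end
  end
end

class BaseModelNarrator < ModelDemo
  model "tts-1-hd"
end

class ChildModelNarrator < BaseModelNarrator; end

puts ModelDemo.model          # nothing set anywhere: the assumed default
puts ChildModelNarrator.model # inherited from BaseModelNarrator
```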

.output_format(value = nil) ⇒ `Symbol`

Sets or returns the output format

Parameters:

value (Symbol, nil) (defaults to: nil) —

Format (:mp3, :wav, :ogg, etc.)

Returns:

(Symbol) —

The current format

# File 'lib/ruby_llm/agents/audio/speaker.rb', line 113

def output_format(value = nil)
  @output_format = value if value
  @output_format || inherited_or_default(:output_format, :mp3)
end

.provider(value = nil) ⇒ `Symbol`

Sets or returns the TTS provider

Parameters:

value (Symbol, nil) (defaults to: nil) —

The provider (:openai, :elevenlabs, :google, :polly)

Returns:

(Symbol) —

The current provider setting

# File 'lib/ruby_llm/agents/audio/speaker.rb', line 56

def provider(value = nil)
  @provider = value if value
  return @provider if defined?(@provider) && @provider

  if superclass.respond_to?(:agent_type) && superclass.agent_type == :audio
    superclass.provider
  else
    default_tts_provider
  end
end

.speed(value = nil) ⇒ `Float`

Sets or returns the speech speed

Parameters:

value (Float, nil) (defaults to: nil) —

Speed multiplier

Returns:

(Float) —

The current speed

# File 'lib/ruby_llm/agents/audio/speaker.rb', line 104

def speed(value = nil)
  @speed = value if value
  @speed || inherited_or_default(:speed, 1.0)
end

.stream(text:, **options) {|audio_chunk| ... } ⇒ `SpeechResult`

Streams the speaker output

Parameters:

text (String) —

Text to convert to speech
options (Hash) —

Additional options

Yields:

(audio_chunk) —

Called for each audio chunk

Returns:

(SpeechResult) —

The speech result

Raises:

(ArgumentError)

# File 'lib/ruby_llm/agents/audio/speaker.rb', line 185

def stream(text:, **options, &block)
  raise ArgumentError, "A block is required for streaming" unless block_given?

  instance = new(text: text, **options.merge(streaming: true))
  instance.call(&block)
end
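
The block requirement can be demonstrated without the library. In this sketch `DemoStreamer` is a hypothetical class, and slicing the text into 4-character pieces merely imitates audio chunks arriving one at a time; real chunks would be binary audio data.

```ruby
class DemoStreamer
  def self.stream(text:, **options, &block)
    raise ArgumentError, "A block is required for streaming" unless block_given?

    # Hypothetical chunking: yield the text in slices of up to 4 characters.
    text.scan(/.{1,4}/m).each(&block)
    :done
  end
end

chunks = []
DemoStreamer.stream(text: "Hello world") { |chunk| chunks << chunk }
p chunks

begin
  DemoStreamer.stream(text: "Hello world") # no block given
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```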

.streaming(value = nil) ⇒ `Boolean`

Sets or returns streaming mode

Parameters:

value (Boolean, nil) (defaults to: nil) —

Enable streaming

Returns:

(Boolean) —

The current streaming setting

# File 'lib/ruby_llm/agents/audio/speaker.rb', line 122

def streaming(value = nil)
  @streaming = value unless value.nil?
  instance_variable_defined?(:@streaming) ? @streaming : inherited_or_default(:streaming, false)
end
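
Unlike the other setters, `.streaming` guards with `unless value.nil?` rather than `if value`, which lets a subclass set `false` explicitly and override an inherited `true`. The sketch below shows that behavior; `StreamingDemo` and its subclasses are hypothetical, and `inherited_or_default` is a guessed reimplementation, since the library's helper is not shown on this page.

```ruby
class StreamingDemo
  class << self
    def streaming(value = nil)
      @streaming = value unless value.nil? # false is a valid setting
      instance_variable_defined?(:@streaming) ? @streaming : inherited_or_default(:streaming, false)
    end

    def streaming?
      streaming
    end

    private

    # Assumed behavior: ask the superclass if it responds, else use the default.
    def inherited_or_default(name, default)
      superclass.respond_to?(name) ? superclass.public_send(name) : default
    end
  end
end

class LiveDemo < StreamingDemo
  streaming true
end

class BatchDemo < LiveDemo
  streaming false # would be ignored by an `if value` guard
end

class PlainChildDemo < LiveDemo; end

puts StreamingDemo.streaming?
puts LiveDemo.streaming?
puts BatchDemo.streaming?
puts PlainChildDemo.streaming?
```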

.streaming? ⇒ `Boolean`

Returns whether streaming mode is enabled; predicate form of .streaming.

Returns:

(Boolean)

# File 'lib/ruby_llm/agents/audio/speaker.rb', line 127

def streaming?
  streaming
end

.voice(value = nil) ⇒ `String`

Sets or returns the voice name

Parameters:

value (String, nil) (defaults to: nil) —

The voice name

Returns:

(String) —

The current voice setting

# File 'lib/ruby_llm/agents/audio/speaker.rb', line 86

def voice(value = nil)
  @voice = value if value
  @voice || inherited_or_default(:voice, default_tts_voice)
end

.voice_id(value = nil) ⇒ `String`?

Sets or returns the voice ID (for custom/cloned voices)

Parameters:

value (String, nil) (defaults to: nil) —

The voice ID

Returns:

(String, nil) —

The current voice ID

# File 'lib/ruby_llm/agents/audio/speaker.rb', line 95

def voice_id(value = nil)
  @voice_id = value if value
  @voice_id || inherited_or_default(:voice_id, nil)
end

.voice_settings { ... } ⇒ `VoiceSettings`

Configures voice settings (ElevenLabs specific)

Yields:

Block for configuring voice settings

Returns:

(VoiceSettings) —

The voice settings configuration

# File 'lib/ruby_llm/agents/audio/speaker.rb', line 139

def voice_settings(&block)
  @voice_settings ||= VoiceSettings.new
  @voice_settings.instance_eval(&block) if block_given?
  @voice_settings
end
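
The `instance_eval` call is what makes the bare `stability 0.5` syntax from the class-level example work: the block runs with the settings object as `self`. A minimal sketch follows, assuming a settings class that supports only the two options shown in that example; `DemoVoiceSettings` and `VoiceDemoSpeaker` are hypothetical stand-ins, not the library's classes.

```ruby
# Hypothetical stand-in for VoiceSettings with two combined setter/getters.
class DemoVoiceSettings
  def stability(value = nil)
    @stability = value unless value.nil?
    @stability
  end

  def similarity_boost(value = nil)
    @similarity_boost = value unless value.nil?
    @similarity_boost
  end

  # Only the options that were actually set.
  def to_h
    { stability: @stability, similarity_boost: @similarity_boost }.compact
  end
end

class VoiceDemoSpeaker
  def self.voice_settings(&block)
    @voice_settings ||= DemoVoiceSettings.new
    # The block is evaluated with the settings object as self.
    @voice_settings.instance_eval(&block) if block_given?
    @voice_settings
  end
end

VoiceDemoSpeaker.voice_settings do
  stability 0.5
  similarity_boost 0.75
end

p VoiceDemoSpeaker.voice_settings.to_h
```

Because the settings object is memoized with `||=`, a second `voice_settings` block on the same class updates the existing object instead of replacing it.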

.voice_settings_config ⇒ `Object`

Returns the voice settings configured on this class, or the inherited configuration (nil when none is set).

# File 'lib/ruby_llm/agents/audio/speaker.rb', line 145

def voice_settings_config
  @voice_settings || inherited_or_default(:voice_settings_config, nil)
end

Instance Method Details

#agent_cache_key ⇒ `String`

Generates the cache key for this speech

Returns:

(String) —

Cache key

# File 'lib/ruby_llm/agents/audio/speaker.rb', line 363

def agent_cache_key
  components = [
    "ruby_llm_agents",
    "speech",
    self.class.name,
    resolved_provider,
    resolved_model,
    resolved_voice,
    resolved_voice_id,
    resolved_speed,
    resolved_output_format,
    Digest::SHA256.hexdigest(text)
  ].compact

  components.join("/")
end
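
To see what such a key looks like, the same composition can be run stand-alone. `demo_speech_cache_key` is a hypothetical helper and all argument values are illustrative; the resolved values in the real method come from the speaker's configuration.

```ruby
require "digest"

# Builds a key with the same layout as the method above. nil components
# (here the voice ID) are dropped by compact before joining.
def demo_speech_cache_key(class_name:, provider:, model:, voice:, voice_id:, speed:, output_format:, text:)
  [
    "ruby_llm_agents",
    "speech",
    class_name,
    provider,
    model,
    voice,
    voice_id,
    speed,
    output_format,
    Digest::SHA256.hexdigest(text)
  ].compact.join("/")
end

key = demo_speech_cache_key(
  class_name: "ArticleNarrator", provider: :openai, model: "tts-1-hd",
  voice: "nova", voice_id: nil, speed: 1.0, output_format: :mp3,
  text: "Hello world"
)
puts key
```

Hashing the text keeps the key a fixed length regardless of input size, and any change to provider, model, voice, speed, or format yields a different key.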

#call {|audio_chunk| ... } ⇒ `SpeechResult`

Executes the speech through the middleware pipeline

Yields:

(audio_chunk) —

Called for each audio chunk when streaming

Returns:

(SpeechResult) —

The speech result

# File 'lib/ruby_llm/agents/audio/speaker.rb', line 302

def call(&block)
  @streaming_block = block
  context = build_context
  result_context = Pipeline::Executor.execute(context)
  result_context.output
end
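
The internals of Pipeline::Executor are not shown on this page. As a rough mental model only, a middleware pipeline usually wraps a core step in layers that each receive the context, do their work, and pass the context on. Everything in the sketch below (`DemoContext`, `TimingMiddleware`, `CoreStep`) is hypothetical and may differ from how the library actually builds its chain.

```ruby
# Hypothetical context carrying input, output, and a trace through the chain.
DemoContext = Struct.new(:input, :output, :log)

# A middleware wraps the next callable and runs code before and after it.
class TimingMiddleware
  def initialize(app)
    @app = app
  end

  def call(context)
    context.log << "timing:start"
    @app.call(context)
    context.log << "timing:end"
    context
  end
end

# Stand-in for the agent's core execution step.
class CoreStep
  def call(context)
    context.log << "execute"
    context.output = "audio for #{context.input}"
    context
  end
end

pipeline = TimingMiddleware.new(CoreStep.new)
result_context = pipeline.call(DemoContext.new("Hello world", nil, []))

puts result_context.output
p result_context.log
```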

#execute(context) ⇒ `void`

This method returns an undefined value.

Core speech execution

Called by Pipeline::Executor after the middleware has been applied. It contains only the speech API logic.

Parameters:

context (Pipeline::Context) —

The execution context

# File 'lib/ruby_llm/agents/audio/speaker.rb', line 323

def execute(context)
  execution_started_at = Time.current

  validate_text_input!
  processed_text = apply_lexicon(text)

  # Execute speech synthesis
  result = execute_speech(processed_text)

  execution_completed_at = Time.current
  duration_ms = ((execution_completed_at - execution_started_at) * 1000).to_i

  # Update context
  context.input_tokens = 0
  context.output_tokens = 0
  context.total_cost = calculate_cost(result)

  # Store audio-specific metadata for execution tracking
  context[:provider] = result[:provider].to_s
  context[:voice_id] = (resolved_voice_id || resolved_voice).to_s
  context[:characters] = result[:characters]
  context[:output_format] = result[:format].to_s
  context[:file_size] = result[:audio]&.bytesize
  context[:audio_duration_seconds] = result[:duration] if result[:duration]

  # Build final result
  context.output = build_result(
    result,
    text,
    started_at: context.started_at || execution_started_at,
    completed_at: execution_completed_at,
    duration_ms: duration_ms,
    tenant_id: context.tenant_id,
    execution_id: context.execution_id
  )
end

#user_prompt ⇒ `String`

The input for this speech operation

Returns:

(String) —

The text being converted




# File 'lib/ruby_llm/agents/audio/speaker.rb', line 312

def user_prompt
  text
end
