Class: RubyLLM::Agents::Speaker

Inherits:
BaseAgent
Defined in:
lib/ruby_llm/agents/audio/speaker.rb,
lib/ruby_llm/agents/audio/speaker/active_storage_support.rb

Overview

Base class for text-to-speech speakers that run through the middleware pipeline.

Speaker provides a DSL for configuring text-to-audio operations with built-in execution tracking, budget controls, and multi-tenancy support through the middleware pipeline.

Examples:

Basic usage

class ArticleNarrator < RubyLLM::Agents::Speaker
  provider :openai
  model 'tts-1-hd'
  voice 'nova'
end

result = ArticleNarrator.call(text: "Hello world")
result.audio       # => Binary audio data
result.save_to("output.mp3")

With voice settings

class PremiumNarrator < RubyLLM::Agents::Speaker
  provider :elevenlabs
  model 'eleven_multilingual_v2'
  voice 'Rachel'

  voice_settings do
    stability 0.5
    similarity_boost 0.75
  end
end

Defined Under Namespace

Modules: ActiveStorageSupport

Classes: Lexicon, VoiceSettings

Constant Summary

Constants included from DSL::Base

DSL::Base::PLACEHOLDER_PATTERN

Constants included from DSL::Caching

DSL::Caching::DEFAULT_CACHE_TTL

Constants included from CacheHelper

CacheHelper::NAMESPACE

Instance Attribute Summary

Attributes inherited from BaseAgent

#client, #model, #temperature, #tracked_tool_calls

Speaker-specific DSL

Voice Settings DSL

Lexicon DSL

Class Method Summary

Instance Method Summary

Methods inherited from BaseAgent

agent_middleware, aliases, all_agent_names, ask, #assistant_prompt, #cache_key_data, #cache_key_hash, config_summary, #messages, param, params, #process_response, #resolved_thinking, #schema, #system_prompt, temperature, thinking, thinking_config, tools, use_middleware

Methods included from DSL::Base

#active_overrides, #assistant, #assistant_config, #cache_prompts, #clear_override_cache!, #description, #model, #overridable?, #overridable_fields, #prompt, #returns, #schema, #system, #system_config, #timeout, #user, #user_config

Methods included from DSL::Reliability

#circuit_breaker, #circuit_breaker_config, #fallback_models, #fallback_provider, #fallback_providers, #non_fallback_errors, #on_failure, #reliability, #reliability_config, #reliability_configured?, #retries, #retries_config, #retryable_patterns, #total_timeout

Methods included from DSL::Caching

#cache, #cache_enabled?, #cache_for, #cache_key_excludes, #cache_key_includes, #cache_ttl, #caching_config

Methods included from DSL::Queryable

#cost_by_model, #executions, #failures, #last_run, #stats, #total_spent, #with_params

Methods included from DSL::Knowledge

#knowledge_entries, #knowledge_path, #knows

Methods included from CacheHelper

#cache_delete, #cache_exist?, #cache_increment, #cache_key, #cache_read, #cache_store, #cache_write

Methods included from DSL::Knowledge::InstanceMethods

#compiled_knowledge

Constructor Details

#initialize(text:, **options) ⇒ Speaker

Creates a new Speaker instance

Parameters:

  • text (String)

    Text to convert to speech

  • options (Hash)

    Configuration options



# File 'lib/ruby_llm/agents/audio/speaker.rb', line 287

def initialize(text:, **options)
  @text = text
  @streaming_block = nil
  @runtime_streaming = options.delete(:streaming)

  # Set model to TTS model if not specified
  options[:model] ||= self.class.model

  super(**options)
end
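
The constructor strips the runtime `streaming:` flag before forwarding the remaining options to the parent class, and it fills in the class-level model only when the caller did not pass one. The sketch below reproduces just that option handling in plain Ruby. `OptionSinkAgent`, `OptionSketchSpeaker`, and the `"tts-1"` default are hypothetical stand-ins, not part of the library.

```ruby
# Hypothetical stand-in for BaseAgent: it only records the options it receives.
class OptionSinkAgent
  attr_reader :options

  def initialize(**options)
    @options = options
  end
end

# Hypothetical stand-in for a Speaker subclass.
class OptionSketchSpeaker < OptionSinkAgent
  attr_reader :text, :runtime_streaming

  # Assumed class-level default, standing in for the .model DSL value.
  def self.model
    "tts-1"
  end

  def initialize(text:, **options)
    @text = text
    # Remove :streaming so the parent class never sees it.
    @runtime_streaming = options.delete(:streaming)
    # Use the class-level model only when the caller supplied none.
    options[:model] ||= self.class.model
    super(**options)
  end
end

speaker = OptionSketchSpeaker.new(text: "Hello", streaming: true)
speaker.runtime_streaming          # => true
speaker.options.key?(:streaming)   # => false
speaker.options[:model]            # => "tts-1"
```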

Instance Attribute Details

#text ⇒ Object (readonly)



# File 'lib/ruby_llm/agents/audio/speaker.rb', line 281

def text
  @text
end

Class Method Details

.agent_type ⇒ Symbol

Returns the agent type for speakers

Returns:

  • (Symbol)

    :audio



# File 'lib/ruby_llm/agents/audio/speaker.rb', line 46

def agent_type
  :audio
end

.call(text:, **options) {|audio_chunk| ... } ⇒ SpeechResult

Factory method that instantiates a speaker and executes it

Parameters:

  • text (String)

    Text to convert to speech

  • options (Hash)

    Additional options

Yields:

  • (audio_chunk)

    Called for each audio chunk when streaming

Returns:

  • (SpeechResult)



# File 'lib/ruby_llm/agents/audio/speaker.rb', line 175

def call(text:, **options, &block)
  new(text: text, **options).call(&block)
end

.lexicon { ... } ⇒ Lexicon

Configures pronunciation lexicon

Yields:

  • Block for configuring pronunciations

Returns:

  • (Lexicon)

    The lexicon configuration



# File 'lib/ruby_llm/agents/audio/speaker.rb', line 157

def lexicon(&block)
  @lexicon ||= Lexicon.new
  @lexicon.instance_eval(&block) if block_given?
  @lexicon
end

.lexicon_config ⇒ Object



# File 'lib/ruby_llm/agents/audio/speaker.rb', line 163

def lexicon_config
  @lexicon || inherited_or_default(:lexicon_config, nil)
end

.model(value = nil) ⇒ String

Sets or returns the TTS model

Parameters:

  • value (String, nil) (defaults to: nil)

    The model identifier

Returns:

  • (String)

    The current model setting



# File 'lib/ruby_llm/agents/audio/speaker.rb', line 71

def model(value = nil)
  @model = value if value
  return @model if defined?(@model) && @model

  if superclass.respond_to?(:agent_type) && superclass.agent_type == :audio
    superclass.model
  else
    default_tts_model
  end
end
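
The lookup above walks up the class hierarchy only while the parent is itself an audio agent, and falls back to a default at the top. A minimal sketch of that resolution order follows. `PlainSketchAgent`, `ModelSketchSpeaker`, the narrator subclasses, and the `"tts-1"` default are assumptions made for illustration; in the library the default comes from `default_tts_model`, whose implementation is not shown on this page.

```ruby
# Hypothetical non-audio parent, so the lookup stops at ModelSketchSpeaker.
class PlainSketchAgent
  def self.agent_type
    :conversation
  end
end

class ModelSketchSpeaker < PlainSketchAgent
  class << self
    def agent_type
      :audio
    end

    def model(value = nil)
      @model = value if value
      return @model if defined?(@model) && @model

      if superclass.respond_to?(:agent_type) && superclass.agent_type == :audio
        superclass.model    # ask the nearest audio parent
      else
        default_tts_model   # top of the audio hierarchy
      end
    end

    private

    # Assumed default for this sketch only.
    def default_tts_model
      "tts-1"
    end
  end
end

class SketchNarrator < ModelSketchSpeaker
  model "tts-1-hd"
end

class SketchChapterNarrator < SketchNarrator; end

SketchChapterNarrator.model # => "tts-1-hd" (inherited from SketchNarrator)
ModelSketchSpeaker.model    # => "tts-1"    (default, parent is not an audio agent)
```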

.output_format(value = nil) ⇒ Symbol

Sets or returns the output format

Parameters:

  • value (Symbol, nil) (defaults to: nil)

    Format (:mp3, :wav, :ogg, etc.)

Returns:

  • (Symbol)

    The current format



# File 'lib/ruby_llm/agents/audio/speaker.rb', line 113

def output_format(value = nil)
  @output_format = value if value
  @output_format || inherited_or_default(:output_format, :mp3)
end
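
`inherited_or_default` is not shown on this page, so the helper in the sketch below is only a guess at its behaviour: ask the parent class when it defines the same reader, otherwise return the supplied default. All class names here are hypothetical.

```ruby
class FormatSketchSpeaker
  class << self
    def output_format(value = nil)
      @output_format = value if value
      @output_format || inherited_or_default(:output_format, :mp3)
    end

    private

    # Assumed implementation: delegate to the parent class if it responds
    # to the reader, else fall back to the given default.
    def inherited_or_default(method_name, default)
      superclass.respond_to?(method_name) ? superclass.public_send(method_name) : default
    end
  end
end

class WavSketchSpeaker < FormatSketchSpeaker
  output_format :wav
end

class PodcastSketchSpeaker < WavSketchSpeaker; end

FormatSketchSpeaker.output_format  # => :mp3 (default)
PodcastSketchSpeaker.output_format # => :wav (inherited)
```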

.provider(value = nil) ⇒ Symbol

Sets or returns the TTS provider

Parameters:

  • value (Symbol, nil) (defaults to: nil)

    The provider (:openai, :elevenlabs, :google, :polly)

Returns:

  • (Symbol)

    The current provider setting



# File 'lib/ruby_llm/agents/audio/speaker.rb', line 56

def provider(value = nil)
  @provider = value if value
  return @provider if defined?(@provider) && @provider

  if superclass.respond_to?(:agent_type) && superclass.agent_type == :audio
    superclass.provider
  else
    default_tts_provider
  end
end

.speed(value = nil) ⇒ Float

Sets or returns the speech speed

Parameters:

  • value (Float, nil) (defaults to: nil)

    Speed multiplier

Returns:

  • (Float)

    The current speed



# File 'lib/ruby_llm/agents/audio/speaker.rb', line 104

def speed(value = nil)
  @speed = value if value
  @speed || inherited_or_default(:speed, 1.0)
end

.stream(text:, **options) {|audio_chunk| ... } ⇒ SpeechResult

Streams the speaker output

Parameters:

  • text (String)

    Text to convert to speech

  • options (Hash)

    Additional options

Yields:

  • (audio_chunk)

    Called for each audio chunk

Returns:

  • (SpeechResult)

Raises:

  • (ArgumentError)

    If no block is given


# File 'lib/ruby_llm/agents/audio/speaker.rb', line 185

def stream(text:, **options, &block)
  raise ArgumentError, "A block is required for streaming" unless block_given?

  instance = new(text: text, **options.merge(streaming: true))
  instance.call(&block)
end
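
The guard clause and the forced `streaming: true` option can be illustrated without a real provider. In the sketch below, `SketchStreamer` is a hypothetical class whose `#call` slices the input text into four-character strings that stand in for audio chunks; real chunks would be binary audio data.

```ruby
class SketchStreamer
  def self.stream(text:, **options, &block)
    raise ArgumentError, "A block is required for streaming" unless block_given?

    new(text: text, **options.merge(streaming: true)).call(&block)
  end

  def initialize(text:, **options)
    @text = text
    @streaming = options.delete(:streaming)
  end

  # Stand-in for synthesis: yields the text in fixed-size slices.
  def call
    chunks = @text.scan(/.{1,4}/m)
    chunks.each { |chunk| yield chunk } if @streaming && block_given?
    chunks.join
  end
end

received = []
SketchStreamer.stream(text: "Hello world") { |chunk| received << chunk }
received # => ["Hell", "o wo", "rld"]
```

Calling `SketchStreamer.stream(text: "Hello world")` without a block raises `ArgumentError` before any instance is created.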

.streaming(value = nil) ⇒ Boolean

Sets or returns streaming mode

Parameters:

  • value (Boolean, nil) (defaults to: nil)

    Enable streaming

Returns:

  • (Boolean)

    The current streaming setting



# File 'lib/ruby_llm/agents/audio/speaker.rb', line 122

def streaming(value = nil)
  @streaming = value unless value.nil?
  instance_variable_defined?(:@streaming) ? @streaming : inherited_or_default(:streaming, false)
end
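
Unlike the other setters, this one checks `value.nil?` and `instance_variable_defined?` rather than truthiness, which lets a subclass turn streaming off with an explicit `false`. The sketch below shows that case. The class names are hypothetical and `inherited_or_default` is an assumed implementation, since the real helper is not shown on this page.

```ruby
class StreamingFlagSpeaker
  class << self
    def streaming(value = nil)
      @streaming = value unless value.nil?   # false is stored, nil is ignored
      instance_variable_defined?(:@streaming) ? @streaming : inherited_or_default(:streaming, false)
    end

    private

    # Assumed implementation of the inheritance helper.
    def inherited_or_default(method_name, default)
      superclass.respond_to?(method_name) ? superclass.public_send(method_name) : default
    end
  end
end

class LiveSketchSpeaker < StreamingFlagSpeaker
  streaming true
end

class BatchSketchSpeaker < LiveSketchSpeaker
  streaming false   # overrides the inherited true
end

class QuietSketchSpeaker < LiveSketchSpeaker; end

BatchSketchSpeaker.streaming   # => false
QuietSketchSpeaker.streaming   # => true (inherited)
StreamingFlagSpeaker.streaming # => false (default)
```

With a truthiness check such as `@streaming = value if value`, the `streaming false` call in `BatchSketchSpeaker` would be ignored and the class would inherit `true`.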

.streaming? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/ruby_llm/agents/audio/speaker.rb', line 127

def streaming?
  streaming
end

.voice(value = nil) ⇒ String

Sets or returns the voice name

Parameters:

  • value (String, nil) (defaults to: nil)

    The voice name

Returns:

  • (String)

    The current voice setting



# File 'lib/ruby_llm/agents/audio/speaker.rb', line 86

def voice(value = nil)
  @voice = value if value
  @voice || inherited_or_default(:voice, default_tts_voice)
end

.voice_id(value = nil) ⇒ String?

Sets or returns the voice ID (for custom/cloned voices)

Parameters:

  • value (String, nil) (defaults to: nil)

    The voice ID

Returns:

  • (String, nil)

    The current voice ID



# File 'lib/ruby_llm/agents/audio/speaker.rb', line 95

def voice_id(value = nil)
  @voice_id = value if value
  @voice_id || inherited_or_default(:voice_id, nil)
end

.voice_settings { ... } ⇒ VoiceSettings

Configures voice settings (ElevenLabs specific)

Yields:

  • Block for configuring voice settings

Returns:

  • (VoiceSettings)



# File 'lib/ruby_llm/agents/audio/speaker.rb', line 139

def voice_settings(&block)
  @voice_settings ||= VoiceSettings.new
  @voice_settings.instance_eval(&block) if block_given?
  @voice_settings
end
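
The block is evaluated with `instance_eval`, so bare calls inside it are sent to the settings object, and the `||=` memoization means repeated blocks on the same class configure one shared object. A minimal sketch of that pattern follows. `SketchVoiceSettings` is a hypothetical class that supports only the two options used in the example at the top of this page; the real `VoiceSettings` class may differ.

```ruby
# Hypothetical settings object with combined getter/setter methods.
class SketchVoiceSettings
  def stability(value = nil)
    @stability = value if value
    @stability
  end

  def similarity_boost(value = nil)
    @similarity_boost = value if value
    @similarity_boost
  end
end

class VoiceSketchSpeaker
  def self.voice_settings(&block)
    @voice_settings ||= SketchVoiceSettings.new            # created once per class
    @voice_settings.instance_eval(&block) if block_given?  # self is the settings object
    @voice_settings
  end
end

VoiceSketchSpeaker.voice_settings { stability 0.5 }
VoiceSketchSpeaker.voice_settings { similarity_boost 0.75 }

VoiceSketchSpeaker.voice_settings.stability        # => 0.5
VoiceSketchSpeaker.voice_settings.similarity_boost # => 0.75
```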

.voice_settings_config ⇒ Object



# File 'lib/ruby_llm/agents/audio/speaker.rb', line 145

def voice_settings_config
  @voice_settings || inherited_or_default(:voice_settings_config, nil)
end

Instance Method Details

#agent_cache_key ⇒ String

Generates the cache key for this speech

Returns:

  • (String)

    Cache key



# File 'lib/ruby_llm/agents/audio/speaker.rb', line 363

def agent_cache_key
  components = [
    "ruby_llm_agents",
    "speech",
    self.class.name,
    resolved_provider,
    resolved_model,
    resolved_voice,
    resolved_voice_id,
    resolved_speed,
    resolved_output_format,
    Digest::SHA256.hexdigest(text)
  ].compact

  components.join("/")
end
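
The key is a slash-joined list of the resolved settings followed by a SHA-256 digest of the text, with `nil` components dropped by `compact`. The standalone sketch below mirrors that construction. `sketch_cache_key` and its keyword arguments are hypothetical and stand in for the `resolved_*` readers, which are not shown on this page.

```ruby
require "digest"

# Builds a key the same way as the listing above, from explicit values.
def sketch_cache_key(class_name:, provider:, model:, voice:, voice_id:, speed:, output_format:, text:)
  [
    "ruby_llm_agents",
    "speech",
    class_name,
    provider,
    model,
    voice,
    voice_id,                       # nil when no custom voice is set
    speed,
    output_format,
    Digest::SHA256.hexdigest(text)  # 64 hex characters
  ].compact.join("/")
end

key = sketch_cache_key(
  class_name: "ArticleNarrator", provider: :openai, model: "tts-1-hd",
  voice: "nova", voice_id: nil, speed: 1.0, output_format: :mp3,
  text: "Hello world"
)

key.start_with?("ruby_llm_agents/speech/ArticleNarrator/openai/tts-1-hd/nova/1.0/mp3/") # => true
```

Because the text is hashed, the key length does not grow with the input, and any change to the text, voice, speed, or format produces a different key.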

#call {|audio_chunk| ... } ⇒ SpeechResult

Executes the speech through the middleware pipeline

Yields:

  • (audio_chunk)

    Called for each audio chunk when streaming

Returns:

  • (SpeechResult)



# File 'lib/ruby_llm/agents/audio/speaker.rb', line 302

def call(&block)
  @streaming_block = block
  context = build_context
  result_context = Pipeline::Executor.execute(context)
  result_context.output
end
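
The flow above builds a context, hands it to the executor, and returns the context's output. The internals of `Pipeline::Executor` are not shown on this page, so the sketch below is only one plausible shape for such a pipeline: each middleware wraps the next layer, and the innermost layer calls the agent's `#execute`. `SketchContext`, `SketchExecutor`, `PipelineSketchSpeaker`, and `TRACE_MIDDLEWARE` are all hypothetical.

```ruby
SketchContext = Struct.new(:agent, :output, :trace)

module SketchExecutor
  # Wraps the agent's #execute in each middleware, first entry outermost.
  def self.execute(context, middleware)
    core = lambda do |ctx|
      ctx.agent.execute(ctx)
      ctx
    end
    chain = middleware.reverse.reduce(core) do |inner, layer|
      ->(ctx) { layer.call(ctx, inner) }
    end
    chain.call(context)
  end
end

class PipelineSketchSpeaker
  # Example middleware: records when it runs relative to #execute.
  TRACE_MIDDLEWARE = lambda do |ctx, inner|
    ctx.trace << :before
    result = inner.call(ctx)
    ctx.trace << :after
    result
  end

  def initialize(text:)
    @text = text
  end

  def call
    context = build_context
    SketchExecutor.execute(context, [TRACE_MIDDLEWARE]).output
  end

  # Core step: runs only after the middleware has been entered.
  def execute(context)
    context.trace << :execute
    context.output = "audio:#{@text}"
  end

  def build_context
    SketchContext.new(self, nil, [])
  end
end

PipelineSketchSpeaker.new(text: "Hi").call # => "audio:Hi"
```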

#execute(context) ⇒ void

This method returns an undefined value.

Core speech execution

Called by Pipeline::Executor after middleware has been applied. It contains only the speech API logic.

Parameters:

  • context

    The execution context passed in by Pipeline::Executor



# File 'lib/ruby_llm/agents/audio/speaker.rb', line 323

def execute(context)
  execution_started_at = Time.current

  validate_text_input!
  processed_text = apply_lexicon(text)

  # Execute speech synthesis
  result = execute_speech(processed_text)

  execution_completed_at = Time.current
  duration_ms = ((execution_completed_at - execution_started_at) * 1000).to_i

  # Update context
  context.input_tokens = 0
  context.output_tokens = 0
  context.total_cost = calculate_cost(result)

  # Store audio-specific metadata for execution tracking
  context[:provider] = result[:provider].to_s
  context[:voice_id] = (resolved_voice_id || resolved_voice).to_s
  context[:characters] = result[:characters]
  context[:output_format] = result[:format].to_s
  context[:file_size] = result[:audio]&.bytesize
  context[:audio_duration_seconds] = result[:duration] if result[:duration]

  # Build final result
  context.output = build_result(
    result,
    text,
    started_at: context.started_at || execution_started_at,
    completed_at: execution_completed_at,
    duration_ms: duration_ms,
    tenant_id: context.tenant_id,
    execution_id: context.execution_id
  )
end
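
Two details of the bookkeeping above are easy to check in isolation: the duration is the elapsed time in seconds multiplied by 1000 and truncated to an integer, and `&.bytesize` leaves the file size as `nil` when no audio was returned. The sketch below uses plain `Time` in place of `Time.current` (an ActiveSupport method) and a hypothetical `sketch_execution_metadata` helper with a made-up result hash.

```ruby
# Derives a subset of the tracked metadata from a result hash.
def sketch_execution_metadata(result, started_at:, completed_at:)
  metadata = {
    duration_ms: ((completed_at - started_at) * 1000).to_i,  # seconds to whole ms
    provider: result[:provider].to_s,
    characters: result[:characters],
    file_size: result[:audio]&.bytesize                      # nil when audio is nil
  }
  # Only recorded when the provider reports an audio duration.
  metadata[:audio_duration_seconds] = result[:duration] if result[:duration]
  metadata
end

started = Time.at(1_700_000_000)
result = { provider: :openai, characters: 11, audio: "\x00\x01\x02".b }
meta = sketch_execution_metadata(result, started_at: started, completed_at: started + 1.75)

meta[:duration_ms]                 # => 1750
meta[:file_size]                   # => 3
meta.key?(:audio_duration_seconds) # => false
```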

#user_prompt ⇒ String

The input for this speech operation

Returns:

  • (String)

    The text being converted



# File 'lib/ruby_llm/agents/audio/speaker.rb', line 312

def user_prompt
  text
end