Class: RubyLLM::Agents::Speaker
Defined in:
- lib/ruby_llm/agents/audio/speaker.rb
- lib/ruby_llm/agents/audio/speaker/active_storage_support.rb
Overview
Base class for creating text-to-speech speakers using the middleware pipeline
Speaker provides a DSL for configuring text-to-audio operations with built-in execution tracking, budget controls, and multi-tenancy support through the middleware pipeline.
Defined Under Namespace
Modules: ActiveStorageSupport
Classes: Lexicon, VoiceSettings
Constant Summary
Constants included from DSL::Base
DSL::Base::PLACEHOLDER_PATTERN
Constants included from DSL::Caching
DSL::Caching::DEFAULT_CACHE_TTL
Constants included from CacheHelper
Instance Attribute Summary
- #text ⇒ Object (readonly)
Attributes inherited from BaseAgent
#client, #model, #temperature, #tracked_tool_calls
Speaker-specific DSL
- .model(value = nil) ⇒ String
  Sets or returns the TTS model.
- .output_format(value = nil) ⇒ Symbol
  Sets or returns the output format.
- .provider(value = nil) ⇒ Symbol
  Sets or returns the TTS provider.
- .speed(value = nil) ⇒ Float
  Sets or returns the speech speed.
- .streaming(value = nil) ⇒ Boolean
  Sets or returns streaming mode.
- .streaming? ⇒ Boolean
- .voice(value = nil) ⇒ String
  Sets or returns the voice name.
- .voice_id(value = nil) ⇒ String?
  Sets or returns the voice ID (for custom or cloned voices).
Voice Settings DSL
- .voice_settings { ... } ⇒ VoiceSettings
  Configures voice settings (ElevenLabs specific).
- .voice_settings_config ⇒ Object
Lexicon DSL
- .lexicon { ... } ⇒ Lexicon
  Configures the pronunciation lexicon.
- .lexicon_config ⇒ Object
Class Method Summary
- .agent_type ⇒ Symbol
  Returns the agent type for speakers.
- .call(text:, **options) {|audio_chunk| ... } ⇒ SpeechResult
  Factory method that instantiates and executes the speaker.
- .stream(text:, **options) {|audio_chunk| ... } ⇒ SpeechResult
  Streams the speaker output.
Instance Method Summary
- #agent_cache_key ⇒ String
  Generates the cache key for this speech.
- #call {|audio_chunk| ... } ⇒ SpeechResult
  Executes the speech through the middleware pipeline.
- #execute(context) ⇒ void
  Core speech execution.
- #initialize(text:, **options) ⇒ Speaker (constructor)
  Creates a new Speaker instance.
- #user_prompt ⇒ String
  The input for this speech operation.
Methods inherited from BaseAgent
agent_middleware, aliases, all_agent_names, ask, #assistant_prompt, #cache_key_data, #cache_key_hash, config_summary, #messages, param, params, #process_response, #resolved_thinking, #schema, #system_prompt, temperature, thinking, thinking_config, tools, use_middleware
Methods included from DSL::Base
#active_overrides, #assistant, #assistant_config, #cache_prompts, #clear_override_cache!, #description, #model, #overridable?, #overridable_fields, #prompt, #returns, #schema, #system, #system_config, #timeout, #user, #user_config
Methods included from DSL::Reliability
#circuit_breaker, #circuit_breaker_config, #fallback_models, #fallback_provider, #fallback_providers, #non_fallback_errors, #on_failure, #reliability, #reliability_config, #reliability_configured?, #retries, #retries_config, #retryable_patterns, #total_timeout
Methods included from DSL::Caching
#cache, #cache_enabled?, #cache_for, #cache_key_excludes, #cache_key_includes, #cache_ttl, #caching_config
Methods included from DSL::Queryable
#cost_by_model, #executions, #failures, #last_run, #stats, #total_spent, #with_params
Methods included from DSL::Knowledge
#knowledge_entries, #knowledge_path, #knows
Methods included from CacheHelper
#cache_delete, #cache_exist?, #cache_increment, #cache_key, #cache_read, #cache_store, #cache_write
Methods included from DSL::Knowledge::InstanceMethods
Constructor Details
#initialize(text:, **options) ⇒ Speaker
Creates a new Speaker instance
```ruby
# File 'lib/ruby_llm/agents/audio/speaker.rb', line 287

def initialize(text:, **options)
  @text = text
  @streaming_block = nil
  @runtime_streaming = options.delete(:streaming)

  # Set model to TTS model if not specified
  options[:model] ||= self.class.model

  super(**options)
end
```
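The constructor removes `:streaming` from the options before they reach `BaseAgent`. It also fills in `:model` from the class-level TTS model, but only when the caller did not pass one. The sketch below reproduces that flow with hypothetical stand-ins (`FakeBaseAgent`, `FakeSpeaker`, and the `"tts-model-x"` default) so it runs without the gem. It illustrates the pattern and is not the library's code.

```ruby
# Hypothetical stand-in for BaseAgent: it only records the options it receives.
class FakeBaseAgent
  attr_reader :options

  def initialize(**options)
    @options = options
  end
end

# Hypothetical stand-in for Speaker, mirroring the constructor shown above.
class FakeSpeaker < FakeBaseAgent
  attr_reader :text, :runtime_streaming

  # Assumed class-level default; the real Speaker resolves this via .model.
  def self.model
    "tts-model-x"
  end

  def initialize(text:, **options)
    @text = text
    # Pop :streaming so the parent class never sees it.
    @runtime_streaming = options.delete(:streaming)
    # Fall back to the class model only when no :model was given.
    options[:model] ||= self.class.model
    super(**options)
  end
end

speaker = FakeSpeaker.new(text: "Hello", streaming: true)
p speaker.runtime_streaming            # => true
p speaker.options.key?(:streaming)     # => false
p speaker.options[:model]              # => "tts-model-x"
p FakeSpeaker.new(text: "Hello", model: "other").options[:model] # => "other"
```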
Instance Attribute Details
#text ⇒ Object (readonly)
```ruby
# File 'lib/ruby_llm/agents/audio/speaker.rb', line 281

def text
  @text
end
```
Class Method Details
.agent_type ⇒ Symbol
Returns the agent type for speakers
```ruby
# File 'lib/ruby_llm/agents/audio/speaker.rb', line 46

def agent_type
  :audio
end
```
.call(text:, **options) {|audio_chunk| ... } ⇒ SpeechResult
Factory method that instantiates and executes the speaker
```ruby
# File 'lib/ruby_llm/agents/audio/speaker.rb', line 175

def call(text:, **options, &block)
  new(text: text, **options).call(&block)
end
```
.lexicon { ... } ⇒ Lexicon
Configures the pronunciation lexicon
```ruby
# File 'lib/ruby_llm/agents/audio/speaker.rb', line 157

def lexicon(&block)
  @lexicon ||= Lexicon.new
  @lexicon.instance_eval(&block) if block_given?
  @lexicon
end
```
.lexicon_config ⇒ Object
```ruby
# File 'lib/ruby_llm/agents/audio/speaker.rb', line 163

def lexicon_config
  @lexicon || inherited_or_default(:lexicon_config, nil)
end
```
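The lexicon block is evaluated with `instance_eval`, so bare method calls inside it go to the `Lexicon` object. `#execute` then runs `apply_lexicon(text)` before synthesis. Neither `Lexicon`'s methods nor `apply_lexicon` are shown on this page. This sketch therefore assumes a hypothetical `pronounce(word, as:)` method and a whole-word substitution strategy. It shows the shape of the idea, not the gem's actual rules.

```ruby
# Hypothetical Lexicon: the real class's API is not documented on this page.
class SketchLexicon
  attr_reader :entries

  def initialize
    @entries = {}
  end

  # Assumed DSL method: map a written word to how it should be spoken.
  def pronounce(word, as:)
    @entries[word] = as
  end

  # Assumed substitution strategy: replace whole-word matches only.
  def apply(text)
    @entries.reduce(text) do |result, (word, spoken)|
      result.gsub(/\b#{Regexp.escape(word)}\b/, spoken)
    end
  end
end

class SketchSpeaker
  # Same memoize-then-instance_eval shape as Speaker.lexicon above.
  def self.lexicon(&block)
    @lexicon ||= SketchLexicon.new
    @lexicon.instance_eval(&block) if block_given?
    @lexicon
  end
end

SketchSpeaker.lexicon do
  pronounce "SQL", as: "sequel"
  pronounce "nginx", as: "engine x"
end

puts SketchSpeaker.lexicon.apply("Configure nginx before the SQL migration.")
# prints "Configure engine x before the sequel migration."
```

Because the class memoizes `@lexicon`, a second `lexicon do ... end` block adds entries to the same object rather than replacing it.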
.model(value = nil) ⇒ String
Sets or returns the TTS model
```ruby
# File 'lib/ruby_llm/agents/audio/speaker.rb', line 71

def model(value = nil)
  @model = value if value
  return @model if defined?(@model) && @model

  if superclass.respond_to?(:agent_type) && superclass.agent_type == :audio
    superclass.model
  else
    default_tts_model
  end
end
```
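`.model` (and `.provider`, which has the same shape) resolves in three steps. First it uses a value set on this class. If there is none, it uses the parent's value when the parent is itself an audio agent. Otherwise it falls back to `default_tts_model`. The sketch below runs that logic standalone. The class names and the `"default-tts"` default are hypothetical.

```ruby
class SketchAudioBase
  def self.agent_type
    :audio
  end

  # Hypothetical default; the real default_tts_model is not shown on this page.
  def self.default_tts_model
    "default-tts"
  end

  # Same three-step resolution as Speaker.model above.
  def self.model(value = nil)
    @model = value if value
    return @model if defined?(@model) && @model

    if superclass.respond_to?(:agent_type) && superclass.agent_type == :audio
      superclass.model
    else
      default_tts_model
    end
  end
end

class NarratorSpeaker < SketchAudioBase
  model "narrator-tts"
end

# Sets nothing itself, so it walks up to NarratorSpeaker.
class CalmNarratorSpeaker < NarratorSpeaker; end

p SketchAudioBase.model     # => "default-tts"
p NarratorSpeaker.model     # => "narrator-tts"
p CalmNarratorSpeaker.model # => "narrator-tts"
```

The `agent_type == :audio` guard keeps a speaker from inheriting a chat model from a non-audio parent class. In that case it drops straight to the TTS default.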
.output_format(value = nil) ⇒ Symbol
Sets or returns the output format
```ruby
# File 'lib/ruby_llm/agents/audio/speaker.rb', line 113

def output_format(value = nil)
  @output_format = value if value
  @output_format || inherited_or_default(:output_format, :mp3)
end
```
.provider(value = nil) ⇒ Symbol
Sets or returns the TTS provider
```ruby
# File 'lib/ruby_llm/agents/audio/speaker.rb', line 56

def provider(value = nil)
  @provider = value if value
  return @provider if defined?(@provider) && @provider

  if superclass.respond_to?(:agent_type) && superclass.agent_type == :audio
    superclass.provider
  else
    default_tts_provider
  end
end
```
.speed(value = nil) ⇒ Float
Sets or returns the speech speed
```ruby
# File 'lib/ruby_llm/agents/audio/speaker.rb', line 104

def speed(value = nil)
  @speed = value if value
  @speed || inherited_or_default(:speed, 1.0)
end
```
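`.speed`, `.output_format`, `.voice`, and `.voice_id` all follow the pattern `@x || inherited_or_default(:x, default)`. `inherited_or_default` is not shown on this page. The sketch assumes it asks the superclass for the same setting when the superclass defines it, and returns the supplied default otherwise. All class and module names here are hypothetical.

```ruby
module SketchInheritableDSL
  # Hypothetical inherited_or_default: ask the parent class if it defines
  # the same DSL method, otherwise use the supplied default.
  def inherited_or_default(method_name, default)
    if superclass.respond_to?(method_name)
      superclass.public_send(method_name)
    else
      default
    end
  end

  def speed(value = nil)
    @speed = value if value
    @speed || inherited_or_default(:speed, 1.0)
  end

  def output_format(value = nil)
    @output_format = value if value
    @output_format || inherited_or_default(:output_format, :mp3)
  end
end

class SketchSpeakerBase
  extend SketchInheritableDSL
end

class FastSpeaker < SketchSpeakerBase
  speed 1.25
end

class FastWavSpeaker < FastSpeaker
  output_format :wav
end

p SketchSpeakerBase.speed       # => 1.0
p FastWavSpeaker.speed          # => 1.25
p FastWavSpeaker.output_format  # => :wav
p FastSpeaker.output_format     # => :mp3
```

Because of the `if value` guard and the `||` fallback, only truthy values override a parent. A subclass cannot reset an inherited `voice_id` back to `nil` with this pattern.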
.stream(text:, **options) {|audio_chunk| ... } ⇒ SpeechResult
Streams the speaker output
```ruby
# File 'lib/ruby_llm/agents/audio/speaker.rb', line 185

def stream(text:, **options, &block)
  raise ArgumentError, "A block is required for streaming" unless block_given?

  instance = new(text: text, **options.merge(streaming: true))
  instance.call(&block)
end
```
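`.stream` is `.call` with two differences. It refuses to run without a block, and it forces `streaming: true` into the options before building the instance. The stand-in below makes that behavior observable without the gem. `SketchStreamingSpeaker` and its fixed four-character "audio chunks" are hypothetical. Real chunks would be provider audio bytes.

```ruby
# Hypothetical stand-in that records what .stream passes to the instance.
class SketchStreamingSpeaker
  def self.call(text:, **options, &block)
    new(text: text, **options).call(&block)
  end

  # Same guard and option merge as Speaker.stream above.
  def self.stream(text:, **options, &block)
    raise ArgumentError, "A block is required for streaming" unless block_given?

    new(text: text, **options.merge(streaming: true)).call(&block)
  end

  def initialize(text:, **options)
    @text = text
    @options = options
  end

  # Pretend synthesis: yield the text in 4-character "chunks".
  def call
    chunks = @text.scan(/.{1,4}/m)
    chunks.each { |chunk| yield chunk } if block_given?
    { streaming: @options.fetch(:streaming, false), chunks: chunks.length }
  end
end

received = []
summary = SketchStreamingSpeaker.stream(text: "Hello, world") { |chunk| received << chunk }
p received            # => ["Hell", "o, w", "orld"]
p summary[:streaming] # => true

begin
  SketchStreamingSpeaker.stream(text: "no block")
rescue ArgumentError => e
  puts e.message # prints "A block is required for streaming"
end
```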
.streaming(value = nil) ⇒ Boolean
Sets or returns streaming mode
```ruby
# File 'lib/ruby_llm/agents/audio/speaker.rb', line 122

def streaming(value = nil)
  @streaming = value unless value.nil?
  instance_variable_defined?(:@streaming) ? @streaming : inherited_or_default(:streaming, false)
end
```
.streaming? ⇒ Boolean
```ruby
# File 'lib/ruby_llm/agents/audio/speaker.rb', line 127

def streaming?
  streaming
end
```
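`.streaming` differs from the other setters. It checks `unless value.nil?` and uses `instance_variable_defined?`, so `false` counts as a real setting and a subclass can turn streaming off after a parent turned it on. The sketch contrasts this with the `if value` / `||` pattern used by `.speed` and `.voice`. The module, the classes, `naive_streaming`, and the `inherited_or_default` body are all hypothetical.

```ruby
module SketchStreamingDSL
  # Hypothetical inherited_or_default, as assumed for the other DSL methods.
  def inherited_or_default(method_name, default)
    superclass.respond_to?(method_name) ? superclass.public_send(method_name) : default
  end

  # Mirrors Speaker.streaming: nil means "read", false is a real setting.
  def streaming(value = nil)
    @streaming = value unless value.nil?
    instance_variable_defined?(:@streaming) ? @streaming : inherited_or_default(:streaming, false)
  end

  def streaming?
    streaming
  end

  # For contrast: the `if value` / `||` pattern used by .speed and .voice.
  def naive_streaming(value = nil)
    @naive_streaming = value if value
    @naive_streaming || inherited_or_default(:naive_streaming, false)
  end
end

class StreamingParent
  extend SketchStreamingDSL
  streaming true
  naive_streaming true
end

class NonStreamingChild < StreamingParent
  streaming false
  naive_streaming false # ignored: false is treated like "no value"
end

p NonStreamingChild.streaming?      # => false
p NonStreamingChild.naive_streaming # => true
```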
.voice(value = nil) ⇒ String
Sets or returns the voice name
```ruby
# File 'lib/ruby_llm/agents/audio/speaker.rb', line 86

def voice(value = nil)
  @voice = value if value
  @voice || inherited_or_default(:voice, default_tts_voice)
end
```
.voice_id(value = nil) ⇒ String?
Sets or returns the voice ID (for custom/cloned voices)
```ruby
# File 'lib/ruby_llm/agents/audio/speaker.rb', line 95

def voice_id(value = nil)
  @voice_id = value if value
  @voice_id || inherited_or_default(:voice_id, nil)
end
```
.voice_settings { ... } ⇒ VoiceSettings
Configures voice settings (ElevenLabs specific)
```ruby
# File 'lib/ruby_llm/agents/audio/speaker.rb', line 139

def voice_settings(&block)
  @voice_settings ||= VoiceSettings.new
  @voice_settings.instance_eval(&block) if block_given?
  @voice_settings
end
```
.voice_settings_config ⇒ Object
```ruby
# File 'lib/ruby_llm/agents/audio/speaker.rb', line 145

def voice_settings_config
  @voice_settings || inherited_or_default(:voice_settings_config, nil)
end
```
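`.voice_settings` memoizes a settings object per class. That has two consequences. A subclass that calls `voice_settings { ... }` gets a fresh object rather than a copy of its parent's. Readers should also use `.voice_settings_config`, which falls back to the parent. The fields of the real `VoiceSettings` class are not listed on this page. The sketch therefore assumes hypothetical `stability` and `similarity_boost` setters, named after ElevenLabs' terminology.

```ruby
# Hypothetical VoiceSettings: the real class's fields are not shown on this page.
class SketchVoiceSettings
  def stability(value = nil)
    @stability = value unless value.nil?
    @stability
  end

  def similarity_boost(value = nil)
    @similarity_boost = value unless value.nil?
    @similarity_boost
  end
end

module SketchVoiceSettingsDSL
  # Hypothetical inherited_or_default, as assumed elsewhere.
  def inherited_or_default(method_name, default)
    superclass.respond_to?(method_name) ? superclass.public_send(method_name) : default
  end

  # Same shape as Speaker.voice_settings / .voice_settings_config above.
  def voice_settings(&block)
    @voice_settings ||= SketchVoiceSettings.new
    @voice_settings.instance_eval(&block) if block_given?
    @voice_settings
  end

  def voice_settings_config
    @voice_settings || inherited_or_default(:voice_settings_config, nil)
  end
end

class ParentSpeaker
  extend SketchVoiceSettingsDSL
  voice_settings do
    stability 0.4
    similarity_boost 0.8
  end
end

class InheritingSpeaker < ParentSpeaker; end

class OverridingSpeaker < ParentSpeaker
  voice_settings { stability 0.9 }
end

p InheritingSpeaker.voice_settings_config.stability        # => 0.4
p OverridingSpeaker.voice_settings_config.stability        # => 0.9
p OverridingSpeaker.voice_settings_config.similarity_boost # => nil (not merged from parent)
```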
Instance Method Details
#agent_cache_key ⇒ String
Generates the cache key for this speech
```ruby
# File 'lib/ruby_llm/agents/audio/speaker.rb', line 363

def agent_cache_key
  components = [
    "ruby_llm_agents",
    "speech",
    self.class.name,
    resolved_provider,
    resolved_model,
    resolved_voice,
    resolved_voice_id,
    resolved_speed,
    resolved_output_format,
    Digest::SHA256.hexdigest(text)
  ].compact

  components.join("/")
end
```
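The cache key concatenates every setting that affects the audio and ends with a SHA-256 digest of the text. Changing the voice, speed, or format therefore produces a different key, while the key length stays bounded however long the text is. The function below reproduces that construction with plain keyword arguments in place of the `resolved_*` helpers. The function name and the sample values are illustrative assumptions.

```ruby
require "digest"

# Mirrors the structure of Speaker#agent_cache_key using plain arguments.
def sketch_speech_cache_key(class_name:, provider:, model:, voice:, voice_id:, speed:, output_format:, text:)
  components = [
    "ruby_llm_agents",
    "speech",
    class_name,
    provider,
    model,
    voice,
    voice_id,
    speed,
    output_format,
    Digest::SHA256.hexdigest(text)
  ].compact # nil entries (e.g. no voice_id) are dropped, not left as empty segments

  components.join("/")
end

key = sketch_speech_cache_key(
  class_name: "NarratorSpeaker", provider: :openai, model: "tts-1",
  voice: "nova", voice_id: nil, speed: 1.0, output_format: :mp3,
  text: "Hello, world"
)
puts key
# prints ruby_llm_agents/speech/NarratorSpeaker/openai/tts-1/nova/1.0/mp3/
# followed by the 64-character SHA-256 hex digest of the text
```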
#call {|audio_chunk| ... } ⇒ SpeechResult
Executes the speech through the middleware pipeline
```ruby
# File 'lib/ruby_llm/agents/audio/speaker.rb', line 302

def call(&block)
  @streaming_block = block
  context = build_context
  result_context = Pipeline::Executor.execute(context)
  result_context.output
end
```
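`#call` only stores the block, builds a context, hands it to `Pipeline::Executor.execute`, and unwraps `output`. The agent's own `#execute` runs innermost, after the middleware. The internals of `Pipeline::Executor` and the real middleware are not shown on this page. The sketch below assumes a simple Rack-style chain in which each middleware wraps the next callable. Every class in it (`SketchContext`, `SketchTimingMiddleware`, `SketchExecutor`, `SketchSpeakerAgent`) is hypothetical.

```ruby
# Hypothetical minimal context: a few attributes plus a metadata hash.
class SketchContext
  attr_accessor :output, :started_at
  attr_reader :agent

  def initialize(agent)
    @agent = agent
    @metadata = {}
  end

  def [](key)
    @metadata[key]
  end

  def []=(key, value)
    @metadata[key] = value
  end
end

# Hypothetical middleware: wraps the next callable in the chain.
class SketchTimingMiddleware
  def initialize(app)
    @app = app
  end

  def call(context)
    context.started_at = Time.now
    @app.call(context)
    context[:timed] = true
    context
  end
end

# Hypothetical executor: the agent's #execute is the innermost step.
module SketchExecutor
  MIDDLEWARE = [SketchTimingMiddleware].freeze

  def self.execute(context)
    core = ->(ctx) { ctx.agent.execute(ctx); ctx }
    chain = MIDDLEWARE.reverse.reduce(core) { |app, klass| klass.new(app) }
    chain.call(context)
  end
end

class SketchSpeakerAgent
  def initialize(text)
    @text = text
  end

  # Same shape as Speaker#call: build a context, run the pipeline, unwrap output.
  def call(&block)
    @streaming_block = block
    context = SketchContext.new(self)
    SketchExecutor.execute(context).output
  end

  # Stand-in for synthesis: pretend the text's bytes are the audio.
  def execute(context)
    context.output = { audio: @text.b, characters: @text.length }
  end
end

p SketchSpeakerAgent.new("Hi there").call[:characters] # => 8
```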
#execute(context) ⇒ void
Core speech execution.
Called by Pipeline::Executor after the middleware has been applied, this method contains only the speech API logic. Its return value is undefined; the resulting SpeechResult is stored in context.output.
```ruby
# File 'lib/ruby_llm/agents/audio/speaker.rb', line 323

def execute(context)
  execution_started_at = Time.current

  validate_text_input!
  processed_text = apply_lexicon(text)

  # Execute speech synthesis
  result = execute_speech(processed_text)

  execution_completed_at = Time.current
  duration_ms = ((execution_completed_at - execution_started_at) * 1000).to_i

  # Update context
  context.input_tokens = 0
  context.output_tokens = 0
  context.total_cost = calculate_cost(result)

  # Store audio-specific metadata for execution tracking
  context[:provider] = result[:provider].to_s
  context[:voice_id] = (resolved_voice_id || resolved_voice).to_s
  context[:characters] = result[:characters]
  context[:output_format] = result[:format].to_s
  context[:file_size] = result[:audio]&.bytesize
  context[:audio_duration_seconds] = result[:duration] if result[:duration]

  # Build final result
  context.output = build_result(
    result,
    text,
    started_at: context.started_at || execution_started_at,
    completed_at: execution_completed_at,
    duration_ms: duration_ms,
    tenant_id: context.tenant_id,
    execution_id: context.execution_id
  )
end
```
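The bookkeeping in `#execute` has a few details worth checking in isolation. The duration is derived from two timestamps. A custom `voice_id` takes precedence over the named voice. `file_size` uses safe navigation, so a missing audio payload gives `nil`. `audio_duration_seconds` is recorded only when the provider reports it. The sketch repeats those rules in a hypothetical helper that returns a hash instead of writing to the context. It uses plain `Time` in place of ActiveSupport's `Time.current`, and the sample result hash is made up.

```ruby
# Hypothetical helper reproducing the metadata rules from #execute.
# `result` is assumed to carry :provider, :characters, :format, :audio
# and optionally :duration, as the method above reads them.
def sketch_speech_metadata(result, voice_id:, voice:, started_at:, completed_at:)
  metadata = {
    duration_ms: ((completed_at - started_at) * 1000).to_i,
    # Speech runs record zero tokens; characters are tracked instead.
    input_tokens: 0,
    output_tokens: 0,
    provider: result[:provider].to_s,
    # A custom/cloned voice ID wins over the named voice.
    voice_id: (voice_id || voice).to_s,
    characters: result[:characters],
    output_format: result[:format].to_s,
    # Safe navigation: a missing audio payload yields nil instead of raising.
    file_size: result[:audio]&.bytesize
  }
  # Only record audio length when the provider reported it.
  metadata[:audio_duration_seconds] = result[:duration] if result[:duration]
  metadata
end

started = Time.at(0)
finished = Time.at(1.5)
result = { provider: :openai, characters: 12, format: :mp3, audio: "\x00".b * 20 }

meta = sketch_speech_metadata(result, voice_id: nil, voice: "nova",
                              started_at: started, completed_at: finished)
p meta[:duration_ms]                 # => 1500
p meta[:voice_id]                    # => "nova"
p meta[:file_size]                   # => 20
p meta.key?(:audio_duration_seconds) # => false
```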
#user_prompt ⇒ String
The input for this speech operation
```ruby
# File 'lib/ruby_llm/agents/audio/speaker.rb', line 312

def user_prompt
  text
end
```