Class: RubyLLM::Agents::Transcriber
Defined in: lib/ruby_llm/agents/audio/transcriber.rb
Overview
Base class for creating audio transcribers that run through the middleware pipeline.
Transcriber provides a DSL for configuring audio-to-text operations; the middleware pipeline adds built-in execution tracking, budget controls, and multi-tenancy support.
Defined Under Namespace
Classes: ChunkingConfig, ReliabilityConfig
Constant Summary
Constants included from DSL::Base
DSL::Base::PLACEHOLDER_PATTERN
Constants included from DSL::Caching
DSL::Caching::DEFAULT_CACHE_TTL
Constants included from CacheHelper
Instance Attribute Summary
- #audio ⇒ String, ... (readonly)
  Audio input.
- #audio_format ⇒ Object (readonly)
Attributes inherited from BaseAgent
#client, #model, #temperature, #tracked_tool_calls
Transcriber-specific DSL
- .include_timestamps(value = nil) ⇒ Symbol
  Sets or returns the timestamp granularity (defaults to :segment).
- .language(value = nil) ⇒ String?
  Sets or returns the language for transcription.
- .model(value = nil) ⇒ String
  Sets or returns the transcription model.
- .output_format(value = nil) ⇒ Symbol
  Sets or returns the output format for transcription.
Chunking DSL
- .chunking { ... } ⇒ ChunkingConfig
  Configures chunking for long audio files.
- .chunking_config ⇒ ChunkingConfig?
  Returns the chunking configuration.
Reliability DSL
- .fallback_models(*models) ⇒ Array<String>
  Sets fallback models directly (shorthand for the reliability block).
- .reliability { ... } ⇒ ReliabilityConfig
  Configures reliability options (retries, fallbacks).
- .reliability_config ⇒ ReliabilityConfig?
  Returns the reliability configuration.
Class Method Summary
- .agent_type ⇒ Symbol
  Returns the agent type for transcribers.
- .call(audio:, format: nil, **options) ⇒ TranscriptionResult
  Factory method to instantiate and execute transcription.
Instance Method Summary
- #agent_cache_key ⇒ String
  Generates the cache key for this transcription.
- #call ⇒ TranscriptionResult
  Executes the transcription through the middleware pipeline.
- #execute(context) ⇒ void
  Core transcription execution.
- #initialize(audio:, format: nil, **options) ⇒ Transcriber (constructor)
  Creates a new Transcriber instance.
- #postprocess_text(text) ⇒ String
  Post-processes text after transcription.
- #prompt ⇒ String?
  Returns the prompt for transcription context.
- #user_prompt ⇒ String
  The input for this transcription operation.
Methods inherited from BaseAgent
agent_middleware, aliases, all_agent_names, ask, #assistant_prompt, #cache_key_data, #cache_key_hash, config_summary, #messages, param, params, #process_response, #resolved_thinking, #schema, stream, streaming, #system_prompt, temperature, thinking, thinking_config, tools, use_middleware
Methods included from DSL::Base
#active_overrides, #assistant, #assistant_config, #cache_prompts, #clear_override_cache!, #description, #model, #overridable?, #overridable_fields, #returns, #schema, #system, #system_config, #timeout, #user, #user_config
Methods included from DSL::Reliability
#circuit_breaker, #circuit_breaker_config, #fallback_models, #fallback_provider, #fallback_providers, #non_fallback_errors, #on_failure, #reliability, #reliability_config, #reliability_configured?, #retries, #retries_config, #retryable_patterns, #total_timeout
Methods included from DSL::Caching
#cache, #cache_enabled?, #cache_for, #cache_key_excludes, #cache_key_includes, #cache_ttl, #caching_config
Methods included from DSL::Queryable
#cost_by_model, #executions, #failures, #last_run, #stats, #total_spent, #with_params
Methods included from DSL::Knowledge
#knowledge_entries, #knowledge_path, #knows
Methods included from CacheHelper
#cache_delete, #cache_exist?, #cache_increment, #cache_key, #cache_read, #cache_store, #cache_write
Methods included from DSL::Knowledge::InstanceMethods
Constructor Details
#initialize(audio:, format: nil, **options) ⇒ Transcriber
Creates a new Transcriber instance
# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 245
def initialize(audio:, format: nil, **options)
  @audio = audio
  @audio_format = format
  @runtime_language = options.delete(:language)

  # Set model to transcription model if not specified
  options[:model] ||= self.class.model

  super(**options)
end
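The constructor consumes `:language` from the options hash and fills in `:model` from the class-level DSL only when the caller did not pass one. The sketch below reproduces that option handling with two hypothetical stand-in classes (`SketchBaseAgent`, `OptionsTranscriber`) so it runs without the gem; the `"whisper-1"` default is an assumption for illustration, not the library's actual default.

```ruby
# Hypothetical stand-in for BaseAgent: it only records the options it receives.
class SketchBaseAgent
  attr_reader :received_options

  def initialize(**options)
    @received_options = options
  end
end

# Hypothetical stand-in for Transcriber that mirrors the constructor above.
class OptionsTranscriber < SketchBaseAgent
  attr_reader :audio, :audio_format, :runtime_language

  # Assumed class-level default model, for illustration only.
  def self.model
    "whisper-1"
  end

  def initialize(audio:, format: nil, **options)
    @audio = audio
    @audio_format = format
    # :language is consumed here, so the parent constructor never sees it
    @runtime_language = options.delete(:language)
    # The class-level model is used only when the caller passed none
    options[:model] ||= self.class.model
    super(**options)
  end
end

t = OptionsTranscriber.new(audio: "meeting.mp3", language: "es", temperature: 0.0)
t.runtime_language                  # => "es"
t.received_options[:model]          # => "whisper-1"
t.received_options.key?(:language)  # => false
```

An explicit `model:` keyword passes through untouched because `||=` only assigns when the key is missing or nil.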
Instance Attribute Details
#audio ⇒ String, ... (readonly)
Returns the audio input.
# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 238
def audio
  @audio
end
#audio_format ⇒ Object (readonly)
# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 238
attr_reader :audio, :audio_format
Class Method Details
.agent_type ⇒ Symbol
Returns the agent type for transcribers
# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 49
def agent_type
  :audio
end
.call(audio:, format: nil, **options) ⇒ TranscriptionResult
Factory method to instantiate and execute transcription
# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 158
def call(audio:, format: nil, **options)
  new(audio: audio, format: format, **options).call
end
.chunking { ... } ⇒ ChunkingConfig
Configures chunking for long audio files
# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 105
def chunking(&block)
  @chunking_config ||= ChunkingConfig.new
  @chunking_config.instance_eval(&block) if block_given?
  @chunking_config
end
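Because `.chunking` memoizes one config object per class and runs the block with `instance_eval`, bare method calls inside the block act on that object, and later `chunking` calls add to the same configuration. A minimal self-contained sketch of the pattern, assuming a hypothetical `SketchChunkingConfig` whose `enabled` and `max_chunk_seconds` settings are invented (the real `ChunkingConfig` fields are not listed on this page):

```ruby
# Hypothetical config object. The real ChunkingConfig's settings are not
# listed on this page, so `enabled` and `max_chunk_seconds` are invented.
class SketchChunkingConfig
  def enabled(value = nil)
    @enabled = value unless value.nil?
    @enabled == true
  end

  def max_chunk_seconds(value = nil)
    @max_chunk_seconds = value if value
    @max_chunk_seconds || 600 # assumed default
  end
end

class ChunkingHost
  # Same memoize-then-instance_eval pattern as .chunking above
  def self.chunking(&block)
    @chunking_config ||= SketchChunkingConfig.new
    @chunking_config.instance_eval(&block) if block_given?
    @chunking_config
  end
end

class LongFormTranscriber < ChunkingHost
  chunking do
    enabled true          # self is the config object inside the block
    max_chunk_seconds 300
  end
end

LongFormTranscriber.chunking.max_chunk_seconds # => 300

# A second block updates the same memoized object
LongFormTranscriber.chunking { max_chunk_seconds 120 }
LongFormTranscriber.chunking.enabled           # => true (earlier setting kept)
```

The same memoize-then-`instance_eval` shape is used by `.reliability` with `ReliabilityConfig`.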
.chunking_config ⇒ ChunkingConfig?
Returns chunking configuration
# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 114
def chunking_config
  @chunking_config || inherited_or_default(:chunking_config, nil)
end
.fallback_models(*models) ⇒ Array<String>
Sets fallback models directly (shorthand for reliability block)
# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 143
def fallback_models(*models)
  if models.any?
    @fallback_models = models.flatten
  end
  @fallback_models || inherited_or_default(:fallback_models, [])
end
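`.fallback_models` flattens its arguments, so a list and an array are equivalent, and a subclass without its own list falls back to its parent's. The `inherited_or_default` helper is not shown on this page; the sketch assumes it asks the superclass when the superclass responds to the method and otherwise returns the default. Class and model names are placeholders.

```ruby
# Hypothetical stand-in for the library's inherited_or_default helper,
# which is not shown on this page: ask the superclass, else use the default.
module SketchInheritance
  def inherited_or_default(method_name, default)
    if superclass.respond_to?(method_name)
      superclass.public_send(method_name)
    else
      default
    end
  end
end

class FallbackHost
  extend SketchInheritance

  # Mirrors .fallback_models above; flatten accepts a list or an array.
  def self.fallback_models(*models)
    @fallback_models = models.flatten if models.any?
    @fallback_models || inherited_or_default(:fallback_models, [])
  end
end

class PrimaryTranscriber < FallbackHost
  fallback_models "model-b", ["model-c"] # placeholder model names
end

class ChildTranscriber < PrimaryTranscriber; end

PrimaryTranscriber.fallback_models # => ["model-b", "model-c"]
ChildTranscriber.fallback_models   # => ["model-b", "model-c"]
FallbackHost.fallback_models       # => []
```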
.include_timestamps(value = nil) ⇒ Symbol
Sets or returns the timestamp granularity (defaults to :segment)
# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 92
def include_timestamps(value = nil)
  @include_timestamps = value if value
  @include_timestamps || inherited_or_default(:include_timestamps, :segment)
end
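A minimal re-creation of the `.include_timestamps` reader/writer, with inheritance lookup omitted. As shown on this page, the setter is guarded by `if value`, so passing `false` or `nil` performs a read rather than a write: `include_timestamps false` leaves the `:segment` default (or an inherited value) in place. The `:word` granularity is an assumed value for illustration.

```ruby
# Minimal re-creation of .include_timestamps; inheritance lookup is omitted
# and replaced by the :segment default shown in the source.
class TimestampHost
  def self.include_timestamps(value = nil)
    @include_timestamps = value if value # nil or false means "read"
    @include_timestamps || :segment
  end
end

class WordLevelTranscriber < TimestampHost
  include_timestamps :word # assumed granularity value
end

class FalseTranscriber < TimestampHost
  include_timestamps false # ignored by the `if value` guard
end

TimestampHost.include_timestamps        # => :segment
WordLevelTranscriber.include_timestamps # => :word
FalseTranscriber.include_timestamps     # => :segment
```

`.language` and `.output_format` follow the same guarded pattern, with `nil` and `:text` as their defaults.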
.language(value = nil) ⇒ String?
Sets or returns the language for transcription
# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 74
def language(value = nil)
  @language = value if value
  @language || inherited_or_default(:language, nil)
end
.model(value = nil) ⇒ String
Sets or returns the transcription model
# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 59
def model(value = nil)
  @model = value if value
  return @model if defined?(@model) && @model

  if superclass.respond_to?(:agent_type) && superclass.agent_type == :audio
    superclass.model
  else
    default_transcription_model
  end
end
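`.model` resolves in three steps: an explicitly set model wins; otherwise a subclass of another audio agent (`agent_type == :audio`) asks its parent; otherwise it falls back to `default_transcription_model`. The sketch reproduces that order with hypothetical stub classes; the model names, including the default, are placeholders rather than the library's actual values.

```ruby
# Hypothetical non-audio parent; it has no .agent_type method.
class SketchAgent; end

class AudioBase < SketchAgent
  def self.agent_type
    :audio
  end

  # Assumed default; the real default_transcription_model is not shown here.
  def self.default_transcription_model
    "whisper-1"
  end

  # Same resolution order as .model above
  def self.model(value = nil)
    @model = value if value
    return @model if defined?(@model) && @model

    if superclass.respond_to?(:agent_type) && superclass.agent_type == :audio
      superclass.model
    else
      default_transcription_model
    end
  end
end

class CustomModelTranscriber < AudioBase
  model "gpt-4o-transcribe" # placeholder model name
end

class InheritingTranscriber < CustomModelTranscriber; end

AudioBase.model              # => "whisper-1"
CustomModelTranscriber.model # => "gpt-4o-transcribe"
InheritingTranscriber.model  # => "gpt-4o-transcribe"
```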
.output_format(value = nil) ⇒ Symbol
Sets or returns the output format for transcription
# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 83
def output_format(value = nil)
  @output_format = value if value
  @output_format || inherited_or_default(:output_format, :text)
end
.reliability { ... } ⇒ ReliabilityConfig
Configures reliability options (retries, fallbacks)
# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 126
def reliability(&block)
  @reliability_config ||= ReliabilityConfig.new
  @reliability_config.instance_eval(&block) if block_given?
  @reliability_config
end
.reliability_config ⇒ ReliabilityConfig?
Returns reliability configuration
# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 135
def reliability_config
  @reliability_config || inherited_or_default(:reliability_config, nil)
end
Instance Method Details
#agent_cache_key ⇒ String
Generates the cache key for this transcription
# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 352
def agent_cache_key
  # Generate content hash based on input type
  content_hash = case @audio
  when String
    if @audio.start_with?("http://", "https://")
      Digest::SHA256.hexdigest(@audio)
    elsif File.exist?(@audio)
      Digest::SHA256.file(@audio).hexdigest
    else
      Digest::SHA256.hexdigest(@audio)
    end
  when File, IO
    @audio.rewind if @audio.respond_to?(:rewind)
    Digest::SHA256.hexdigest(@audio.read).tap do
      @audio.rewind if @audio.respond_to?(:rewind)
    end
  else
    Digest::SHA256.hexdigest(@audio.to_s)
  end

  components = [
    "ruby_llm_agents",
    "transcription",
    self.class.name,
    resolved_model,
    resolved_language,
    self.class.output_format,
    content_hash
  ].compact

  components.join("/")
end
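The content hash keys the cache on what the audio is: a URL string is hashed as text, an existing file path by its bytes, and an IO by reading it (rewinding before and after). The sketch below re-creates that step and the key join as standalone functions; `sketch_content_hash` and `sketch_cache_key` are hypothetical names. One consequence of the code as shown: `StringIO` does not inherit from `IO`, so a `StringIO` input would fall through to `to_s`, which hashes an object-identity string rather than the audio bytes.

```ruby
require "digest"

# Standalone re-creation of the content-hash step in #agent_cache_key.
def sketch_content_hash(audio)
  case audio
  when String
    if audio.start_with?("http://", "https://")
      Digest::SHA256.hexdigest(audio)        # URL: hash the URL string
    elsif File.exist?(audio)
      Digest::SHA256.file(audio).hexdigest   # local path: hash file bytes
    else
      Digest::SHA256.hexdigest(audio)        # anything else: hash the string
    end
  when IO # File is an IO subclass, so open files land here
    audio.rewind if audio.respond_to?(:rewind)
    Digest::SHA256.hexdigest(audio.read).tap do
      audio.rewind if audio.respond_to?(:rewind)
    end
  else
    # StringIO is not an IO subclass, so it also ends up here
    Digest::SHA256.hexdigest(audio.to_s)
  end
end

# Joins the same components as #agent_cache_key; nil entries are dropped.
def sketch_cache_key(class_name:, model:, language:, output_format:, audio:)
  [
    "ruby_llm_agents", "transcription", class_name,
    model, language, output_format, sketch_content_hash(audio)
  ].compact.join("/")
end

url = "https://example.com/call.mp3"
key = sketch_cache_key(class_name: "CallTranscriber", model: "whisper-1",
                       language: nil, output_format: :text, audio: url)
key.start_with?("ruby_llm_agents/transcription/CallTranscriber/whisper-1/text/") # => true
```

Passing a path or an open `File` for the same recording yields the same hash, so both forms hit the same cache entry.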
#call ⇒ TranscriptionResult
Executes the transcription through the middleware pipeline
# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 259
def call
  context = build_context
  result_context = Pipeline::Executor.execute(context)
  result_context.output
end
#execute(context) ⇒ void
This method returns an undefined value.
Core transcription execution
Called by Pipeline::Executor after the middleware has been applied; it contains only the transcription API logic.
# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 304
def execute(context)
  execution_started_at = Time.current

  # Normalize and validate input
  audio_input = normalize_audio_input(@audio, @audio_format)
  validate_audio_input!(audio_input)

  # Execute transcription with reliability (retries, fallbacks)
  raw_result = execute_with_reliability(audio_input)

  execution_completed_at = Time.current
  duration_ms = ((execution_completed_at - execution_started_at) * 1000).to_i

  # Update context
  context.input_tokens = 0 # Audio uses duration, not tokens
  context.output_tokens = 0
  context.total_cost = calculate_cost(raw_result)

  # Store pricing warning if cost calculation returned nil
  if @pricing_warning
    context[:pricing_warning] = @pricing_warning
    Rails.logger.warn(@pricing_warning) if defined?(Rails) && Rails.respond_to?(:logger) && Rails.logger
  end

  # Store transcription-specific metadata for execution tracking
  context[:language] = resolved_language if resolved_language
  context[:detected_language] = raw_result[:language] if raw_result[:language]
  context[:audio_duration_seconds] = raw_result[:duration] if raw_result[:duration]
  context[:audio_minutes] = (raw_result[:duration] / 60.0).round(4) if raw_result[:duration]
  context[:output_format] = self.class.output_format.to_s
  context[:timestamp_granularity] = self.class.include_timestamps.to_s
  context[:segment_count] = raw_result[:segments]&.size if raw_result[:segments]
  context[:word_count] = raw_result[:text]&.split(/\s+/)&.size if raw_result[:text]

  # Build final result
  context.output = build_result(
    raw_result,
    started_at: context.started_at || execution_started_at,
    completed_at: execution_completed_at,
    duration_ms: duration_ms,
    tenant_id: context.tenant_id,
    execution_id: context.execution_id
  )
end
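Since audio is billed by duration rather than tokens, `#execute` zeroes the token counts and records duration-based metadata instead. The standalone sketch below derives the same metadata fields from a raw result hash; the hash's shape (`:text`, `:language`, `:duration` in seconds, `:segments`) is inferred from the keys the method reads, and `sketch_transcription_metadata` is a hypothetical name.

```ruby
# Hypothetical standalone version of the metadata block in #execute.
# raw_result is assumed to be a Hash with :text, :language,
# :duration (seconds) and :segments keys, as the method reads them.
def sketch_transcription_metadata(raw_result, output_format:, timestamp_granularity:, language: nil)
  meta = {}
  meta[:language] = language if language
  meta[:detected_language] = raw_result[:language] if raw_result[:language]
  if raw_result[:duration]
    meta[:audio_duration_seconds] = raw_result[:duration]
    meta[:audio_minutes] = (raw_result[:duration] / 60.0).round(4)
  end
  meta[:output_format] = output_format.to_s
  meta[:timestamp_granularity] = timestamp_granularity.to_s
  meta[:segment_count] = raw_result[:segments].size if raw_result[:segments]
  # Whitespace split, as in the source: a rough word count
  meta[:word_count] = raw_result[:text].split(/\s+/).size if raw_result[:text]
  meta
end

raw = {
  text: "hello from the meeting",
  language: "en",
  duration: 90.0,
  segments: [{ text: "hello from the meeting" }]
}
meta = sketch_transcription_metadata(raw, output_format: :text, timestamp_granularity: :segment)
meta[:audio_minutes] # => 1.5
meta[:word_count]    # => 4
```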
#postprocess_text(text) ⇒ String
Post-processes text after transcription
Override this in subclasses to apply custom post-processing.
# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 293
def postprocess_text(text)
  text
end
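A hypothetical override of the `#postprocess_text` hook. The call site is not shown on this page, so this only illustrates the override itself: `super` returns the text unchanged, and the subclass collapses whitespace and corrects a mis-heard product name. `SketchHookBase` stands in for the real base class.

```ruby
# Stand-in for the base hook: returns the text unchanged by default.
class SketchHookBase
  def postprocess_text(text)
    text
  end
end

# Hypothetical override: collapse whitespace, then fix a product name
# the speech model tends to mis-hear.
class SupportCallTranscriber < SketchHookBase
  CORRECTIONS = { /\bruby l l m\b/i => "RubyLLM" }.freeze

  def postprocess_text(text)
    cleaned = super.gsub(/\s+/, " ").strip
    CORRECTIONS.reduce(cleaned) do |acc, (pattern, replacement)|
      acc.gsub(pattern, replacement)
    end
  end
end

SupportCallTranscriber.new.postprocess_text("  we use  ruby l l m\nin production ")
# => "we use RubyLLM in production"
```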
#prompt ⇒ String?
Returns the prompt for transcription context
Override this in subclasses to provide context hints that improve transcription accuracy.
# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 283
def prompt
  nil
end
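A hypothetical `#prompt` override that passes domain vocabulary as a context hint. How the prompt is forwarded to the transcription API is not shown on this page; `SketchPromptBase` stands in for the real base class and the glossary is invented.

```ruby
# Stand-in for the base hook: no context hint by default.
class SketchPromptBase
  def prompt
    nil
  end
end

# Hypothetical override: list domain terms the speech model may not know.
class MedicalDictationTranscriber < SketchPromptBase
  GLOSSARY = %w[tachycardia metoprolol echocardiogram].freeze

  def prompt
    "Cardiology dictation. Terms that may appear: #{GLOSSARY.join(', ')}."
  end
end

MedicalDictationTranscriber.new.prompt
# => "Cardiology dictation. Terms that may appear: tachycardia, metoprolol, echocardiogram."
```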
#user_prompt ⇒ String
The input for this transcription operation
# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 268
def user_prompt
  case @audio
  when String
    @audio.start_with?("http") ? "Audio URL: #{@audio}" : "Audio file: #{@audio}"
  else
    "Audio data"
  end
end
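A standalone copy of the `#user_prompt` branching; `sketch_user_prompt` is a hypothetical name. It checks only the `"http"` prefix, whereas `#agent_cache_key` checks `"http://"` and `"https://"`, so a relative path such as `http_dumps/a.mp3` is labeled as a URL here even though the cache key would hash it as a file path (if the file exists).

```ruby
require "stringio"

# Standalone copy of the #user_prompt branching (hypothetical name).
def sketch_user_prompt(audio)
  case audio
  when String
    audio.start_with?("http") ? "Audio URL: #{audio}" : "Audio file: #{audio}"
  else
    "Audio data"
  end
end

sketch_user_prompt("https://example.com/a.mp3") # => "Audio URL: https://example.com/a.mp3"
sketch_user_prompt("recordings/a.mp3")          # => "Audio file: recordings/a.mp3"
sketch_user_prompt("http_dumps/a.mp3")          # => "Audio URL: http_dumps/a.mp3"
sketch_user_prompt(StringIO.new("RIFF"))        # => "Audio data"
```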