Class: RubyLLM::Agents::Transcriber

Inherits:
BaseAgent
  • Object
Defined in:
lib/ruby_llm/agents/audio/transcriber.rb

Overview

Base class for creating audio transcribers using the middleware pipeline

Transcriber provides a DSL for configuring audio-to-text operations with built-in execution tracking, budget controls, and multi-tenancy support through the middleware pipeline.

Examples:

Basic usage

class MeetingTranscriber < RubyLLM::Agents::Transcriber
  model 'whisper-1'
end

result = MeetingTranscriber.call(audio: "meeting.mp3")
result.text  # => "Hello everyone, welcome to the meeting..."

With language specification

class SpanishTranscriber < RubyLLM::Agents::Transcriber
  model 'gpt-4o-transcribe'
  language 'es'

  def prompt
    "Podcast sobre tecnología y programación"
  end
end

With subtitle output

class SubtitleGenerator < RubyLLM::Agents::Transcriber
  model 'whisper-1'
  output_format :srt
  include_timestamps :segment
end

result = SubtitleGenerator.call(audio: "video.mp4")
result.srt  # => "1\n00:00:00,000 --> 00:00:02,500\nHello\n\n..."
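
The `result.srt` string above follows the SubRip layout: a 1-based cue index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, the cue text, and a blank line between cues. The following is a minimal standalone sketch of that layout, not the library's own formatter. The method names `srt_timestamp` and `segments_to_srt` and the segment keys `:start`, `:end`, and `:text` are assumptions made for illustration.

```ruby
# Standalone sketch: method names and segment shape are assumptions,
# not part of the RubyLLM::Agents API.

# Converts seconds (Float) to an SRT timestamp such as "00:00:02,500".
def srt_timestamp(seconds)
  total_ms = (seconds * 1000).round
  hours, rem = total_ms.divmod(3_600_000)
  minutes, rem = rem.divmod(60_000)
  secs, millis = rem.divmod(1000)
  format("%02d:%02d:%02d,%03d", hours, minutes, secs, millis)
end

# Renders an array of {start:, end:, text:} hashes as SRT cues,
# separated by a blank line.
def segments_to_srt(segments)
  segments.each_with_index.map do |seg, i|
    "#{i + 1}\n#{srt_timestamp(seg[:start])} --> #{srt_timestamp(seg[:end])}\n#{seg[:text]}\n"
  end.join("\n")
end

segments = [
  { start: 0.0, end: 2.5, text: "Hello" },
  { start: 2.5, end: 5.0, text: "everyone" }
]
srt = segments_to_srt(segments)
```

Working in integer milliseconds before splitting into fields avoids floating-point drift in the `,mmm` part.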

Defined Under Namespace

Classes: ChunkingConfig, ReliabilityConfig

Constant Summary

Constants included from DSL::Base

DSL::Base::PLACEHOLDER_PATTERN

Constants included from DSL::Caching

DSL::Caching::DEFAULT_CACHE_TTL

Constants included from CacheHelper

CacheHelper::NAMESPACE

Instance Attribute Summary

Attributes inherited from BaseAgent

#client, #model, #temperature, #tracked_tool_calls

Transcriber-specific DSL

Chunking DSL

Reliability DSL

Class Method Summary

Instance Method Summary

Methods inherited from BaseAgent

agent_middleware, aliases, all_agent_names, ask, #assistant_prompt, #cache_key_data, #cache_key_hash, config_summary, #messages, param, params, #process_response, #resolved_thinking, #schema, stream, streaming, #system_prompt, temperature, thinking, thinking_config, tools, use_middleware

Methods included from DSL::Base

#active_overrides, #assistant, #assistant_config, #cache_prompts, #clear_override_cache!, #description, #model, #overridable?, #overridable_fields, #returns, #schema, #system, #system_config, #timeout, #user, #user_config

Methods included from DSL::Reliability

#circuit_breaker, #circuit_breaker_config, #fallback_models, #fallback_provider, #fallback_providers, #non_fallback_errors, #on_failure, #reliability, #reliability_config, #reliability_configured?, #retries, #retries_config, #retryable_patterns, #total_timeout

Methods included from DSL::Caching

#cache, #cache_enabled?, #cache_for, #cache_key_excludes, #cache_key_includes, #cache_ttl, #caching_config

Methods included from DSL::Queryable

#cost_by_model, #executions, #failures, #last_run, #stats, #total_spent, #with_params

Methods included from DSL::Knowledge

#knowledge_entries, #knowledge_path, #knows

Methods included from CacheHelper

#cache_delete, #cache_exist?, #cache_increment, #cache_key, #cache_read, #cache_store, #cache_write

Methods included from DSL::Knowledge::InstanceMethods

#compiled_knowledge

Constructor Details

#initialize(audio:, format: nil, **options) ⇒ Transcriber

Creates a new Transcriber instance

Parameters:

  • audio (String, File, IO)

    Audio file path, URL, File object, or binary data

  • format (Symbol, nil) (defaults to: nil)

    Audio format hint when passing binary data

  • options (Hash)

    Configuration options



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 245

def initialize(audio:, format: nil, **options)
  @audio = audio
  @audio_format = format
  @runtime_language = options.delete(:language)

  # Set model to transcription model if not specified
  options[:model] ||= self.class.model

  super(**options)
end
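
The constructor above consumes `:language` as a per-call override before delegating, and only falls back to the class-level model when the caller did not pass one. A minimal sketch of that flow follows, using stand-in classes because the real `BaseAgent` is not reproduced on this page. `StubBase`, `StubTranscriber`, and the `:tenant` option key are hypothetical.

```ruby
# Stand-in classes for illustration only; not the real BaseAgent.
class StubBase
  attr_reader :options

  def initialize(**options)
    @options = options
  end
end

class StubTranscriber < StubBase
  attr_reader :audio, :runtime_language

  def self.model
    "whisper-1" # assumed class-level setting for this sketch
  end

  def initialize(audio:, format: nil, **options)
    @audio = audio
    @audio_format = format
    # :language is removed here, so the parent never sees it.
    @runtime_language = options.delete(:language)
    # A caller-supplied :model wins; otherwise use the class-level one.
    options[:model] ||= self.class.model
    super(**options)
  end
end

t = StubTranscriber.new(audio: "meeting.mp3", language: "es", tenant: "acme")
override = StubTranscriber.new(audio: "meeting.mp3", model: "gpt-4o-transcribe")
```

Because `**options` builds a fresh hash for each call, the `delete` does not mutate a hash the caller may still hold.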

Instance Attribute Details

#audio ⇒ String, File, IO (readonly)

Returns the audio input.

Returns:

  • (String, File, IO)

    Audio input



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 238

def audio
  @audio
end

#audio_format ⇒ Object (readonly)



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 238

attr_reader :audio, :audio_format

Class Method Details

.agent_type ⇒ Symbol

Returns the agent type for transcribers

Returns:

  • (Symbol)

    :audio



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 49

def agent_type
  :audio
end

.call(audio:, format: nil, **options) ⇒ TranscriptionResult

Factory method to instantiate and execute transcription

Parameters:

  • audio (String, File, IO)

    Audio file path, URL, File object, or binary data

  • format (Symbol, nil) (defaults to: nil)

    Audio format hint when passing binary data

  • options (Hash)

    Additional options

Returns:

  • (TranscriptionResult)

    The result of the transcription



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 158

def call(audio:, format: nil, **options)
  new(audio: audio, format: format, **options).call
end

.chunking { ... } ⇒ ChunkingConfig

Configures chunking for long audio files

Yields:

  • Block for configuring chunking options

Returns:

  • (ChunkingConfig)

    The chunking configuration



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 105

def chunking(&block)
  @chunking_config ||= ChunkingConfig.new
  @chunking_config.instance_eval(&block) if block_given?
  @chunking_config
end
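
The block passed to `chunking` is run with `instance_eval`, so bare method calls inside it are sent to the config object rather than to the class. A minimal sketch of that pattern follows. The real `ChunkingConfig` options are not listed on this page, so `SketchChunkingConfig` and its `max_duration` setting are hypothetical.

```ruby
# Hypothetical config object; the real ChunkingConfig's options are not
# shown on this page.
class SketchChunkingConfig
  def initialize
    @settings = {}
  end

  # With an argument it stores the value; without one it reads it back.
  def max_duration(value = nil)
    @settings[:max_duration] = value if value
    @settings[:max_duration]
  end
end

class SketchTranscriber
  def self.chunking(&block)
    @chunking_config ||= SketchChunkingConfig.new
    # instance_eval runs the block with the config object as self,
    # so `max_duration 600` reaches SketchChunkingConfig#max_duration.
    @chunking_config.instance_eval(&block) if block_given?
    @chunking_config
  end
end

first = SketchTranscriber.chunking { max_duration 600 }
again = SketchTranscriber.chunking # no block: returns the memoized config
```

The `||=` memoization means repeated `chunking` calls on the same class refine one config object instead of replacing it.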

.chunking_config ⇒ ChunkingConfig?

Returns chunking configuration

Returns:

  • (ChunkingConfig, nil)

    The chunking configuration, or nil if none is set



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 114

def chunking_config
  @chunking_config || inherited_or_default(:chunking_config, nil)
end

.fallback_models(*models) ⇒ Array<String>

Sets or returns the fallback models directly (shorthand for the reliability block)

Parameters:

  • models (Array<String>)

    Model identifiers to try on failure

Returns:

  • (Array<String>)

    The fallback models



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 143

def fallback_models(*models)
  if models.any?
    @fallback_models = models.flatten
  end
  @fallback_models || inherited_or_default(:fallback_models, [])
end

.include_timestamps(value = nil) ⇒ Symbol

Sets or returns the timestamp granularity (defaults to :segment)

Parameters:

  • value (Symbol, nil) (defaults to: nil)

    Timestamp level (:none, :segment, :word)

Returns:

  • (Symbol)

    The current timestamp setting



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 92

def include_timestamps(value = nil)
  @include_timestamps = value if value
  @include_timestamps || inherited_or_default(:include_timestamps, :segment)
end

.language(value = nil) ⇒ String?

Sets or returns the language for transcription

Parameters:

  • value (String, nil) (defaults to: nil)

    ISO 639-1 language code

Returns:

  • (String, nil)

    The current language setting



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 74

def language(value = nil)
  @language = value if value
  @language || inherited_or_default(:language, nil)
end

.model(value = nil) ⇒ String

Sets or returns the transcription model

Parameters:

  • value (String, nil) (defaults to: nil)

    The model identifier

Returns:

  • (String)

    The current model setting



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 59

def model(value = nil)
  @model = value if value
  return @model if defined?(@model) && @model

  if superclass.respond_to?(:agent_type) && superclass.agent_type == :audio
    superclass.model
  else
    default_transcription_model
  end
end
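
Unlike the other settings, `model` only inherits from a parent whose `agent_type` is `:audio`, which keeps a transcriber from picking up a non-transcription model from further up the hierarchy. A minimal sketch with stand-in classes follows. All class names and model strings are assumptions, and `default_transcription_model` is given a placeholder body because its real implementation is not shown on this page.

```ruby
# Stand-in hierarchy for illustration; names and model strings are assumptions.
class SketchChatBase
  def self.agent_type
    :chat
  end

  def self.model
    "some-chat-model"
  end
end

class SketchAudioBase < SketchChatBase
  class << self
    def agent_type
      :audio
    end

    def model(value = nil)
      @model = value if value
      return @model if defined?(@model) && @model

      # Only inherit from parents that are themselves audio agents.
      if superclass.respond_to?(:agent_type) && superclass.agent_type == :audio
        superclass.model
      else
        default_transcription_model
      end
    end

    def default_transcription_model
      "whisper-1" # placeholder default for this sketch
    end
  end
end

class MeetingSketch < SketchAudioBase
  model "gpt-4o-transcribe"
end

class StandupSketch < MeetingSketch; end
```

Here `SketchAudioBase.model` skips the chat parent and uses the default, while `StandupSketch.model` walks up to `MeetingSketch`.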

.output_format(value = nil) ⇒ Symbol

Sets or returns the output format for transcription

Parameters:

  • value (Symbol, nil) (defaults to: nil)

    Output format (:text, :json, :srt, :vtt, :verbose_json)

Returns:

  • (Symbol)

    The current output format



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 83

def output_format(value = nil)
  @output_format = value if value
  @output_format || inherited_or_default(:output_format, :text)
end
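
`language`, `output_format`, and `include_timestamps` share one shape: a single method that sets when given a value, and otherwise returns the class's own setting, a parent's setting, or a default. The helper `inherited_or_default` is not shown on this page, so the version below is an assumed implementation written only to make the pattern runnable. `SketchAgent` and its subclasses are hypothetical.

```ruby
# Sketch only: this inherited_or_default is an assumed implementation,
# not the library's.
class SketchAgent
  class << self
    def output_format(value = nil)
      @output_format = value if value
      @output_format || inherited_or_default(:output_format, :text)
    end

    private

    # Asks the superclass for the setting if it has one, else uses the default.
    def inherited_or_default(name, default)
      superclass.respond_to?(name) ? superclass.public_send(name) : default
    end
  end
end

class SubtitleAgent < SketchAgent
  output_format :srt
end

class ChildSubtitleAgent < SubtitleAgent; end
class PlainAgent < SketchAgent; end
```

One consequence of the `value if value` guard is that passing `nil` or `false` reads the setting instead of clearing it, so a subclass cannot reset an inherited value to `nil` through the DSL.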

.reliability { ... } ⇒ ReliabilityConfig

Configures reliability options (retries, fallbacks)

Yields:

  • Block for configuring reliability options

Returns:

  • (ReliabilityConfig)

    The reliability configuration



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 126

def reliability(&block)
  @reliability_config ||= ReliabilityConfig.new
  @reliability_config.instance_eval(&block) if block_given?
  @reliability_config
end

.reliability_config ⇒ ReliabilityConfig?

Returns reliability configuration

Returns:

  • (ReliabilityConfig, nil)

    The reliability configuration, or nil if none is set



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 135

def reliability_config
  @reliability_config || inherited_or_default(:reliability_config, nil)
end

Instance Method Details

#agent_cache_key ⇒ String

Generates the cache key for this transcription

Returns:

  • (String)

    Cache key



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 352

def agent_cache_key
  # Generate content hash based on input type
  content_hash = case @audio
  when String
    if @audio.start_with?("http://", "https://")
      Digest::SHA256.hexdigest(@audio)
    elsif File.exist?(@audio)
      Digest::SHA256.file(@audio).hexdigest
    else
      Digest::SHA256.hexdigest(@audio)
    end
  when File, IO
    @audio.rewind if @audio.respond_to?(:rewind)
    Digest::SHA256.hexdigest(@audio.read).tap do
      @audio.rewind if @audio.respond_to?(:rewind)
    end
  else
    Digest::SHA256.hexdigest(@audio.to_s)
  end

  components = [
    "ruby_llm_agents",
    "transcription",
    self.class.name,
    resolved_model,
    resolved_language,
    self.class.output_format,
    content_hash
  ].compact

  components.join("/")
end
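
The content hash above depends on the input type: URLs are hashed as strings, existing paths by file contents, and `File`/`IO` objects by what they read. The sketch below lifts that branching into a free function so the behaviour can be checked in isolation. The name `audio_content_hash` and the sample inputs are assumptions.

```ruby
require "digest"
require "stringio"
require "tempfile"

# Mirrors the branching shown above in a free function (hypothetical name).
def audio_content_hash(audio)
  case audio
  when String
    if audio.start_with?("http://", "https://")
      Digest::SHA256.hexdigest(audio)        # URL: hash the URL string itself
    elsif File.exist?(audio)
      Digest::SHA256.file(audio).hexdigest   # path: hash the file contents
    else
      Digest::SHA256.hexdigest(audio)        # otherwise: hash the string as data
    end
  when File, IO
    audio.rewind if audio.respond_to?(:rewind)
    Digest::SHA256.hexdigest(audio.read).tap do
      audio.rewind if audio.respond_to?(:rewind) # leave the stream reusable
    end
  else
    Digest::SHA256.hexdigest(audio.to_s)
  end
end

# The same bytes hash identically whether given as a path or an open File.
by_path, by_io = Tempfile.create(["clip", ".mp3"]) do |f|
  f.write("fake audio bytes")
  f.flush
  [audio_content_hash(f.path), audio_content_hash(f)]
end
by_url = audio_content_hash("https://example.com/clip.mp3")
```

Two points follow from this branching. A URL key reflects the address, not the remote bytes, so changed content at the same URL keeps the same key. And since `StringIO` is not a subclass of `IO`, a `StringIO` input would take the `else` branch and be hashed through `to_s` rather than by its contents.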

#call ⇒ TranscriptionResult

Executes the transcription through the middleware pipeline

Returns:

  • (TranscriptionResult)

    The result of the transcription



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 259

def call
  context = build_context
  result_context = Pipeline::Executor.execute(context)
  result_context.output
end

#execute(context) ⇒ void

This method returns an undefined value.

Core transcription execution

Called by Pipeline::Executor after the middleware has been applied. It contains only the transcription API logic.

Parameters:

  • context

    The execution context passed in by Pipeline::Executor



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 304

def execute(context)
  execution_started_at = Time.current

  # Normalize and validate input
  audio_input = normalize_audio_input(@audio, @audio_format)
  validate_audio_input!(audio_input)

  # Execute transcription with reliability (retries, fallbacks)
  raw_result = execute_with_reliability(audio_input)

  execution_completed_at = Time.current
  duration_ms = ((execution_completed_at - execution_started_at) * 1000).to_i

  # Update context
  context.input_tokens = 0 # Audio uses duration, not tokens
  context.output_tokens = 0
  context.total_cost = calculate_cost(raw_result)

  # Store pricing warning if cost calculation returned nil
  if @pricing_warning
    context[:pricing_warning] = @pricing_warning
    Rails.logger.warn(@pricing_warning) if defined?(Rails) && Rails.respond_to?(:logger) && Rails.logger
  end

  # Store transcription-specific metadata for execution tracking
  context[:language] = resolved_language if resolved_language
  context[:detected_language] = raw_result[:language] if raw_result[:language]
  context[:audio_duration_seconds] = raw_result[:duration] if raw_result[:duration]
  context[:audio_minutes] = (raw_result[:duration] / 60.0).round(4) if raw_result[:duration]
  context[:output_format] = self.class.output_format.to_s
  context[:timestamp_granularity] = self.class.include_timestamps.to_s
  context[:segment_count] = raw_result[:segments]&.size if raw_result[:segments]
  context[:word_count] = raw_result[:text]&.split(/\s+/)&.size if raw_result[:text]

  # Build final result
  context.output = build_result(
    raw_result,
    started_at: context.started_at || execution_started_at,
    completed_at: execution_completed_at,
    duration_ms: duration_ms,
    tenant_id: context.tenant_id,
    execution_id: context.execution_id
  )
end
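
Two of the metadata fields stored above are simple derivations from the raw result: `audio_minutes` is the duration in seconds divided by 60 and rounded to four places, and `word_count` is a whitespace split of the text. A small worked sketch follows, using a made-up `raw_result` hash with only the keys that `execute` reads.

```ruby
# Made-up raw result for illustration; only keys read by execute are used.
raw_result = { text: "Hello everyone, welcome to the meeting", duration: 93.5 }

# 93.5 s / 60 = 1.558333..., rounded to four decimal places.
audio_minutes = (raw_result[:duration] / 60.0).round(4)

# Splitting on runs of whitespace yields one element per word.
word_count = raw_result[:text].split(/\s+/).size

# With a regexp separator, leading whitespace produces an empty first
# element, so the count is one higher than the number of words.
padded_count = " Hello everyone".split(/\s+/).size
```

If a provider returns text with leading whitespace, a count based on `split(/\s+/)` would be off by one. Calling `strip` first, or using `split` with no argument, avoids that.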

#postprocess_text(text) ⇒ String

Post-processes text after transcription

Override this in subclasses to apply custom post-processing.

Parameters:

  • text (String)

    The transcribed text

Returns:

  • (String)

    The processed text



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 293

def postprocess_text(text)
  text
end
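
The default implementation is an identity function, so a subclass only overrides it when it wants to transform the text. A minimal sketch of such an override follows, using a stand-in base class so it runs without the library. `SketchTranscriberBase`, `CleanTranscriber`, and the filler-word pattern are hypothetical.

```ruby
# Stand-in for the base class; only the hook being overridden is modelled.
class SketchTranscriberBase
  def postprocess_text(text)
    text
  end
end

class CleanTranscriber < SketchTranscriberBase
  # Hypothetical filler-word pattern, with any whitespace that follows.
  FILLERS = /\b(?:um+|uh+|erm)\b\s*/i

  # Removes filler words, then collapses repeated spaces and trims the ends.
  def postprocess_text(text)
    text.gsub(FILLERS, "").squeeze(" ").strip
  end
end

cleaned = CleanTranscriber.new.postprocess_text("Um so the uh release is on Friday.")
```

Keeping the hook a pure String-to-String function makes it easy to test without running a transcription.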

#prompt ⇒ String?

Returns the prompt for transcription context

Override this in subclasses to provide context hints that improve transcription accuracy.

Returns:

  • (String, nil)

    The context prompt



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 283

def prompt
  nil
end

#user_prompt ⇒ String

The input for this transcription operation

Returns:

  • (String)

    Description of the audio input



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 268

def user_prompt
  case @audio
  when String
    @audio.start_with?("http") ? "Audio URL: #{@audio}" : "Audio file: #{@audio}"
  else
    "Audio data"
  end
end