Class: RubyLLM::Agents::Transcriber

Inherits: BaseAgent

Inheritance chain: Object < BaseAgent < RubyLLM::Agents::Transcriber

Defined in: lib/ruby_llm/agents/audio/transcriber.rb

Overview

Base class for creating audio transcribers that run through the middleware pipeline.

Transcriber provides a DSL for configuring audio-to-text operations with built-in execution tracking, budget controls, and multi-tenancy support through the middleware pipeline.

Examples:

Basic usage

class MeetingTranscriber < RubyLLM::Agents::Transcriber
  model 'whisper-1'
end

result = MeetingTranscriber.call(audio: "meeting.mp3")
result.text  # => "Hello everyone, welcome to the meeting..."

With language specification

class SpanishTranscriber < RubyLLM::Agents::Transcriber
  model 'gpt-4o-transcribe'
  language 'es'

  def prompt
    "Podcast sobre tecnología y programación"
  end
end

With subtitle output

class SubtitleGenerator < RubyLLM::Agents::Transcriber
  model 'whisper-1'
  output_format :srt
  include_timestamps :segment
end

result = SubtitleGenerator.call(audio: "video.mp4")
result.srt  # => "1\n00:00:00,000 --> 00:00:02,500\nHello\n\n..."

Defined Under Namespace

Classes: ChunkingConfig, ReliabilityConfig

Instance Attribute Summary

#audio ⇒ String, ... readonly

Audio input.
#audio_format ⇒ Object readonly

Attributes inherited from BaseAgent

#client, #model, #temperature, #tracked_tool_calls

Transcriber-specific DSL

.include_timestamps(value = nil) ⇒ Symbol

Sets or returns the timestamp granularity (:none, :segment, or :word).
.language(value = nil) ⇒ String?

Sets or returns the language for transcription.
.model(value = nil) ⇒ String

Sets or returns the transcription model.
.output_format(value = nil) ⇒ Symbol

Sets or returns the output format for transcription.

Chunking DSL

.chunking { ... } ⇒ ChunkingConfig

Configures chunking for long audio files.
.chunking_config ⇒ ChunkingConfig?

Returns chunking configuration.

Reliability DSL

.fallback_models(*models) ⇒ Array<String>

Sets fallback models directly (shorthand for reliability block).
.reliability { ... } ⇒ ReliabilityConfig

Configures reliability options (retries, fallbacks).
.reliability_config ⇒ ReliabilityConfig?

Returns reliability configuration.

Class Method Summary

.agent_type ⇒ Symbol

Returns the agent type for transcribers.
.call(audio:, format: nil, **options) ⇒ TranscriptionResult

Factory method to instantiate and execute transcription.

Instance Method Summary

#agent_cache_key ⇒ String

Generates the cache key for this transcription.
#call ⇒ TranscriptionResult

Executes the transcription through the middleware pipeline.
#execute(context) ⇒ void

Core transcription execution.
#initialize(audio:, format: nil, **options) ⇒ Transcriber constructor

Creates a new Transcriber instance.
#postprocess_text(text) ⇒ String

Post-processes text after transcription.
#prompt ⇒ String?

Returns the prompt for transcription context.
#user_prompt ⇒ String

The input for this transcription operation.

Methods inherited from BaseAgent

agent_middleware, aliases, all_agent_names, ask, #assistant_prompt, #cache_key_data, #cache_key_hash, config_summary, #messages, param, params, #process_response, #resolved_thinking, #schema, stream, streaming, #system_prompt, temperature, thinking, thinking_config, tools, use_middleware

Methods included from DSL::Base

#active_overrides, #assistant, #assistant_config, #cache_prompts, #clear_override_cache!, #description, #model, #overridable?, #overridable_fields, #returns, #schema, #system, #system_config, #timeout, #user, #user_config

Methods included from DSL::Reliability

#circuit_breaker, #circuit_breaker_config, #fallback_models, #fallback_provider, #fallback_providers, #non_fallback_errors, #on_failure, #reliability, #reliability_config, #reliability_configured?, #retries, #retries_config, #retryable_patterns, #total_timeout

Methods included from DSL::Caching

#cache, #cache_enabled?, #cache_for, #cache_key_excludes, #cache_key_includes, #cache_ttl, #caching_config

Methods included from DSL::Queryable

#cost_by_model, #executions, #failures, #last_run, #stats, #total_spent, #with_params

Methods included from DSL::Knowledge

#knowledge_entries, #knowledge_path, #knows

Methods included from CacheHelper

#cache_delete, #cache_exist?, #cache_increment, #cache_key, #cache_read, #cache_store, #cache_write

Methods included from DSL::Knowledge::InstanceMethods

#compiled_knowledge

Constructor Details

#initialize(audio:, format: nil, **options) ⇒ `Transcriber`

Creates a new Transcriber instance

Parameters:

audio (String, File, IO) —

Audio file path, URL, File object, or binary data
format (Symbol, nil) (defaults to: nil) —

Audio format hint when passing binary data
options (Hash) —

Configuration options

# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 245

def initialize(audio:, format: nil, **options)
  @audio = audio
  @audio_format = format
  @runtime_language = options.delete(:language)

  # Set model to transcription model if not specified
  options[:model] ||= self.class.model

  super(**options)
end

Instance Attribute Details

#audio ⇒ `String`, ... (readonly)

Returns the audio input.

Returns:

(String, File, IO) —

Audio input



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 238

def audio
  @audio
end

#audio_format ⇒ `Object` (readonly)

# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 238

attr_reader :audio, :audio_format

Class Method Details

.agent_type ⇒ `Symbol`

Returns the agent type for transcribers

Returns:

(Symbol) —

:audio



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 49

def agent_type
  :audio
end

.call(audio:, format: nil, **options) ⇒ `TranscriptionResult`

Factory method to instantiate and execute transcription

Parameters:

audio (String, File, IO) —

Audio file path, URL, File object, or binary data
format (Symbol, nil) (defaults to: nil) —

Audio format hint when passing binary data
options (Hash) —

Additional options

Returns:

(TranscriptionResult) —

The transcription result



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 158

def call(audio:, format: nil, **options)
  new(audio: audio, format: format, **options).call
end

.chunking { ... } ⇒ `ChunkingConfig`

Configures chunking for long audio files

Yields:

Block for configuring chunking options

Returns:

(ChunkingConfig) —

The chunking configuration

# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 105

def chunking(&block)
  @chunking_config ||= ChunkingConfig.new
  @chunking_config.instance_eval(&block) if block_given?
  @chunking_config
end
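
The block passed to .chunking is evaluated with instance_eval, so bare method calls inside it are sent to the config object. A minimal sketch of that pattern follows; SketchChunkingConfig and its max_duration option are hypothetical stand-ins, since this page does not list the real ChunkingConfig options.

```ruby
# Hypothetical config object; max_duration is an illustrative option name,
# not a documented ChunkingConfig attribute.
class SketchChunkingConfig
  def max_duration(value = nil)
    @max_duration = value if value
    @max_duration
  end
end

# Stand-in class that reuses the same three lines as .chunking above.
class SketchChunkedTranscriber
  def self.chunking(&block)
    @chunking_config ||= SketchChunkingConfig.new
    # Inside the block, self is the config object.
    @chunking_config.instance_eval(&block) if block_given?
    @chunking_config
  end
end

SketchChunkedTranscriber.chunking { max_duration 600 }
puts SketchChunkedTranscriber.chunking.max_duration
```

Calling .chunking again without a block returns the same memoized config object, so repeated blocks accumulate settings rather than replacing them.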

.chunking_config ⇒ `ChunkingConfig`?

Returns chunking configuration

Returns:

(ChunkingConfig, nil) —

The chunking configuration



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 114

def chunking_config
  @chunking_config || inherited_or_default(:chunking_config, nil)
end

.fallback_models(*models) ⇒ `Array<String>`

Sets fallback models directly (shorthand for reliability block)

Parameters:

models (Array<String>) —

Model identifiers to try on failure

Returns:

(Array<String>) —

The fallback models

# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 143

def fallback_models(*models)
  if models.any?
    @fallback_models = models.flatten
  end
  @fallback_models || inherited_or_default(:fallback_models, [])
end

.include_timestamps(value = nil) ⇒ `Symbol`

Sets or returns the timestamp granularity

Parameters:

value (Symbol, nil) (defaults to: nil) —

Timestamp level (:none, :segment, :word)

Returns:

(Symbol) —

The current timestamp setting

# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 92

def include_timestamps(value = nil)
  @include_timestamps = value if value
  @include_timestamps || inherited_or_default(:include_timestamps, :segment)
end
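
The DSL accessors on this page share one shape: set the value when an argument is given, otherwise fall back to the superclass and then to a default. The sketch below illustrates that lookup. The body of inherited_or_default is an assumption written for this example (the helper's real implementation is not shown on this page), and the Sketch* class names are hypothetical.

```ruby
class SketchTimestampBase
  def self.include_timestamps(value = nil)
    @include_timestamps = value if value
    @include_timestamps || inherited_or_default(:include_timestamps, :segment)
  end

  # Assumed behavior: ask the superclass if it has the accessor,
  # otherwise return the default.
  def self.inherited_or_default(name, default)
    superclass.respond_to?(name) ? superclass.public_send(name) : default
  end
end

class SketchWordLevel < SketchTimestampBase
  include_timestamps :word
end

class SketchWordLevelChild < SketchWordLevel; end
class SketchPlain < SketchTimestampBase; end

puts SketchWordLevelChild.include_timestamps.inspect # inherited from the parent
puts SketchPlain.include_timestamps.inspect          # falls back to the default
```

Because the setter is guarded by `if value`, passing nil reads the current setting instead of clearing it.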

.language(value = nil) ⇒ `String`?

Sets or returns the language for transcription

Parameters:

value (String, nil) (defaults to: nil) —

ISO 639-1 language code

Returns:

(String, nil) —

The current language setting

# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 74

def language(value = nil)
  @language = value if value
  @language || inherited_or_default(:language, nil)
end

.model(value = nil) ⇒ `String`

Sets or returns the transcription model

Parameters:

value (String, nil) (defaults to: nil) —

The model identifier

Returns:

(String) —

The current model setting

# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 59

def model(value = nil)
  @model = value if value
  return @model if defined?(@model) && @model

  if superclass.respond_to?(:agent_type) && superclass.agent_type == :audio
    superclass.model
  else
    default_transcription_model
  end
end
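
Unlike the other accessors, .model only consults the superclass when that superclass is itself an audio agent. The following sketch reproduces the method body above on stand-in classes to show the effect of that guard. SketchChatAgent, its :conversation type, and the "whisper-1" default are assumptions made for illustration; the library's actual default_transcription_model value is not shown on this page.

```ruby
# Stand-in for a non-audio parent class (hypothetical).
class SketchChatAgent
  def self.agent_type
    :conversation
  end

  def self.model
    "chat-model"
  end
end

class SketchModelTranscriber < SketchChatAgent
  def self.agent_type
    :audio
  end

  # Assumed default, for illustration only.
  def self.default_transcription_model
    "whisper-1"
  end

  def self.model(value = nil)
    @model = value if value
    return @model if defined?(@model) && @model

    if superclass.respond_to?(:agent_type) && superclass.agent_type == :audio
      superclass.model
    else
      default_transcription_model
    end
  end
end

class SketchMeeting < SketchModelTranscriber
  model "gpt-4o-transcribe"
end

class SketchStandup < SketchMeeting; end

puts SketchModelTranscriber.model # parent is not an audio agent, so the default applies
puts SketchStandup.model          # parent is an audio agent, so its model is inherited
```

The guard keeps a transcriber from picking up a chat model configured on a non-audio ancestor.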

.output_format(value = nil) ⇒ `Symbol`

Sets or returns the output format for transcription

Parameters:

value (Symbol, nil) (defaults to: nil) —

Output format (:text, :json, :srt, :vtt, :verbose_json)

Returns:

(Symbol) —

The current output format

# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 83

def output_format(value = nil)
  @output_format = value if value
  @output_format || inherited_or_default(:output_format, :text)
end

.reliability { ... } ⇒ `ReliabilityConfig`

Configures reliability options (retries, fallbacks)

Yields:

Block for configuring reliability options

Returns:

(ReliabilityConfig) —

The reliability configuration

# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 126

def reliability(&block)
  @reliability_config ||= ReliabilityConfig.new
  @reliability_config.instance_eval(&block) if block_given?
  @reliability_config
end

.reliability_config ⇒ `ReliabilityConfig`?

Returns reliability configuration

Returns:

(ReliabilityConfig, nil) —

The reliability configuration



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 135

def reliability_config
  @reliability_config || inherited_or_default(:reliability_config, nil)
end

Instance Method Details

#agent_cache_key ⇒ `String`

Generates the cache key for this transcription

Returns:

(String) —

Cache key

# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 352

def agent_cache_key
  # Generate content hash based on input type
  content_hash = case @audio
  when String
    if @audio.start_with?("http://", "https://")
      Digest::SHA256.hexdigest(@audio)
    elsif File.exist?(@audio)
      Digest::SHA256.file(@audio).hexdigest
    else
      Digest::SHA256.hexdigest(@audio)
    end
  when File, IO
    @audio.rewind if @audio.respond_to?(:rewind)
    Digest::SHA256.hexdigest(@audio.read).tap do
      @audio.rewind if @audio.respond_to?(:rewind)
    end
  else
    Digest::SHA256.hexdigest(@audio.to_s)
  end

  components = [
    "ruby_llm_agents",
    "transcription",
    self.class.name,
    resolved_model,
    resolved_language,
    self.class.output_format,
    content_hash
  ].compact

  components.join("/")
end
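
To make the key layout concrete, here is a small sketch of the String branch for an input that is not an existing local file (for example, a URL). The helper sketch_cache_key and its keyword arguments are hypothetical; they stand in for the instance state (class name, resolved model, resolved language) that the real method reads.

```ruby
require "digest"

# Hypothetical helper mirroring the component list in agent_cache_key.
def sketch_cache_key(class_name:, model:, language:, output_format:, audio:)
  # URLs and other non-file strings are hashed as strings, not fetched.
  content_hash = Digest::SHA256.hexdigest(audio)

  [
    "ruby_llm_agents",
    "transcription",
    class_name,
    model,
    language,
    output_format,
    content_hash
  ].compact.join("/")
end

common = {
  class_name: "SpanishTranscriber",
  model: "gpt-4o-transcribe",
  output_format: :srt,
  audio: "https://example.com/episode.mp3"
}

with_language = sketch_cache_key(language: "es", **common)
without_language = sketch_cache_key(language: nil, **common)

puts with_language.split("/").size    # seven segments
puts without_language.split("/").size # six segments: compact drops the nil language
```

Two consequences follow from the source above: a URL is keyed by its address rather than by the remote content, and a local file is keyed by its contents, so renaming a file does not change the key.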

#call ⇒ `TranscriptionResult`

Executes the transcription through the middleware pipeline

Returns:

(TranscriptionResult) —

The transcription result

# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 259

def call
  context = build_context
  result_context = Pipeline::Executor.execute(context)
  result_context.output
end

#execute(context) ⇒ `void`

This method returns an undefined value.

Core transcription execution

Pipeline::Executor calls this method after the middleware has been applied. It contains only the transcription API logic.

Parameters:

context (Pipeline::Context) —

The execution context

# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 304

def execute(context)
  execution_started_at = Time.current

  # Normalize and validate input
  audio_input = normalize_audio_input(@audio, @audio_format)
  validate_audio_input!(audio_input)

  # Execute transcription with reliability (retries, fallbacks)
  raw_result = execute_with_reliability(audio_input)

  execution_completed_at = Time.current
  duration_ms = ((execution_completed_at - execution_started_at) * 1000).to_i

  # Update context
  context.input_tokens = 0 # Audio uses duration, not tokens
  context.output_tokens = 0
  context.total_cost = calculate_cost(raw_result)

  # Store pricing warning if cost calculation returned nil
  if @pricing_warning
    context[:pricing_warning] = @pricing_warning
    Rails.logger.warn(@pricing_warning) if defined?(Rails) && Rails.respond_to?(:logger) && Rails.logger
  end

  # Store transcription-specific metadata for execution tracking
  context[:language] = resolved_language if resolved_language
  context[:detected_language] = raw_result[:language] if raw_result[:language]
  context[:audio_duration_seconds] = raw_result[:duration] if raw_result[:duration]
  context[:audio_minutes] = (raw_result[:duration] / 60.0).round(4) if raw_result[:duration]
  context[:output_format] = self.class.output_format.to_s
  context[:timestamp_granularity] = self.class.include_timestamps.to_s
  context[:segment_count] = raw_result[:segments]&.size if raw_result[:segments]
  context[:word_count] = raw_result[:text]&.split(/\s+/)&.size if raw_result[:text]

  # Build final result
  context.output = build_result(
    raw_result,
    started_at: context.started_at || execution_started_at,
    completed_at: execution_completed_at,
    duration_ms: duration_ms,
    tenant_id: context.tenant_id,
    execution_id: context.execution_id
  )
end
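
The derived metadata is simple arithmetic on the raw result. As an illustration, the snippet below applies the same expressions to a hand-written hash; the sample text and duration are made up, and the hash only imitates the :text and :duration keys that execute reads.

```ruby
# Made-up raw result with the two keys used for derived metadata.
raw_result = {
  text: "Hello everyone, welcome to the meeting",
  duration: 90.0 # seconds
}

audio_minutes = (raw_result[:duration] / 60.0).round(4)
word_count = raw_result[:text].split(/\s+/).size

puts audio_minutes # => 1.5
puts word_count    # => 6
```

Note that splitting on /\s+/ keeps a leading empty string when the text begins with whitespace, so the count for such text is one higher than the number of words.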

#postprocess_text(text) ⇒ `String`

Post-processes text after transcription

Override this in subclasses to apply custom post-processing.

Parameters:

text (String) —

The transcribed text

Returns:

(String) —

The processed text



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 293

def postprocess_text(text)
  text
end
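
A sketch of an override that strips filler words follows. SketchHookTranscriber is a stand-in for RubyLLM::Agents::Transcriber so the example runs on its own; in an application the subclass would inherit from the real class. The FILLERS pattern is an assumption for illustration, and this page does not show where the library invokes the hook.

```ruby
# Stand-in base class with the same default hook as above.
class SketchHookTranscriber
  def postprocess_text(text)
    text
  end
end

class SketchCleanTranscriber < SketchHookTranscriber
  # Illustrative filler-word pattern: "um" or "uh", an optional comma,
  # and any trailing whitespace.
  FILLERS = /\b(?:um|uh)\b,?\s*/i

  def postprocess_text(text)
    text.gsub(FILLERS, "").squeeze(" ").strip
  end
end

cleaned = SketchCleanTranscriber.new.postprocess_text("Um, so the release is uh ready")
puts cleaned # => "so the release is ready"
```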

#prompt ⇒ `String`?

Returns the prompt for transcription context

Override this in subclasses to provide context hints that improve transcription accuracy.

Returns:

(String, nil) —

The context prompt



# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 283

def prompt
  nil
end

#user_prompt ⇒ `String`

The input for this transcription operation

Returns:

(String) —

Description of the audio input

# File 'lib/ruby_llm/agents/audio/transcriber.rb', line 268

def user_prompt
  case @audio
  when String
    @audio.start_with?("http") ? "Audio URL: #{@audio}" : "Audio file: #{@audio}"
  else
    "Audio data"
  end
end
