Class: RubyLLM::Agents::Embedder

Inherits: BaseAgent

Inheritance chain: RubyLLM::Agents::Embedder < BaseAgent < Object

Defined in: lib/ruby_llm/agents/text/embedder.rb

Overview

Base class for creating embedding generators

Embedder inherits from BaseAgent and uses the middleware pipeline for caching, reliability, instrumentation, and budget controls. Only the core embedding logic is implemented here.

Examples:

Basic usage

class DocumentEmbedder < RubyLLM::Agents::Embedder
  model 'text-embedding-3-small'
  dimensions 512
end

result = DocumentEmbedder.call(text: "Hello world")
result.vector  # => [0.123, -0.456, ...]

Batch processing

result = DocumentEmbedder.call(texts: ["Hello", "World"])
result.vectors  # => [[...], [...]]

With preprocessing

class CleanEmbedder < RubyLLM::Agents::Embedder
  model 'text-embedding-3-small'

  def preprocess(text)
    text.strip.downcase.gsub(/\s+/, ' ')
  end
end

Instance Attribute Summary

#text ⇒ String? readonly

Single text to embed.
#texts ⇒ Object readonly

Multiple texts to embed.

Attributes inherited from BaseAgent

#client, #model, #temperature, #tracked_tool_calls

Embedding-specific DSL

.batch_size(value = nil) ⇒ Integer

Sets or returns the batch size.
.dimensions(value = nil) ⇒ Integer?

Sets or returns the vector dimensions.
.model(value = nil) ⇒ String

Sets or returns the embedding model.

Class Method Summary

.agent_type ⇒ Symbol

Returns the agent type for embedders.
.call(text: nil, texts: nil, **options) {|batch_result, index| ... } ⇒ EmbeddingResult

Executes the embedder with the given parameters.

Instance Method Summary

#agent_cache_key ⇒ String

Generates the cache key for this embedding.
#call {|batch_result, index| ... } ⇒ EmbeddingResult

Executes the embedding through the middleware pipeline.
#execute(context) ⇒ void

Core embedding execution.
#initialize(text: nil, texts: nil, **options) ⇒ Embedder constructor

Creates a new Embedder instance.
#preprocess(text) ⇒ String

Preprocesses text before embedding.
#user_prompt ⇒ String

The input for this embedding operation.

Methods inherited from BaseAgent

agent_middleware, aliases, all_agent_names, ask, #assistant_prompt, #cache_key_data, #cache_key_hash, config_summary, #messages, param, params, #process_response, #resolved_thinking, #schema, stream, streaming, #system_prompt, temperature, thinking, thinking_config, tools, use_middleware

Methods included from DSL::Base

#active_overrides, #assistant, #assistant_config, #cache_prompts, #clear_override_cache!, #description, #model, #overridable?, #overridable_fields, #prompt, #returns, #schema, #system, #system_config, #timeout, #user, #user_config

Methods included from DSL::Reliability

#circuit_breaker, #circuit_breaker_config, #fallback_models, #fallback_provider, #fallback_providers, #non_fallback_errors, #on_failure, #reliability, #reliability_config, #reliability_configured?, #retries, #retries_config, #retryable_patterns, #total_timeout

Methods included from DSL::Caching

#cache, #cache_enabled?, #cache_for, #cache_key_excludes, #cache_key_includes, #cache_ttl, #caching_config

Methods included from DSL::Queryable

#cost_by_model, #executions, #failures, #last_run, #stats, #total_spent, #with_params

Methods included from DSL::Knowledge

#knowledge_entries, #knowledge_path, #knows

Methods included from CacheHelper

#cache_delete, #cache_exist?, #cache_increment, #cache_key, #cache_read, #cache_store, #cache_write

Methods included from DSL::Knowledge::InstanceMethods

#compiled_knowledge

Constructor Details

#initialize(text: nil, texts: nil, **options) ⇒ `Embedder`

Creates a new Embedder instance

Parameters:

text (String, nil) (defaults to: nil) —

Single text to embed
texts (Array<String>, nil) (defaults to: nil) —

Multiple texts to embed
options (Hash) —

Additional options

# File 'lib/ruby_llm/agents/text/embedder.rb', line 148

def initialize(text: nil, texts: nil, **options)
  @text = text
  @texts = texts
  @batch_block = nil

  # Set model to embedding model if not specified
  options[:model] ||= self.class.model || self.class.class_eval { default_embedding_model }

  super(**options)
end

Instance Attribute Details

#text ⇒ `String`? (readonly)

Returns the single text to embed.

Returns:

(String, nil) —

Single text to embed



# File 'lib/ruby_llm/agents/text/embedder.rb', line 141

def text
  @text
end

#texts ⇒ `Object` (readonly)

Returns the multiple texts to embed.

# File 'lib/ruby_llm/agents/text/embedder.rb', line 141

attr_reader :text, :texts

Class Method Details

.agent_type ⇒ `Symbol`

Returns the agent type for embedders

Returns:

(Symbol) —

:embedding



# File 'lib/ruby_llm/agents/text/embedder.rb', line 41

def agent_type
  :embedding
end

.batch_size(value = nil) ⇒ `Integer`

Sets or returns the batch size

When embedding multiple texts, they are split into batches of this size for API calls.

Examples:

batch_size 50

Parameters:

value (Integer, nil) (defaults to: nil) —

Maximum texts per API call

Returns:

(Integer) —

The current batch size

# File 'lib/ruby_llm/agents/text/embedder.rb', line 91

def batch_size(value = nil)
  @batch_size = value if value
  @batch_size || inherited_or_default(:batch_size, default_embedding_batch_size)
end
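
As a rough illustration of the batching described above, the following standalone sketch partitions a list of inputs with `Enumerable#each_slice`, the same call that `#execute` uses. The batch size of 2 is an arbitrary illustrative value, not the library default.

```ruby
# Sketch: how a batch size of 2 partitions five inputs.
texts = ["a", "b", "c", "d", "e"]
batch_size = 2 # illustrative value, not the library default

# each_slice yields consecutive groups of at most batch_size items;
# the final group holds whatever is left over.
batches = texts.each_slice(batch_size).to_a

batches.each_with_index do |batch, index|
  puts "batch #{index}: #{batch.inspect}"
end
```

With five inputs and a batch size of 2, this produces three batches, so three API calls would be made.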

.call(text: nil, texts: nil, **options) {|batch_result, index| ... } ⇒ `EmbeddingResult`

Executes the embedder with the given parameters

Parameters:

text (String, nil) (defaults to: nil) —

Single text to embed
texts (Array<String>, nil) (defaults to: nil) —

Multiple texts to embed
options (Hash) —

Additional options

Yields:

(batch_result, index) —

Called after each batch completes

Yield Parameters:

batch_result (EmbeddingResult) —

Result for the batch
index (Integer) —

Batch index (0-based)

Returns:

(EmbeddingResult) —

The embedding result

Raises:

(ArgumentError) —

If both text: and texts: are provided



# File 'lib/ruby_llm/agents/text/embedder.rb', line 108

def call(text: nil, texts: nil, **options, &block)
  new(text: text, texts: texts, **options).call(&block)
end
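
The delegation and per-batch block can be sketched without the library. `FakeEmbedder` below is a hypothetical stand-in, not the real class: it fakes vectors from string lengths, yields a plain array rather than an `EmbeddingResult`, and raises `ArgumentError` in the constructor, which is an assumption about where that check might live.

```ruby
# Hypothetical stand-in for an Embedder subclass; NOT the real library class.
class FakeEmbedder
  BATCH_SIZE = 2 # illustrative value

  # Mirrors the documented class-level entry point: build an instance,
  # then forward the block to the instance-level call.
  def self.call(text: nil, texts: nil, &block)
    new(text: text, texts: texts).call(&block)
  end

  def initialize(text: nil, texts: nil)
    # Assumption: where the real check happens is not shown in the source.
    raise ArgumentError, "provide text: or texts:, not both" if text && texts

    @inputs = texts || [text].compact
  end

  def call
    vectors = []
    @inputs.each_slice(BATCH_SIZE).each_with_index do |batch, index|
      # Fake one-dimensional "embedding": the length of each string.
      batch_vectors = batch.map { |t| [t.length.to_f] }
      vectors.concat(batch_vectors)
      # Report progress after each batch, as the documented block does.
      yield(batch_vectors, index) if block_given?
    end
    vectors
  end
end

progress = []
result = FakeEmbedder.call(texts: %w[a bb ccc]) do |batch_vectors, index|
  progress << [index, batch_vectors.size]
end
```

The block runs once per batch with a 0-based index, which makes it suitable for progress reporting on large inputs.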

.dimensions(value = nil) ⇒ `Integer`?

Sets or returns the vector dimensions

Some models (like OpenAI text-embedding-3) support reducing dimensions for more efficient storage.

Examples:

dimensions 512

Parameters:

value (Integer, nil) (defaults to: nil) —

The dimensions to set

Returns:

(Integer, nil) —

The current dimensions setting

# File 'lib/ruby_llm/agents/text/embedder.rb', line 77

def dimensions(value = nil)
  @dimensions = value if value
  @dimensions || inherited_or_default(:dimensions, default_embedding_dimensions)
end

.model(value = nil) ⇒ `String`

Sets or returns the embedding model

Defaults to the embedding model from configuration, not the conversation model that BaseAgent uses.

Examples:

model "text-embedding-3-large"

Parameters:

value (String, nil) (defaults to: nil) —

The model identifier to set

Returns:

(String) —

The current model setting

# File 'lib/ruby_llm/agents/text/embedder.rb', line 56

def model(value = nil)
  @model = value if value
  return @model if defined?(@model) && @model

  # For inheritance: check if parent is also an Embedder
  if superclass.respond_to?(:agent_type) && superclass.agent_type == :embedding
    superclass.model
  else
    default_embedding_model
  end
end
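
The superclass lookup above can be exercised in isolation. In this sketch, `BaseEmbedderSketch` and `DEFAULT_MODEL` are hypothetical stand-ins for `Embedder` and the configured default embedding model; only the lookup logic is copied from the listing.

```ruby
# Hypothetical stand-in for Embedder; only the model lookup is reproduced.
class BaseEmbedderSketch
  DEFAULT_MODEL = "default-embedding-model" # placeholder for the configured default

  def self.agent_type
    :embedding
  end

  def self.model(value = nil)
    @model = value if value
    return @model if defined?(@model) && @model

    # Walk up the hierarchy only while the parent is itself an embedder.
    if superclass.respond_to?(:agent_type) && superclass.agent_type == :embedding
      superclass.model
    else
      DEFAULT_MODEL
    end
  end
end

class ParentEmbedder < BaseEmbedderSketch
  model "text-embedding-3-large"
end

# Sets nothing, so it inherits the parent's model.
class ChildEmbedder < ParentEmbedder
end

# Sets its own model, which takes precedence over the parent's.
class OverridingEmbedder < ParentEmbedder
  model "text-embedding-3-small"
end

puts ChildEmbedder.model
puts OverridingEmbedder.model
puts BaseEmbedderSketch.model
```

Because `@model` is a class-level instance variable, each subclass stores its own value, and an unset subclass falls through to its nearest embedder ancestor.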

Instance Method Details

#agent_cache_key ⇒ `String`

Generates the cache key for this embedding

Returns:

(String) —

Cache key in format “ruby_llm_agents/embedding/…”

# File 'lib/ruby_llm/agents/text/embedder.rb', line 256

def agent_cache_key
  components = [
    "ruby_llm_agents",
    "embedding",
    self.class.name,
    resolved_model,
    resolved_dimensions,
    Digest::SHA256.hexdigest(input_texts.map { |t| preprocess(t) }.join("\n"))
  ].compact

  components.join("/")
end
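
To see what such a key looks like, here is a minimal standalone sketch of the same composition. `sketch_cache_key` is a hypothetical helper, the class and model names are example values, and preprocessing is assumed to be the identity function.

```ruby
require "digest"

# Hypothetical helper reproducing the key layout shown above.
# Assumes preprocessing leaves the texts unchanged.
def sketch_cache_key(class_name:, model:, dimensions:, texts:)
  digest = Digest::SHA256.hexdigest(texts.join("\n"))
  # compact drops a nil dimensions entry, so the key has one fewer segment.
  ["ruby_llm_agents", "embedding", class_name, model, dimensions, digest].compact.join("/")
end

with_dims = sketch_cache_key(
  class_name: "DocumentEmbedder",
  model: "text-embedding-3-small",
  dimensions: 512,
  texts: ["Hello world"]
)

without_dims = sketch_cache_key(
  class_name: "DocumentEmbedder",
  model: "text-embedding-3-small",
  dimensions: nil,
  texts: ["Hello world"]
)

puts with_dims
puts without_dims
```

Since the texts are joined with a newline before hashing, `["a\nb"]` and `["a", "b"]` produce the same digest in this sketch, which may matter if inputs can contain newlines.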

#call {|batch_result, index| ... } ⇒ `EmbeddingResult`

Executes the embedding through the middleware pipeline

Yields:

(batch_result, index) —

Called after each batch completes

Returns:

(EmbeddingResult) —

The embedding result

# File 'lib/ruby_llm/agents/text/embedder.rb', line 163

def call(&block)
  @batch_block = block
  context = build_context
  result_context = Pipeline::Executor.execute(context)
  result_context.output
end

#execute(context) ⇒ `void`

This method returns an undefined value.

Core embedding execution

Called by Pipeline::Executor after the middleware has been applied. It contains only the embedding API logic.

Parameters:

context (Pipeline::Context) —

The execution context

# File 'lib/ruby_llm/agents/text/embedder.rb', line 201

def execute(context)
  # Track timing internally since middleware sets completed_at after execute returns
  execution_started_at = Time.current

  input_list = input_texts
  validate_input!(input_list)

  all_vectors = []
  total_input_tokens = 0
  total_cost = 0.0
  batch_count = resolved_batch_size

  batches = input_list.each_slice(batch_count).to_a

  batches.each_with_index do |batch, index|
    batch_result = execute_batch(batch, context)

    all_vectors.concat(batch_result[:vectors])
    total_input_tokens += batch_result[:input_tokens] || 0
    total_cost += batch_result[:cost] || 0.0

    # Yield batch result for progress tracking
    if @batch_block
      batch_embedding_result = build_batch_result(batch_result, batch.size)
      @batch_block.call(batch_embedding_result, index)
    end
  end

  execution_completed_at = Time.current
  duration_ms = ((execution_completed_at - execution_started_at) * 1000).to_i

  # Update context with token/cost info
  context.input_tokens = total_input_tokens
  context.output_tokens = 0
  context.input_cost = total_cost
  context.output_cost = 0.0
  context.total_cost = total_cost.round(6)

  # Build final result
  context.output = build_result(
    vectors: all_vectors,
    input_tokens: total_input_tokens,
    total_cost: total_cost,
    count: input_list.size,
    started_at: context.started_at || execution_started_at,
    completed_at: execution_completed_at,
    duration_ms: duration_ms,
    tenant_id: context.tenant_id,
    execution_id: context.execution_id
  )
end
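
The accumulation step in the loop above can be tried on its own. The batch hashes below are made-up sample data with the same keys the listing reads (`:vectors`, `:input_tokens`, `:cost`); the second one omits usage data to show why the `|| 0` fallbacks are there.

```ruby
# Made-up sample batch results; the second batch has no usage data.
batch_results = [
  { vectors: [[0.1], [0.2]], input_tokens: 7, cost: 0.0000014 },
  { vectors: [[0.3]], input_tokens: nil, cost: nil }
]

all_vectors = []
total_input_tokens = 0
total_cost = 0.0

batch_results.each do |batch_result|
  all_vectors.concat(batch_result[:vectors])
  # Fall back to zero so a nil value does not raise a TypeError on +=.
  total_input_tokens += batch_result[:input_tokens] || 0
  total_cost += batch_result[:cost] || 0.0
end

# The listing rounds the total cost to six decimal places on the context.
rounded_cost = total_cost.round(6)

puts "vectors: #{all_vectors.size}, tokens: #{total_input_tokens}, cost: #{rounded_cost}"
```

Vector order follows input order because batches are processed sequentially and appended with `concat`.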

#preprocess(text) ⇒ `String`

Preprocesses text before embedding

Override this method in subclasses to apply custom preprocessing like normalization, cleaning, or truncation.

Examples:

Custom preprocessing

def preprocess(text)
  text.strip.downcase.gsub(/\s+/, ' ').truncate(8000)
end

Parameters:

text (String) —

The text to preprocess

Returns:

(String) —

The preprocessed text



# File 'lib/ruby_llm/agents/text/embedder.rb', line 190

def preprocess(text)
  text
end
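
Note that `String#truncate`, used in the example above, comes from ActiveSupport rather than core Ruby. For code that runs outside Rails, a similar override might look like the sketch below, where `sketch_preprocess` and `MAX_CHARS` are hypothetical names and the hard cut (no trailing ellipsis) is a simplification.

```ruby
MAX_CHARS = 8000 # illustrative limit, not a library constant

# Plain-Ruby normalization: trim, lowercase, collapse whitespace runs,
# then keep at most MAX_CHARS characters.
def sketch_preprocess(text)
  text.strip.downcase.gsub(/\s+/, " ")[0, MAX_CHARS]
end

puts sketch_preprocess("  Hello   WORLD \n")
puts sketch_preprocess("a" * 9000).length
```

In a real subclass the method would be named `preprocess` so that it replaces the default pass-through implementation.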

#user_prompt ⇒ `String`

The input for this embedding operation

Used by the pipeline to generate cache keys and for instrumentation.

Returns:

(String) —

The input text(s), joined with "\n---\n"



# File 'lib/ruby_llm/agents/text/embedder.rb', line 175

def user_prompt
  input_texts.join("\n---\n")
end
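
A quick standalone check of the join shows what the pipeline would see. The input arrays here are sample values standing in for whatever `input_texts` returns.

```ruby
# Sample inputs standing in for input_texts.
batch_prompt = ["Hello", "World"].join("\n---\n")
single_prompt = ["Hello"].join("\n---\n")

puts batch_prompt.inspect
puts single_prompt.inspect
```

A single text comes back unchanged, while multiple texts are separated by a `---` line.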
