Class: RubyLLM::Agents::Embedder
Defined in: lib/ruby_llm/agents/text/embedder.rb
Overview
Base class for creating embedding generators
Embedder inherits from BaseAgent and uses the middleware pipeline for caching, reliability, instrumentation, and budget controls. Only the core embedding logic is implemented here.
Constant Summary
Constants included from DSL::Base
DSL::Base::PLACEHOLDER_PATTERN
Constants included from DSL::Caching
DSL::Caching::DEFAULT_CACHE_TTL
Constants included from CacheHelper
Instance Attribute Summary
- #text ⇒ String? (readonly)
  Single text to embed.
- #texts ⇒ Object (readonly)
Attributes inherited from BaseAgent
#client, #model, #temperature, #tracked_tool_calls
Embedding-specific DSL
- .batch_size(value = nil) ⇒ Integer
  Sets or returns the batch size.
- .dimensions(value = nil) ⇒ Integer?
  Sets or returns the vector dimensions.
- .model(value = nil) ⇒ String
  Sets or returns the embedding model.
Class Method Summary
- .agent_type ⇒ Symbol
  Returns the agent type for embedders.
- .call(text: nil, texts: nil, **options) {|batch_result, index| ... } ⇒ EmbeddingResult
  Executes the embedder with the given parameters.
Instance Method Summary
- #agent_cache_key ⇒ String
  Generates the cache key for this embedding.
- #call {|batch_result, index| ... } ⇒ EmbeddingResult
  Executes the embedding through the middleware pipeline.
- #execute(context) ⇒ void
  Core embedding execution.
- #initialize(text: nil, texts: nil, **options) ⇒ Embedder (constructor)
  Creates a new Embedder instance.
- #preprocess(text) ⇒ String
  Preprocesses text before embedding.
- #user_prompt ⇒ String+
  The input for this embedding operation.
Methods inherited from BaseAgent
agent_middleware, aliases, all_agent_names, ask, #assistant_prompt, #cache_key_data, #cache_key_hash, config_summary, #messages, param, params, #process_response, #resolved_thinking, #schema, stream, streaming, #system_prompt, temperature, thinking, thinking_config, tools, use_middleware
Methods included from DSL::Base
#active_overrides, #assistant, #assistant_config, #cache_prompts, #clear_override_cache!, #description, #model, #overridable?, #overridable_fields, #prompt, #returns, #schema, #system, #system_config, #timeout, #user, #user_config
Methods included from DSL::Reliability
#circuit_breaker, #circuit_breaker_config, #fallback_models, #fallback_provider, #fallback_providers, #non_fallback_errors, #on_failure, #reliability, #reliability_config, #reliability_configured?, #retries, #retries_config, #retryable_patterns, #total_timeout
Methods included from DSL::Caching
#cache, #cache_enabled?, #cache_for, #cache_key_excludes, #cache_key_includes, #cache_ttl, #caching_config
Methods included from DSL::Queryable
#cost_by_model, #executions, #failures, #last_run, #stats, #total_spent, #with_params
Methods included from DSL::Knowledge
#knowledge_entries, #knowledge_path, #knows
Methods included from CacheHelper
#cache_delete, #cache_exist?, #cache_increment, #cache_key, #cache_read, #cache_store, #cache_write
Methods included from DSL::Knowledge::InstanceMethods
Constructor Details
#initialize(text: nil, texts: nil, **options) ⇒ Embedder
Creates a new Embedder instance
# File 'lib/ruby_llm/agents/text/embedder.rb', line 148
def initialize(text: nil, texts: nil, **options)
  @text = text
  @texts = texts
  @batch_block = nil

  # Set model to embedding model if not specified
  options[:model] ||= self.class.model || self.class.class_eval { }
  super(**options)
end
Instance Attribute Details
#text ⇒ String? (readonly)
Returns the single text to embed.
# File 'lib/ruby_llm/agents/text/embedder.rb', line 141
def text
  @text
end
#texts ⇒ Object (readonly)
# File 'lib/ruby_llm/agents/text/embedder.rb', line 141
attr_reader :text, :texts
Class Method Details
.agent_type ⇒ Symbol
Returns the agent type for embedders
# File 'lib/ruby_llm/agents/text/embedder.rb', line 41
def agent_type
  :embedding
end
.batch_size(value = nil) ⇒ Integer
Sets or returns the batch size
When embedding multiple texts, they are split into batches of this size for API calls.
# File 'lib/ruby_llm/agents/text/embedder.rb', line 91
def batch_size(value = nil)
  @batch_size = value if value
  @batch_size || inherited_or_default(:batch_size, )
end
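To make the batching behaviour concrete, here is a minimal sketch of how a list of texts is split with `Enumerable#each_slice`, the same call `#execute` uses. The helper name `split_into_batches` and the batch size of 2 are illustrative assumptions, not part of the library or its defaults.

```ruby
# Hypothetical helper: splits texts into consecutive groups of at most
# batch_size items, as each_slice does inside #execute.
def split_into_batches(texts, batch_size)
  texts.each_slice(batch_size).to_a
end

texts = ["alpha", "beta", "gamma", "delta", "epsilon"]

split_into_batches(texts, 2).each_with_index do |batch, index|
  puts "batch #{index}: #{batch.inspect}"
end
```

With five texts and a batch size of 2, the final batch holds the single leftover text, so three API calls would be made.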
.call(text: nil, texts: nil, **options) {|batch_result, index| ... } ⇒ EmbeddingResult
Executes the embedder with the given parameters
# File 'lib/ruby_llm/agents/text/embedder.rb', line 108
def call(text: nil, texts: nil, **options, &block)
  new(text: text, texts: texts, **options).call(&block)
end
.dimensions(value = nil) ⇒ Integer?
Sets or returns the vector dimensions
Some models (like OpenAI text-embedding-3) support reducing dimensions for more efficient storage.
# File 'lib/ruby_llm/agents/text/embedder.rb', line 77
def dimensions(value = nil)
  @dimensions = value if value
  @dimensions || inherited_or_default(:dimensions, )
end
.model(value = nil) ⇒ String
Sets or returns the embedding model
Defaults to the embedding model from configuration, not the conversation model that BaseAgent uses.
# File 'lib/ruby_llm/agents/text/embedder.rb', line 56
def model(value = nil)
  @model = value if value
  return @model if defined?(@model) && @model

  # For inheritance: check if parent is also an Embedder
  if superclass.respond_to?(:agent_type) && superclass.agent_type == :embedding
    superclass.model
  else
  end
end
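The set-or-return pattern with inheritance fallback can be sketched in isolation. The classes below are toy stand-ins, not part of ruby_llm-agents, and `DEFAULT_MODEL` is a hypothetical placeholder for whatever the configuration supplies in the real `else` branch.

```ruby
# Toy stand-in for the base class; not the real Embedder.
class SketchEmbedder
  DEFAULT_MODEL = "example-embedding-model" # hypothetical default

  class << self
    # With an argument: store the value on this class.
    # Without: return this class's value, else the parent's, else the default.
    def model(value = nil)
      @model = value if value
      return @model if defined?(@model) && @model

      if superclass.respond_to?(:model)
        superclass.model
      else
        DEFAULT_MODEL
      end
    end
  end
end

class BaseDocEmbedder < SketchEmbedder
  model "custom-model-v1" # hypothetical model name
end

# Declares nothing, so it resolves the model through its parent.
class ChildDocEmbedder < BaseDocEmbedder; end

puts ChildDocEmbedder.model
puts SketchEmbedder.model
```

Because the value lives in a class-level instance variable, a subclass that sets its own model does not affect its parent or siblings.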
Instance Method Details
#agent_cache_key ⇒ String
Generates the cache key for this embedding
# File 'lib/ruby_llm/agents/text/embedder.rb', line 256
def agent_cache_key
  components = [
    "ruby_llm_agents",
    "embedding",
    self.class.name,
    resolved_model,
    resolved_dimensions,
    Digest::SHA256.hexdigest(input_texts.map { |t| preprocess(t) }.join("\n"))
  ].compact
  components.join("/")
end
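The shape of the resulting key is easier to see with concrete values. The following standalone sketch mirrors the structure of `#agent_cache_key`; the helper `sketch_cache_key` and its keyword arguments are assumptions made for illustration, and preprocessing is omitted.

```ruby
require "digest"

# Hypothetical helper mirroring the component layout of #agent_cache_key.
def sketch_cache_key(class_name:, model:, dimensions:, texts:)
  components = [
    "ruby_llm_agents",
    "embedding",
    class_name,
    model,
    dimensions, # nil when not configured; dropped by compact
    Digest::SHA256.hexdigest(texts.join("\n"))
  ].compact
  components.join("/")
end

key = sketch_cache_key(
  class_name: "DocEmbedder",       # illustrative values only
  model: "example-embedding-model",
  dimensions: nil,
  texts: ["hello", "world"]
)
puts key
```

Since `compact` removes a `nil` dimensions entry, keys for embedders without configured dimensions have one segment fewer. Note also that the texts are joined with a newline before hashing, so in this sketch `["a", "b"]` and `["a\nb"]` produce the same digest.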
#call {|batch_result, index| ... } ⇒ EmbeddingResult
Executes the embedding through the middleware pipeline
# File 'lib/ruby_llm/agents/text/embedder.rb', line 163
def call(&block)
  @batch_block = block
  context = build_context
  result_context = Pipeline::Executor.execute(context)
  result_context.output
end
#execute(context) ⇒ void
This method returns an undefined value.
Core embedding execution
This is called by the Pipeline::Executor after middleware has been applied. It contains only the embedding API logic.
# File 'lib/ruby_llm/agents/text/embedder.rb', line 201
def execute(context)
  # Track timing internally since middleware sets completed_at after execute returns
  execution_started_at = Time.current

  input_list = input_texts
  validate_input!(input_list)

  all_vectors = []
  total_input_tokens = 0
  total_cost = 0.0
  batch_count = resolved_batch_size

  batches = input_list.each_slice(batch_count).to_a

  batches.each_with_index do |batch, index|
    batch_result = execute_batch(batch, context)

    all_vectors.concat(batch_result[:vectors])
    total_input_tokens += batch_result[:input_tokens] || 0
    total_cost += batch_result[:cost] || 0.0

    # Yield batch result for progress tracking
    if @batch_block
      @batch_block.call(build_batch_result(batch_result, batch.size), index)
    end
  end

  execution_completed_at = Time.current
  duration_ms = ((execution_completed_at - execution_started_at) * 1000).to_i

  # Update context with token/cost info
  context.input_tokens = total_input_tokens
  context.output_tokens = 0
  context.input_cost = total_cost
  context.output_cost = 0.0
  context.total_cost = total_cost.round(6)

  # Build final result
  context.output = build_result(
    vectors: all_vectors,
    input_tokens: total_input_tokens,
    total_cost: total_cost,
    count: input_list.size,
    started_at: context.started_at || execution_started_at,
    completed_at: execution_completed_at,
    duration_ms: duration_ms,
    tenant_id: context.tenant_id,
    execution_id: context.execution_id
  )
end
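The accumulate-and-yield loop at the heart of `#execute` can be tried without the library. In the sketch below, `fake_embed_batch` is a made-up stand-in for the provider call and `embed_in_batches` is a hypothetical name; neither exists in ruby_llm-agents, and the fake vectors and token counts carry no meaning.

```ruby
# Stand-in for a provider call: returns fake vectors and usage numbers.
def fake_embed_batch(batch)
  {
    vectors: batch.map { |text| [text.length.to_f] },
    input_tokens: batch.sum { |text| text.split.size },
    cost: nil # a missing cost is treated as 0.0 below
  }
end

# Processes texts batch by batch, summing usage and optionally
# yielding each batch result with its index for progress tracking.
def embed_in_batches(texts, batch_size:)
  all_vectors = []
  total_input_tokens = 0
  total_cost = 0.0

  texts.each_slice(batch_size).each_with_index do |batch, index|
    result = fake_embed_batch(batch)
    all_vectors.concat(result[:vectors])
    total_input_tokens += result[:input_tokens] || 0
    total_cost += result[:cost] || 0.0

    yield(result, index) if block_given?
  end

  {
    vectors: all_vectors,
    input_tokens: total_input_tokens,
    total_cost: total_cost,
    count: texts.size
  }
end

summary = embed_in_batches(["a b", "c", "d e f"], batch_size: 2) do |result, index|
  puts "batch #{index} done: #{result[:vectors].size} vectors"
end
puts summary[:count]
```

The `|| 0` and `|| 0.0` guards mean a batch that reports no token count or cost still contributes its vectors without raising on `nil`.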
#preprocess(text) ⇒ String
Preprocesses text before embedding
Override this method in subclasses to apply custom preprocessing like normalization, cleaning, or truncation.
# File 'lib/ruby_llm/agents/text/embedder.rb', line 190
def preprocess(text)
  text
end
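An override might look like the sketch below. To keep it runnable on its own, it subclasses a minimal stand-in (`SketchBaseEmbedder`) rather than the real `RubyLLM::Agents::Embedder`; the class names and the `MAX_CHARS` limit are hypothetical.

```ruby
# Minimal stand-in for the real base class so the sketch runs on its own.
class SketchBaseEmbedder
  def preprocess(text)
    text
  end
end

class NormalizingEmbedder < SketchBaseEmbedder
  MAX_CHARS = 8_000 # hypothetical limit, not a library or provider value

  # Trims surrounding whitespace, collapses whitespace runs into a single
  # space, then truncates to MAX_CHARS characters.
  def preprocess(text)
    text.to_s.strip.gsub(/\s+/, " ")[0, MAX_CHARS]
  end
end

puts NormalizingEmbedder.new.preprocess("  Hello \n\n  world  ")
```

Because `#agent_cache_key` hashes the preprocessed texts, inputs that normalize to the same string would share a cache entry.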
#user_prompt ⇒ String+
The input for this embedding operation
Used by the pipeline to generate cache keys and for instrumentation.
# File 'lib/ruby_llm/agents/text/embedder.rb', line 175
def user_prompt
  input_texts.join("\n---\n")
end