Class: Woods::CostModel::EmbeddingCost

Inherits:
Object
Defined in:
lib/woods/cost_model/embedding_cost.rb

Overview

Calculates embedding costs for full-index, incremental, and query-time scenarios using the token-based pricing from ProviderPricing.

The cost model assumes a constant 450 tokens per chunk, derived from the BACKEND_MATRIX.md tables. For example, 500 units × 2.5 chunks per unit = 1,250 chunks, and 1,250 chunks × 450 tokens = 562,500 tokens (about 562K).

Examples:

calc = EmbeddingCost.new(provider: :openai_small)
calc.full_index_cost(units: 500, chunk_multiplier: 2.5) # => 0.01125
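
The result in the example above can be re-derived by hand. The following is a minimal standalone sketch, not the library's code: it does not load Woods, the helper names `example_tokens` and `example_cost` are hypothetical, and the rate of 0.02 USD per million tokens is an assumption inferred from the documented result of 0.01125 rather than read from ProviderPricing.

```ruby
# Standalone re-derivation of the documented example; does not load Woods.
TOKENS_PER_CHUNK = 450            # constant documented on this page
ASSUMED_COST_PER_MILLION = 0.02   # assumption: USD rate inferred from the 0.01125 result

# Hypothetical helper: units -> chunks (rounded up) -> tokens.
def example_tokens(units, chunk_multiplier)
  (units * chunk_multiplier).ceil * TOKENS_PER_CHUNK
end

# Hypothetical helper: scale a token count by the assumed per-million rate.
def example_cost(tokens)
  tokens / 1_000_000.0 * ASSUMED_COST_PER_MILLION
end

tokens = example_tokens(500, 2.5) # 1250 chunks * 450 = 562_500 tokens
puts "tokens=#{tokens} cost=#{example_cost(tokens).round(5)}"
```

If the real rate for a provider differs, only the assumed constant changes; the token count is independent of pricing.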

Constant Summary

TOKENS_PER_CHUNK = 450

Average tokens per chunk after hierarchical chunking with context prefix.

TOKENS_PER_QUERY = 100

Average tokens per retrieval query.

Constructor Details

#initialize(provider:) ⇒ EmbeddingCost

Returns a new instance of EmbeddingCost.

Parameters:

  • provider

    Provider key passed to ProviderPricing.cost_per_million (e.g. :openai_small)

# File 'lib/woods/cost_model/embedding_cost.rb', line 23

def initialize(provider:)
  @cost_per_million = ProviderPricing.cost_per_million(provider)
end
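
The instance methods below all call a `token_cost` helper that is not shown on this page. The following is a rough sketch of how such a helper might work, assuming `@cost_per_million` is a USD price per one million tokens. `FakePricing`, its rate table, and `EmbeddingCostSketch` are hypothetical stand-ins invented for illustration; the real ProviderPricing and `token_cost` may differ.

```ruby
# Hypothetical stand-in for ProviderPricing; the rate is an illustrative assumption.
module FakePricing
  RATES = { openai_small: 0.02 }.freeze # USD per million tokens (assumed)

  def self.cost_per_million(provider)
    RATES.fetch(provider) # raises KeyError for an unknown provider key
  end
end

# Hypothetical class mirroring the constructor shown above.
class EmbeddingCostSketch
  def initialize(provider:)
    @cost_per_million = FakePricing.cost_per_million(provider)
  end

  # Assumed shape of the helper: scale a token count by the per-million rate.
  def token_cost(tokens)
    tokens / 1_000_000.0 * @cost_per_million
  end
end

sketch = EmbeddingCostSketch.new(provider: :openai_small)
puts sketch.token_cost(562_500).round(5)
```

Dividing by `1_000_000.0` rather than `1_000_000` keeps the arithmetic in floating point, so small token counts do not truncate to zero.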

Instance Method Details

#full_index_cost(units:, chunk_multiplier: 2.5) ⇒ Float

Cost to embed the full codebase index.

Parameters:

  • units (Integer)

    Number of extracted units

  • chunk_multiplier (Float) (defaults to: 2.5)

    Average chunks per unit (default 2.5)

Returns:

  • (Float)

    Cost in USD



# File 'lib/woods/cost_model/embedding_cost.rb', line 32

def full_index_cost(units:, chunk_multiplier: 2.5)
  tokens = total_tokens(units, chunk_multiplier)
  token_cost(tokens)
end

#incremental_cost(changed_units: 5, chunk_multiplier: 2.5) ⇒ Float

Cost to re-embed changed units from a single merge.

Parameters:

  • changed_units (Integer) (defaults to: 5)

    Number of units changed (default 5)

  • chunk_multiplier (Float) (defaults to: 2.5)

    Average chunks per unit (default 2.5)

Returns:

  • (Float)

    Cost in USD



# File 'lib/woods/cost_model/embedding_cost.rb', line 42

def incremental_cost(changed_units: 5, chunk_multiplier: 2.5)
  tokens = total_tokens(changed_units, chunk_multiplier)
  token_cost(tokens)
end

#monthly_query_cost(daily_queries:) ⇒ Float

Monthly cost for query-time embedding, assuming a 30-day month and TOKENS_PER_QUERY tokens per query.

Parameters:

  • daily_queries (Integer)

    Number of queries per day

Returns:

  • (Float)

    Cost in USD per month



# File 'lib/woods/cost_model/embedding_cost.rb', line 51

def monthly_query_cost(daily_queries:)
  monthly_tokens = daily_queries * 30 * TOKENS_PER_QUERY
  token_cost(monthly_tokens)
end
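
To get a feel for how query volume maps to monthly spend, here is a small standalone sketch of the same arithmetic. It does not load Woods; the helper names are hypothetical and the rate of 0.02 USD per million tokens is an illustrative assumption, not a value taken from ProviderPricing.

```ruby
# Standalone arithmetic check; does not load Woods.
TOKENS_PER_QUERY = 100            # constant documented on this page
ASSUMED_COST_PER_MILLION = 0.02   # assumption: illustrative USD rate per million tokens

# Mirrors monthly_query_cost: 30 days of queries at a fixed token count each.
def monthly_query_tokens(daily_queries)
  daily_queries * 30 * TOKENS_PER_QUERY
end

# Hypothetical helper: price the monthly token volume at the assumed rate.
def monthly_query_cost_sketch(daily_queries)
  monthly_query_tokens(daily_queries) / 1_000_000.0 * ASSUMED_COST_PER_MILLION
end

[100, 1_000, 10_000].each do |daily|
  puts format("%6d queries/day -> %9d tokens/month -> $%.4f",
              daily, monthly_query_tokens(daily), monthly_query_cost_sketch(daily))
end
```

Under these assumptions, 1,000 queries per day comes to 3,000,000 tokens per month.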

#total_tokens(units, chunk_multiplier) ⇒ Integer

Total tokens for a given number of units and chunk multiplier. The chunk count is rounded up to a whole number before it is multiplied by TOKENS_PER_CHUNK.

Parameters:

  • units (Integer)

    Number of units

  • chunk_multiplier (Float)

    Chunks per unit

Returns:

  • (Integer)

    Total embedding tokens



# File 'lib/woods/cost_model/embedding_cost.rb', line 72

def total_tokens(units, chunk_multiplier)
  chunks = (units * chunk_multiplier).ceil
  chunks * TOKENS_PER_CHUNK
end
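
The effect of rounding the chunk count up is easiest to see with a few inputs. This sketch copies the arithmetic of `total_tokens` into a standalone method (named `total_tokens_sketch` here, a hypothetical name) so it runs without loading Woods.

```ruby
TOKENS_PER_CHUNK = 450 # constant documented on this page

# Same arithmetic as total_tokens above, copied so the snippet runs on its own.
def total_tokens_sketch(units, chunk_multiplier)
  chunks = (units * chunk_multiplier).ceil
  chunks * TOKENS_PER_CHUNK
end

puts total_tokens_sketch(5, 2.5) # 12.5 chunks round up to 13 => 5850
puts total_tokens_sketch(4, 2.5) # exactly 10 chunks          => 4500
puts total_tokens_sketch(1, 2.5) # 2.5 chunks round up to 3   => 1350
```

Rounding up means a fractional chunk is billed as a whole one, so the estimate errs on the high side for small unit counts.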

#yearly_incremental_cost(merges_per_year: 2400, changed_units_per_merge: 5, chunk_multiplier: 2.5) ⇒ Float

Yearly embedding cost from incremental re-indexing.

Parameters:

  • merges_per_year (Integer) (defaults to: 2400)

    Number of merges per year (default 2400)

  • changed_units_per_merge (Integer) (defaults to: 5)

    Units changed per merge (default 5)

  • chunk_multiplier (Float) (defaults to: 2.5)

    Average chunks per unit (default 2.5)

Returns:

  • (Float)

    Cost in USD per year



# File 'lib/woods/cost_model/embedding_cost.rb', line 62

def yearly_incremental_cost(merges_per_year: 2400, changed_units_per_merge: 5, chunk_multiplier: 2.5)
  tokens_per_merge = total_tokens(changed_units_per_merge, chunk_multiplier)
  token_cost(tokens_per_merge * merges_per_year)
end
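
With the default arguments, the yearly figure can be worked through by hand. The sketch below is standalone and does not load Woods; the helper names are hypothetical, and the rate of 0.02 USD per million tokens is an illustrative assumption rather than a value from ProviderPricing.

```ruby
TOKENS_PER_CHUNK = 450            # constant documented on this page
ASSUMED_COST_PER_MILLION = 0.02   # assumption: illustrative USD rate per million tokens

# Tokens for one merge; chunks are rounded up per merge, as in total_tokens.
def tokens_per_merge_sketch(changed_units, chunk_multiplier)
  (changed_units * chunk_multiplier).ceil * TOKENS_PER_CHUNK
end

# Hypothetical helper: per-merge tokens times merges, priced at the assumed rate.
def yearly_cost_sketch(merges_per_year, changed_units, chunk_multiplier)
  yearly_tokens = tokens_per_merge_sketch(changed_units, chunk_multiplier) * merges_per_year
  yearly_tokens / 1_000_000.0 * ASSUMED_COST_PER_MILLION
end

per_merge = tokens_per_merge_sketch(5, 2.5) # 13 chunks => 5850 tokens
puts "tokens/merge=#{per_merge} tokens/year=#{per_merge * 2400}"
puts "yearly cost ~ $#{yearly_cost_sketch(2400, 5, 2.5).round(4)}"
```

Because the chunk count is rounded up once per merge, the yearly total is 13 × 450 × 2,400 = 14,040,000 tokens, slightly more than the 13,500,000 tokens that 12.5 unrounded chunks per merge would give.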