Class: Woods::CostModel::Estimator

Inherits:
Object
Defined in:
lib/woods/cost_model/estimator.rb

Overview

Unified cost estimator that combines embedding, storage, and query costs into a single breakdown for a given configuration.

Examples:

estimate = Estimator.new(
  units: 500,
  chunk_multiplier: 2.5,
  embedding_provider: :openai_small,
  dimensions: 1536,
  daily_queries: 100
)
estimate.full_index_cost    # => 0.01125
estimate.monthly_query_cost # => 0.006
estimate.storage_bytes      # => 9_984_000
estimate.to_h               # => { full_index_cost: ..., ... }


Constructor Details

#initialize(units:, embedding_provider:, chunk_multiplier: 2.5, dimensions: nil, daily_queries: 100) ⇒ Estimator

Returns a new instance of Estimator.

Parameters:

  • units (Integer)

    Number of extracted units

  • chunk_multiplier (Float) (defaults to: 2.5)

    Average chunks per unit (default 2.5)

  • embedding_provider (Symbol)

    Provider key defined in ProviderPricing (e.g. :openai_small)

  • dimensions (Integer, nil) (defaults to: nil)

    Vector dimensions; when nil, falls back to ProviderPricing.default_dimensions(embedding_provider)

  • daily_queries (Integer) (defaults to: 100)

    Retrieval queries per day (default 100)



# File 'lib/woods/cost_model/estimator.rb', line 42

def initialize(units:, embedding_provider:, chunk_multiplier: 2.5, dimensions: nil, daily_queries: 100)
  @units = units
  @chunk_multiplier = chunk_multiplier
  @embedding_provider = embedding_provider
  @dimensions = dimensions || ProviderPricing.default_dimensions(embedding_provider)
  @daily_queries = daily_queries

  @embedding_cost = EmbeddingCost.new(provider: embedding_provider)
  @storage_cost = StorageCost.new(dimensions: @dimensions)
end
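The constructor resolves dimensions with `dimensions || ProviderPricing.default_dimensions(embedding_provider)`. The sketch below isolates that fallback. `DEFAULT_DIMENSIONS` and `resolve_dimensions` are hypothetical stand-ins for the real ProviderPricing lookup, and the 1536 value for :openai_small is an assumption taken from the overview example.

```ruby
# Hypothetical stand-in for ProviderPricing.default_dimensions; the real
# table lives in ProviderPricing and may contain different values.
DEFAULT_DIMENSIONS = { openai_small: 1536 }.freeze

# Mirrors the constructor's fallback: an explicit value wins, nil defers to
# the provider default. Hash#fetch raises KeyError for an unknown provider.
def resolve_dimensions(provider, dimensions = nil)
  dimensions || DEFAULT_DIMENSIONS.fetch(provider)
end

resolve_dimensions(:openai_small)      # => 1536
resolve_dimensions(:openai_small, 512) # => 512
```

Because Ruby's `||` treats only nil and false as missing, an explicit value such as 0 would be kept rather than replaced by the default.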

Instance Attribute Details

#chunk_multiplier ⇒ Float (readonly)

Returns the average number of chunks per unit.

Returns:

  • (Float)

    Average chunks per unit



# File 'lib/woods/cost_model/estimator.rb', line 26

def chunk_multiplier
  @chunk_multiplier
end

#daily_queries ⇒ Integer (readonly)

Returns the number of retrieval queries per day.

Returns:

  • (Integer)

    Number of retrieval queries per day



# File 'lib/woods/cost_model/estimator.rb', line 35

def daily_queries
  @daily_queries
end

#dimensions ⇒ Integer (readonly)

Returns the embedding vector dimensions.

Returns:

  • (Integer)

    Embedding vector dimensions



# File 'lib/woods/cost_model/estimator.rb', line 32

def dimensions
  @dimensions
end

#embedding_provider ⇒ Symbol (readonly)

Returns the embedding provider key.

Returns:

  • (Symbol)

    Embedding provider key



# File 'lib/woods/cost_model/estimator.rb', line 29

def embedding_provider
  @embedding_provider
end

#units ⇒ Integer (readonly)

Returns the number of extracted units.

Returns:

  • (Integer)

    Number of extracted units



# File 'lib/woods/cost_model/estimator.rb', line 23

def units
  @units
end
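Each of the five readers above simply returns an instance variable and has no matching writer, which is the behavior `attr_reader` produces. The class below is an illustrative sketch (`ReadonlyConfig` is a hypothetical name, not part of Woods) showing that equivalence; it is not a claim about how estimator.rb is actually written.

```ruby
# Illustrative class only: the readers documented above behave like
# attr_reader, exposing values without defining setter methods.
class ReadonlyConfig
  attr_reader :units, :chunk_multiplier, :embedding_provider, :dimensions, :daily_queries

  def initialize(units:, embedding_provider:, chunk_multiplier: 2.5, dimensions: nil, daily_queries: 100)
    @units = units
    @chunk_multiplier = chunk_multiplier
    @embedding_provider = embedding_provider
    @dimensions = dimensions
    @daily_queries = daily_queries
  end
end

config = ReadonlyConfig.new(units: 500, embedding_provider: :openai_small)
config.units                # => 500
config.respond_to?(:units=) # => false (no writer is defined)
```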

Instance Method Details

#full_index_cost ⇒ Float

Cost to embed the full codebase index.

Returns:

  • (Float)

    Cost in USD



# File 'lib/woods/cost_model/estimator.rb', line 56

def full_index_cost
  @embedding_cost.full_index_cost(units: units, chunk_multiplier: chunk_multiplier)
end
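The actual pricing arithmetic lives in EmbeddingCost and is not shown on this page. As a rough sketch, a full-index cost can be computed as chunks × tokens per chunk × price per token. The figures of 450 tokens per chunk and $0.02 per million tokens are assumptions, chosen only because they reproduce the overview's 0.01125 for 500 units at a 2.5 chunk multiplier. The ceil rounding is also assumed, mirroring #total_chunks.

```ruby
# Hypothetical figures: not read from EmbeddingCost or ProviderPricing.
TOKENS_PER_CHUNK = 450
PRICE_PER_MILLION_TOKENS = 0.02 # USD

def sketch_full_index_cost(units:, chunk_multiplier:)
  chunks = (units * chunk_multiplier).ceil # assumed to round like #total_chunks
  tokens = chunks * TOKENS_PER_CHUNK
  tokens * PRICE_PER_MILLION_TOKENS / 1_000_000.0
end

sketch_full_index_cost(units: 500, chunk_multiplier: 2.5) # ≈ 0.01125
```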

#incremental_per_merge_cost(changed_units: 5) ⇒ Float

Cost to re-embed the units changed by a single merge (default: 5 changed units).

Parameters:

  • changed_units (Integer) (defaults to: 5)

    Units changed per merge (default 5)

Returns:

  • (Float)

    Cost in USD



# File 'lib/woods/cost_model/estimator.rb', line 64

def incremental_per_merge_cost(changed_units: 5)
  @embedding_cost.incremental_cost(changed_units: changed_units, chunk_multiplier: chunk_multiplier)
end
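A per-merge estimate only re-embeds the chunks belonging to the changed units. The sketch below assumes that changed units chunk at the same average rate as the full index and that partial chunks round up. The token and price constants are hypothetical, not values read from EmbeddingCost.

```ruby
TOKENS_PER_CHUNK = 450          # hypothetical
PRICE_PER_MILLION_TOKENS = 0.02 # hypothetical, USD

# Re-embed only the chunks produced by the changed units.
def sketch_incremental_cost(changed_units:, chunk_multiplier:)
  chunks = (changed_units * chunk_multiplier).ceil # 5 * 2.5 = 12.5 -> 13
  chunks * TOKENS_PER_CHUNK * PRICE_PER_MILLION_TOKENS / 1_000_000.0
end

sketch_incremental_cost(changed_units: 5, chunk_multiplier: 2.5) # ≈ 0.000117
```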

#monthly_query_cost ⇒ Float

Monthly cost of embedding retrieval queries at query time, based on daily_queries.

Returns:

  • (Float)

    Cost in USD per month



# File 'lib/woods/cost_model/estimator.rb', line 71

def monthly_query_cost
  @embedding_cost.monthly_query_cost(daily_queries: daily_queries)
end
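Query-time cost scales with daily_queries. The sketch below assumes a 30-day month, 100 tokens per query, and $0.02 per million tokens. All three are hypothetical values that happen to reproduce the overview's 0.006 for 100 daily queries; EmbeddingCost may use different ones.

```ruby
TOKENS_PER_QUERY = 100          # hypothetical average query length
PRICE_PER_MILLION_TOKENS = 0.02 # hypothetical, USD
DAYS_PER_MONTH = 30             # assumed billing month

def sketch_monthly_query_cost(daily_queries:)
  tokens = daily_queries * DAYS_PER_MONTH * TOKENS_PER_QUERY
  tokens * PRICE_PER_MILLION_TOKENS / 1_000_000.0
end

sketch_monthly_query_cost(daily_queries: 100) # ≈ 0.006
```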

#storage_bytes ⇒ Integer

Total storage, in bytes, for the vector data of all chunks (see #total_chunks).

Returns:

  • (Integer)


# File 'lib/woods/cost_model/estimator.rb', line 96

def storage_bytes
  @storage_cost.storage_bytes(chunks: total_chunks)
end
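In the overview example, 500 units × 2.5 gives 1,250 chunks. At 1,536 dimensions and 4 bytes per float32 component, that is 7,680,000 raw bytes, and the example's 9_984_000 is exactly 1.3 times that figure. The sketch below is one formula consistent with that example. Both the 4-byte component size and the 1.3 overhead factor are assumptions, not confirmed from StorageCost.

```ruby
BYTES_PER_FLOAT = 4   # assumes float32 vector components
OVERHEAD_FACTOR = 1.3 # hypothetical index/metadata overhead

def sketch_storage_bytes(chunks:, dimensions:)
  (chunks * dimensions * BYTES_PER_FLOAT * OVERHEAD_FACTOR).round
end

sketch_storage_bytes(chunks: 1250, dimensions: 1536) # => 9984000
```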

#storage_mb ⇒ Float

Total storage, in megabytes, for the vector data of all chunks (see #total_chunks).

Returns:

  • (Float)


# File 'lib/woods/cost_model/estimator.rb', line 103

def storage_mb
  @storage_cost.storage_mb(chunks: total_chunks)
end
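This page does not say whether StorageCost#storage_mb divides by 10^6 (decimal MB) or 2^20 (binary MiB). The two hypothetical helpers below show both conventions so you can see how far apart the results are, using the overview's 9_984_000 bytes as input.

```ruby
# Decimal megabytes: 1 MB = 1,000,000 bytes.
def to_mb_decimal(bytes)
  bytes / 1_000_000.0
end

# Binary mebibytes: 1 MiB = 1,048,576 bytes.
def to_mib_binary(bytes)
  bytes / (1024.0 * 1024)
end

to_mb_decimal(9_984_000) # => 9.984
to_mib_binary(9_984_000) # => 9.521484375
```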

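#to_h ⇒ Hash{Symbol => Numeric, Symbol}

Full cost breakdown, together with the input configuration, as a Hash. The incremental and yearly figures use the method defaults (5 changed units per merge, 2,400 merges per year). Every value is Numeric except :embedding_provider, which is a Symbol.

Returns:

  • (Hash{Symbol => Numeric, Symbol})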


# File 'lib/woods/cost_model/estimator.rb', line 110

def to_h
  {
    full_index_cost: full_index_cost,
    incremental_per_merge_cost: incremental_per_merge_cost,
    monthly_query_cost: monthly_query_cost,
    yearly_incremental_cost: yearly_incremental_cost,
    storage_bytes: storage_bytes,
    storage_mb: storage_mb,
    total_chunks: total_chunks,
    units: units,
    chunk_multiplier: chunk_multiplier,
    embedding_provider: embedding_provider,
    dimensions: dimensions,
    daily_queries: daily_queries
  }
end
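Because #to_h mixes monetary figures with configuration values, a consumer may want to separate them. The sketch below uses an illustrative hash shaped like the #to_h output. The values are taken or derived from the overview example, and some keys are omitted for brevity. Cost entries are selected by their "_cost" suffix.

```ruby
# Illustrative breakdown shaped like #to_h output (subset of keys).
breakdown = {
  full_index_cost: 0.01125,
  monthly_query_cost: 0.006,
  storage_bytes: 9_984_000,
  total_chunks: 1250,
  units: 500,
  chunk_multiplier: 2.5,
  embedding_provider: :openai_small,
  dimensions: 1536,
  daily_queries: 100
}

# Split monetary figures (keys ending in "_cost") from everything else.
costs, details = breakdown.partition { |key, _| key.to_s.end_with?("_cost") }.map(&:to_h)

costs.keys                         # => [:full_index_cost, :monthly_query_cost]
details[:embedding_provider].class # => Symbol
```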

#total_chunks ⇒ Integer

Total number of chunks for the codebase: units × chunk_multiplier, rounded up and memoized on first call.

Returns:

  • (Integer)


# File 'lib/woods/cost_model/estimator.rb', line 89

def total_chunks
  @total_chunks ||= (units * chunk_multiplier).ceil
end
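#total_chunks rounds fractional chunk counts up with Float#ceil and caches the result with `||=`. The standalone class below (`ChunkCounter` is a hypothetical name) reproduces that expression so the rounding can be checked in isolation.

```ruby
class ChunkCounter
  def initialize(units, chunk_multiplier)
    @units = units
    @chunk_multiplier = chunk_multiplier
  end

  # Partial chunks round up; the result is cached after the first call.
  def total_chunks
    @total_chunks ||= (@units * @chunk_multiplier).ceil
  end
end

ChunkCounter.new(500, 2.5).total_chunks # => 1250
ChunkCounter.new(3, 2.5).total_chunks   # => 8 (7.5 rounds up)
```

Memoizing is safe here because the inputs are exposed only through readers, so the cached value cannot go stale. The result is always an Integer and never nil or false, so `||=` caches it reliably.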

#yearly_incremental_cost(merges_per_year: 2400) ⇒ Float

Yearly embedding cost from incremental re-indexing.

Parameters:

  • merges_per_year (Integer) (defaults to: 2400)

    Merges per year (default 2400)

Returns:

  • (Float)

    Cost in USD per year



# File 'lib/woods/cost_model/estimator.rb', line 79

def yearly_incremental_cost(merges_per_year: 2400)
  @embedding_cost.yearly_incremental_cost(
    merges_per_year: merges_per_year,
    chunk_multiplier: chunk_multiplier
  )
end
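Estimator#yearly_incremental_cost forwards only merges_per_year and chunk_multiplier to EmbeddingCost, not a changed_units count. The number of changed units per merge is therefore presumably EmbeddingCost's own default. The sketch below assumes the yearly figure is simply merges_per_year × a per-merge cost, and uses a hypothetical per-merge cost of $0.000117.

```ruby
# Assumption: yearly cost = merges per year × cost of one merge.
def sketch_yearly_incremental_cost(per_merge_cost:, merges_per_year: 2400)
  per_merge_cost * merges_per_year
end

sketch_yearly_incremental_cost(per_merge_cost: 0.000117) # ≈ 0.2808
```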