Module: Parse::Embeddings::SpendCap

Defined in:
lib/parse/embeddings/spend_cap.rb

Overview

Per-tenant cumulative embedding spend cap.

The agent semantic_search tool embeds attacker-controlled text (chat queries) on every call. Without a cap, a tenant — or an adversary driving an agent — can run up unbounded embedding-provider cost. SpendCap tracks the cumulative number of tokens embedded per tenant inside a sliding time window and HARD-REFUSES (raises Exceeded) once a tenant would exceed its limit. This is distinct from Agent::RateLimiter, which bounds request count per window; the spend cap bounds embedding volume (a proxy for cost).

== Disabled by default

With no configured limit the cap is a no-op — SpendCap.charge! records nothing and never raises. Operators opt in:

Parse::Embeddings::SpendCap.configure(limit_tokens: 1_000_000, window: 3600)
Parse::Embeddings::SpendCap.configure(:acme_tenant, limit_tokens: 50_000)

A per-tenant limit (second form) overrides the default for that tenant. The reserved key DEFAULT_KEY sets the fallback applied to every tenant without an explicit limit.
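The override-then-fallback rule described above can be pictured as a hash lookup. This is only a sketch: the module's internal limit_for helper is not shown on this page, so the names SKETCH_LIMITS and sketch_limit_for are hypothetical, and the real lookup may treat an explicit nil override differently.

```ruby
# Hypothetical illustration of "per-tenant override, else default".
# Not the library's code; limit_for itself is not shown in this document.
SKETCH_DEFAULT_KEY = :__default__

# State that the two configure calls above would plausibly produce
# (the second call omits window, so it takes the 3600-second default).
SKETCH_LIMITS = {
  SKETCH_DEFAULT_KEY => { limit: 1_000_000, window: 3600 },
  :acme_tenant       => { limit: 50_000,    window: 3600 },
}.freeze

# Assumption: a tenant without its own entry falls back to the default entry.
def sketch_limit_for(key)
  SKETCH_LIMITS[key] || SKETCH_LIMITS[SKETCH_DEFAULT_KEY]
end

sketch_limit_for(:acme_tenant)[:limit]   # => 50000   (explicit override)
sketch_limit_for(:other_tenant)[:limit]  # => 1000000 (falls back to default)
```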

== Token estimation

Callers pass an explicit token count, or use SpendCap.estimate_tokens (a chars/4 heuristic — the same approximation the agent layer uses for its context-token budgets). The cap is intentionally an estimate: it exists to bound runaway cost, not to bill precisely.

Thread-safe: all state lives behind a single mutex.
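To make the mechanism concrete, here is a minimal single-tenant sketch of a mutex-guarded sliding-window token cap. SketchCap is a hypothetical stand-in, not the module documented here; in particular, the exact expiry boundary (an entry expires once it is at least one full window old) and the injectable now argument are assumptions made so the example is deterministic.

```ruby
# Hypothetical stand-in, NOT the library's implementation: a single-tenant
# sliding-window token cap with all state behind one mutex.
class SketchCap
  class Exceeded < StandardError; end

  def initialize(limit:, window:)
    @limit = limit
    @window = window
    @entries = []      # [[timestamp, tokens], ...], oldest first
    @mutex = Mutex.new
  end

  # `now` defaults to a monotonic clock reading, which is unaffected by
  # wall-clock adjustments; tests can pass explicit values instead.
  def charge!(tokens, now: Process.clock_gettime(Process::CLOCK_MONOTONIC))
    @mutex.synchronize do
      # Drop entries that have slid out of the window (assumed boundary: >=).
      @entries.reject! { |(at, _)| now - at >= @window }
      used = @entries.sum { |(_, n)| n }
      # Refuse before recording, so a rejected charge costs nothing.
      raise Exceeded, "#{used} + #{tokens} > #{@limit}" if used + tokens > @limit
      @entries << [now, tokens] if tokens.positive?
      used + tokens
    end
  end
end

cap = SketchCap.new(limit: 100, window: 60)
cap.charge!(70, now: 0.0)     # => 70
cap.charge!(30, now: 10.0)    # => 100 (landing exactly on the limit is allowed)
begin
  cap.charge!(1, now: 20.0)   # 101 > 100, so this raises
rescue SketchCap::Exceeded => e
  e.message                   # => "100 + 1 > 100"
end
cap.charge!(50, now: 60.0)    # => 80 (the 70-token entry from t=0 has expired)
```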

Defined Under Namespace

Classes: Exceeded

Constant Summary

DEFAULT_KEY = :__default__

Fallback bucket key for charges with no tenant identity, and the key under which configure (with no explicit tenant) sets the default limit applied to every tenant lacking an override.

DEFAULT_WINDOW = 3600

Default sliding window (seconds) when none is configured.

Class Method Details

.charge!(tenant_id:, tokens:) ⇒ Integer?

Charge tokens against tenant_id's budget. HARD-REFUSES by raising Exceeded when the charge would push the tenant over its limit within the window; otherwise records the charge and returns the new in-window total.

No-op (returns nil) when no limit applies to the tenant.

Parameters:

  • tenant_id (Object, nil)

    tenant identity (nil → DEFAULT_KEY).

  • tokens (Integer)

    tokens to charge (>= 0).

Returns:

  • (Integer, nil)

    new in-window total, or nil if uncapped.

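Raises:

  • (ArgumentError)

    if tokens is negative.

  • (Exceeded)

    if the charge would push the tenant over its limit within the window.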


# File 'lib/parse/embeddings/spend_cap.rb', line 109

def charge!(tenant_id:, tokens:)
  t = Integer(tokens)
  raise ArgumentError, "SpendCap: tokens must be >= 0 (got #{t})." if t.negative?
  key = tenant_id.nil? ? DEFAULT_KEY : tenant_id

  mutex.synchronize do
    cfg = limit_for(key)
    return nil if cfg.nil? # uncapped

    window = cfg[:window]
    limit = cfg[:limit]
    now = monotonic
    entries = prune(key, now, window)
    used = entries.sum { |e| e[1] }

    if used + t > limit
      raise Exceeded.new(
        tenant_id: key, limit: limit, used: used, requested: t,
        window: window, retry_after: retry_after_for(entries, t, limit, window, now),
      )
    end
    entries << [now, t] if t.positive?
    used + t
  end
end
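The listing above calls retry_after_for, which is not shown on this page. One plausible way to compute such a value is to walk the in-window entries oldest first and find the moment at which enough of them have expired for the request to fit. The following is a hypothetical reconstruction under that assumption; sketch_retry_after and its return conventions (nil when the request can never fit) are not taken from the library.

```ruby
# Hypothetical reconstruction; the library's retry_after_for is not shown here.
# entries: [[timestamp, tokens], ...] oldest first, all still inside the window.
# Returns seconds to wait, or nil if the request exceeds the limit on its own.
def sketch_retry_after(entries, requested, limit, window, now)
  return nil if requested > limit  # cannot fit even in an empty window
  used = entries.sum { |(_, n)| n }
  entries.each do |(at, n)|
    used -= n                      # usage once this entry has expired
    # The entry expires at (at + window); if the request fits then, wait until then.
    return (at + window) - now if used + requested <= limit
  end
  0.0                              # nothing recorded, so no wait is needed
end

entries = [[0.0, 60], [30.0, 40]]  # 100 tokens used against a limit of 100
sketch_retry_after(entries, 50, 100, 3600, 100.0)   # => 3500.0 (first entry expires)
sketch_retry_after(entries, 70, 100, 3600, 100.0)   # => 3530.0 (both must expire)
sketch_retry_after(entries, 101, 100, 3600, 100.0)  # => nil
```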

.configure(tenant_id = nil, limit_tokens:, window: DEFAULT_WINDOW)

This method returns an undefined value.

Configure the cap. Two forms:

configure(limit_tokens:, window:)            # default for all tenants
configure(tenant_id, limit_tokens:, window:) # override one tenant

limit_tokens: nil disables the cap for that scope (the default scope when no tenant is given).

Parameters:

  • tenant_id (Object, nil) (defaults to: nil)

    tenant to override, or nil for the global default.

  • limit_tokens (Integer, nil)

    token ceiling per window.

  • window (Integer) (defaults to: DEFAULT_WINDOW)

    sliding window length in seconds.

Raises:

  • (ArgumentError)


# File 'lib/parse/embeddings/spend_cap.rb', line 84

def configure(tenant_id = nil, limit_tokens:, window: DEFAULT_WINDOW)
  key = tenant_id.nil? ? DEFAULT_KEY : tenant_id
  unless limit_tokens.nil?
    li = Integer(limit_tokens)
    raise ArgumentError, "SpendCap: limit_tokens must be positive (got #{li})." if li <= 0
  end
  w = Integer(window)
  raise ArgumentError, "SpendCap: window must be positive (got #{w})." if w <= 0
  mutex.synchronize do
    limits[key] = limit_tokens.nil? ? nil : { limit: Integer(limit_tokens), window: w }
  end
  nil
end
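Because configure coerces its arguments with Kernel#Integer, the accepted inputs are somewhat broader than plain integers, and not every bad input surfaces as ArgumentError. The sketch below exercises Kernel#Integer on its own; sketch_coerce is a hypothetical helper used only to display the result or the exception class.

```ruby
# Hypothetical helper: returns the coerced Integer, or the class of the
# exception that Kernel#Integer raised for that input.
def sketch_coerce(value)
  Integer(value)
rescue ArgumentError, TypeError => e
  e.class
end

sketch_coerce(1_000_000)  # => 1000000
sketch_coerce("50_000")   # => 50000 (underscores between digits are accepted)
sketch_coerce(3600.9)     # => 3600  (floats are truncated, not rounded)
sketch_coerce("abc")      # => ArgumentError
sketch_coerce(nil)        # => TypeError
```

One consequence worth keeping in mind: passing window: nil would raise TypeError from the coercion rather than the ArgumentError listed above, since only limit_tokens is guarded against nil.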

.estimate_tokens(text) ⇒ Integer

Estimate token count from a String.

The familiar "~4 characters per token" ratio only holds for ASCII. CJK, emoji, and other multibyte text run closer to one token per codepoint in a real tokenizer, so a pure chars / 4 estimate undercounts such input by up to ~4x — and since this estimate is the sole basis for the hard-refuse decision, that lets a caller feeding multibyte text reach ~4x the real embedding volume before the cap trips. Take the larger of the char-based and byte-based estimates so multibyte input bills at least as much as its UTF-8 byte length implies.

Parameters:

  • text (#to_s)

    text to estimate; it is converted with to_s, so nil counts as zero tokens.

Returns:

  • (Integer)


# File 'lib/parse/embeddings/spend_cap.rb', line 163

def estimate_tokens(text)
  str = text.to_s
  chars = (str.length / 4.0).ceil
  bytes = (str.bytesize / 4.0).ceil
  [chars, bytes].max
end
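A few worked inputs show how the two terms interact. The function below repeats the arithmetic from the listing above under the hypothetical name sketch_estimate so it can run standalone; it assumes UTF-8 strings, which is Ruby's default source encoding.

```ruby
# Standalone copy of the arithmetic shown above (hypothetical name).
def sketch_estimate(text)
  str = text.to_s
  chars = (str.length / 4.0).ceil    # codepoint-based estimate
  bytes = (str.bytesize / 4.0).ceil  # byte-based estimate
  [chars, bytes].max
end

sketch_estimate("hello world!")  # 12 chars, 12 bytes => 3
sketch_estimate("こんにちは")     # 5 chars, 15 bytes  => 4 (chars alone would give 2)
sketch_estimate("")              # => 0
sketch_estimate(nil)             # nil.to_s is "", so => 0
```

Since a UTF-8 string never has fewer bytes than codepoints, the byte-based term is the one that determines the result in practice; the two terms agree for pure ASCII input.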

.reset!(tenant_id = nil) ⇒ Object

Clear recorded usage for one tenant, or for all tenants when tenant_id is nil. Limits are retained.

Parameters:

  • tenant_id (Object, nil) (defaults to: nil)

    tenant whose usage to clear; nil clears every tenant.


# File 'lib/parse/embeddings/spend_cap.rb', line 173

def reset!(tenant_id = nil)
  mutex.synchronize do
    if tenant_id.nil?
      @buckets = {}
    else
      buckets.delete(tenant_id)
    end
  end
  nil
end

.reset_all! ⇒ Object

Remove all configured limits AND recorded usage. Mainly for tests — returns the cap to its disabled-by-default state.



# File 'lib/parse/embeddings/spend_cap.rb', line 186

def reset_all!
  mutex.synchronize do
    @limits = {}
    @buckets = {}
  end
  nil
end

.usage(tenant_id: nil) ⇒ Integer

Current in-window token usage for a tenant (0 when uncapped or idle). Does not mutate.

Parameters:

  • tenant_id (Object, nil) (defaults to: nil)

    tenant to inspect (nil → DEFAULT_KEY).

Returns:

  • (Integer)


# File 'lib/parse/embeddings/spend_cap.rb', line 140

def usage(tenant_id: nil)
  key = tenant_id.nil? ? DEFAULT_KEY : tenant_id
  mutex.synchronize do
    cfg = limit_for(key)
    return 0 if cfg.nil?
    prune(key, monotonic, cfg[:window]).sum { |e| e[1] }
  end
end