Module: Parse::Embeddings::SpendCap

Defined in:
lib/parse/embeddings/spend_cap.rb

Overview

Per-tenant cumulative embedding spend cap.

The agent semantic_search tool embeds attacker-controlled text (chat queries) on every call. Without a cap, a tenant — or an adversary driving an agent — can run up unbounded embedding-provider cost. SpendCap tracks the cumulative number of tokens embedded per tenant inside a sliding time window and HARD-REFUSES (raises Exceeded) once a tenant would exceed its limit. This is distinct from Agent::RateLimiter, which bounds request count per window; the spend cap bounds embedding volume (a proxy for cost).

== Disabled by default

With no configured limit the cap is a no-op — SpendCap.charge! records nothing and never raises. Operators opt in:

  Parse::Embeddings::SpendCap.configure(limit_tokens: 1_000_000, window: 3600)
  Parse::Embeddings::SpendCap.configure(:acme_tenant, limit_tokens: 50_000)

A per-tenant limit (second form) overrides the default for that tenant. The reserved key DEFAULT_KEY sets the fallback applied to every tenant without an explicit limit.
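The override-then-default lookup described above can be sketched as follows. This is a minimal illustration, not the library's code: the module LimitLookupSketch, its LIMITS table, and effective_limit are hypothetical names, and the real limit_for helper is not shown on this page, so its handling of edge cases (for example a tenant explicitly configured with limit_tokens: nil) is an assumption here.

```ruby
# Hypothetical stand-in for the limit table; not the library's internals.
module LimitLookupSketch
  DEFAULT_KEY = :__default__

  # tenant key => { limit:, window: }
  LIMITS = {
    DEFAULT_KEY => { limit: 1_000_000, window: 3600 },
    acme_tenant: { limit: 50_000, window: 3600 },
  }.freeze

  # An explicit tenant entry wins; otherwise fall back to the default entry.
  # Returns nil when neither exists, which models the disabled (no-op) state.
  def self.effective_limit(tenant_id, limits = LIMITS)
    key = tenant_id.nil? ? DEFAULT_KEY : tenant_id
    limits.fetch(key) { limits[DEFAULT_KEY] }
  end
end

LimitLookupSketch.effective_limit(:acme_tenant)[:limit]   # => 50000
LimitLookupSketch.effective_limit(:other_tenant)[:limit]  # => 1000000
LimitLookupSketch.effective_limit(nil)[:limit]            # => 1000000
LimitLookupSketch.effective_limit(:anyone, {})            # => nil
```

With an empty table the lookup returns nil, which corresponds to the disabled-by-default behavior: no limit applies, so nothing is recorded or refused.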

== Token estimation

Callers pass an explicit token count, or use SpendCap.estimate_tokens, a 4-characters-per-token heuristic (the same approximation the agent layer uses for its context-token budgets) that switches to a bytes/4 estimate when that is larger, as it is for multibyte text. The cap is intentionally an estimate: it exists to bound runaway cost, not to bill precisely.

Thread-safe: all state lives behind a single mutex.
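The sliding-window accounting described in this overview can be sketched in a few lines. This is an illustrative reduction under stated assumptions, not the SpendCap source: WindowCapSketch and its method names are hypothetical, it takes one shared limit instead of per-tenant configuration, and it omits the soft-cap warning and retry_after calculation. The explicit now: argument exists only to make the example deterministic.

```ruby
# Minimal sliding-window token cap. All names are hypothetical; this only
# illustrates the accounting idea, with every access behind one mutex.
class WindowCapSketch
  class Exceeded < StandardError; end

  def initialize(limit:, window:)
    @limit = limit
    @window = window
    @entries = Hash.new { |h, k| h[k] = [] } # tenant => [[timestamp, tokens], ...]
    @mutex = Mutex.new
  end

  # Record `tokens` for `tenant` at time `now`, or raise if that would
  # push the tenant's in-window total over the limit.
  def charge!(tenant, tokens, now: Process.clock_gettime(Process::CLOCK_MONOTONIC))
    @mutex.synchronize do
      list = @entries[tenant]
      list.reject! { |(at, _)| at <= now - @window } # drop charges that rolled off
      used = list.sum { |(_, n)| n }
      raise Exceeded, "#{tenant}: #{used} + #{tokens} > #{@limit}" if used + tokens > @limit
      list << [now, tokens]
      used + tokens
    end
  end
end

cap = WindowCapSketch.new(limit: 100, window: 60)
cap.charge!(:acme, 60, now: 0)    # => 60
cap.charge!(:acme, 40, now: 10)   # => 100 (landing exactly on the limit is allowed)
begin
  cap.charge!(:acme, 1, now: 20)  # 101 > 100, so this is refused
rescue WindowCapSketch::Exceeded
  # nothing was recorded for the refused charge
end
cap.charge!(:acme, 50, now: 61)   # => 90, because the charge from t=0 has rolled off
```

A refused charge records nothing, so a tenant that keeps retrying does not extend its own lockout; the budget frees up as older entries age out of the window.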

Defined Under Namespace

Classes: Exceeded

Constant Summary

DEFAULT_KEY =

Fallback bucket key for charges with no tenant identity, and the key under which configure (with no explicit tenant) sets the default limit applied to every tenant lacking an override.

:__default__
PRECHARGED_KEY =

Thread-local key marking that the current call stack has already charged the spend cap (or deliberately exempted itself). Set by with_precharged; read by charge_query! so the inner query-embed paths (find_similar(text:), hybrid_search, Parse::Retrieval.retrieve) don't double-bill a query the agent tool already charged with proper tenant identity.

:parse_embed_spend_precharged
DEFAULT_WINDOW =

Default sliding window (seconds) when none is configured.

3600
AS_NOTIFICATION_NAME =

ActiveSupport::Notifications event emitted when a tenant's in-window usage crosses the configured warn_at: fraction of its hard limit. Payload: { tenant_id:, used:, limit:, window:, warn_at:, threshold: }. Emitted once per window-crossing (re-arms as usage rolls off), never on the hard-refuse itself (that raises Exceeded).

"parse.embeddings.spend_cap_warning"

Class Method Details

.charge!(tenant_id:, tokens:) ⇒ Integer?

Charge tokens against tenant_id's budget. HARD-REFUSES by raising Exceeded when the charge would push the tenant over its limit within the window; otherwise records the charge and returns the new in-window total.

No-op (returns nil) when no limit applies to the tenant.

Parameters:

  • tenant_id (Object, nil)

    tenant identity (nil → DEFAULT_KEY).

  • tokens (Integer)

    tokens to charge (>= 0).

Returns:

  • (Integer, nil)

    new in-window total, or nil if uncapped.

Raises:

  • (Exceeded)

    when the charge would push the tenant over its limit within the window.


# File 'lib/parse/embeddings/spend_cap.rb', line 144

def charge!(tenant_id:, tokens:)
  t = Integer(tokens)
  raise ArgumentError, "SpendCap: tokens must be >= 0 (got #{t})." if t.negative?
  key = tenant_id.nil? ? DEFAULT_KEY : tenant_id

  warn_payload = nil
  total = mutex.synchronize do
    cfg = limit_for(key)
    return nil if cfg.nil? # uncapped

    window = cfg[:window]
    limit = cfg[:limit]
    now = monotonic
    entries = prune(key, now, window)
    used = entries.sum { |e| e[1] }

    if used + t > limit
      raise Exceeded.new(
        tenant_id: key, limit: limit, used: used, requested: t,
        window: window, retry_after: retry_after_for(entries, t, limit, window, now),
      )
    end
    entries << [now, t] if t.positive?
    # Soft-cap crossing: fire only when THIS charge moves usage
    # from below the threshold to at-or-above it, so a tenant
    # hovering over the line doesn't spam an event per charge.
    # Pruned entries re-arm the warning naturally as the window
    # rolls off.
    if (wa = cfg[:warn_at])
      threshold = limit * wa
      if used < threshold && used + t >= threshold
        warn_payload = {
          tenant_id: key, used: used + t, limit: limit,
          window: window, warn_at: wa, threshold: threshold,
        }
      end
    end
    used + t
  end
  emit_soft_cap_warning(warn_payload) if warn_payload
  total
end
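To make the once-per-crossing behavior of the soft cap concrete, the condition from the listing above can be isolated and run against a short sequence of charges. The helper name crosses_threshold? and the numbers are hypothetical; only the comparison itself is taken from charge!.

```ruby
# The crossing test from charge!, isolated as a pure function.
# The function name and the sample numbers are illustrative assumptions.
def crosses_threshold?(used, tokens, limit, warn_at)
  threshold = limit * warn_at
  used < threshold && used + tokens >= threshold
end

limit = 1_000
warn_at = 0.8 # threshold is 800 tokens
used = 0
fired = [500, 350, 100].map do |tokens|
  hit = crosses_threshold?(used, tokens, limit, warn_at)
  used += tokens
  hit
end
fired # => [false, true, false]
```

Only the second charge (500 to 850) moves usage from below the threshold to at or above it. The third charge starts above the threshold, so no further event fires until pruning brings usage back under it.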

.charge_query!(text, tenant_id: nil) ⇒ Integer?

Charge a query-embed against the cap from a non-agent path. This is the v5.5 closure of "spend-cap coverage on all embed paths": find_similar(text:), hybrid_search(text:), and Parse::Retrieval.retrieve route their query text through here before embedding.

Parameters:

  • text (String)

    the query text about to be embedded.

  • tenant_id (Object, nil) (defaults to: nil)

    explicit tenant identity; nil resolves the ambient cache tenant, then DEFAULT_KEY.

Returns:

  • (Integer, nil)

    new in-window total, or nil when uncapped / precharged.

Raises:

  • (Exceeded)

    when the charge would push the tenant over its limit (raised by charge!).


# File 'lib/parse/embeddings/spend_cap.rb', line 243

def charge_query!(text, tenant_id: nil)
  return nil if precharged?
  if tenant_id.nil? && defined?(Parse) && Parse.respond_to?(:current_cache_tenant)
    tenant_id = Parse.current_cache_tenant
  end
  charge!(tenant_id: tenant_id, tokens: estimate_tokens(text))
end

.configure(tenant_id = nil, limit_tokens:, window: DEFAULT_WINDOW, warn_at: nil)

This method returns an undefined value.

Configure the cap. Two forms:

  configure(limit_tokens:, window:)             # default for all tenants
  configure(tenant_id, limit_tokens:, window:)  # override one tenant

limit_tokens: nil disables the cap for that scope (the default scope when no tenant is given).

Parameters:

  • tenant_id (Object, nil) (defaults to: nil)

    tenant to override, or nil for the global default.

  • limit_tokens (Integer, nil)

    token ceiling per window.

  • window (Integer) (defaults to: DEFAULT_WINDOW)

    sliding window length in seconds.

  • warn_at (Numeric, nil) (defaults to: nil)

    soft-cap fraction of limit_tokens (exclusive 0...1). When a charge pushes a tenant's in-window usage across limit * warn_at, an AS_NOTIFICATION_NAME ActiveSupport::Notifications event is emitted (once per crossing; re-arms as the window rolls off). Gives operators an alerting hook BEFORE the hard refuse trips. nil (default) disables the soft cap.

Raises:

  • (ArgumentError)


# File 'lib/parse/embeddings/spend_cap.rb', line 106

def configure(tenant_id = nil, limit_tokens:, window: DEFAULT_WINDOW, warn_at: nil)
  key = tenant_id.nil? ? DEFAULT_KEY : tenant_id
  unless limit_tokens.nil?
    li = Integer(limit_tokens)
    raise ArgumentError, "SpendCap: limit_tokens must be positive (got #{li})." if li <= 0
  end
  w = Integer(window)
  raise ArgumentError, "SpendCap: window must be positive (got #{w})." if w <= 0
  unless warn_at.nil?
    wa = Float(warn_at)
    unless wa > 0.0 && wa < 1.0
      raise ArgumentError, "SpendCap: warn_at must be between 0 and 1 exclusive (got #{warn_at})."
    end
  end
  mutex.synchronize do
    limits[key] =
      if limit_tokens.nil?
        nil
      else
        cfg = { limit: Integer(limit_tokens), window: w }
        cfg[:warn_at] = Float(warn_at) unless warn_at.nil?
        cfg
      end
  end
  nil
end

.estimate_tokens(text) ⇒ Integer

Estimate token count from a String.

The familiar "~4 characters per token" ratio only holds for ASCII. CJK, emoji, and other multibyte text run closer to one token per codepoint in a real tokenizer, so a pure chars / 4 estimate undercounts such input by up to ~4x — and since this estimate is the sole basis for the hard-refuse decision, that lets a caller feeding multibyte text reach ~4x the real embedding volume before the cap trips. Take the larger of the char-based and byte-based estimates so multibyte input bills at least as much as its UTF-8 byte length implies.

Parameters:

  • text (String)

    the text to estimate (coerced with to_s).

Returns:

  • (Integer)


# File 'lib/parse/embeddings/spend_cap.rb', line 265

def estimate_tokens(text)
  str = text.to_s
  chars = (str.length / 4.0).ceil
  bytes = (str.bytesize / 4.0).ceil
  [chars, bytes].max
end
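A few worked values show how the two estimates compare. The helper below is a standalone copy of the method body shown above under a hypothetical name (estimate_tokens_sketch), so it runs without the library; the sample strings are illustrative.

```ruby
# Standalone copy of the estimator body shown above, under a hypothetical
# name so the example does not depend on the library being loaded.
def estimate_tokens_sketch(text)
  str = text.to_s
  chars = (str.length / 4.0).ceil
  bytes = (str.bytesize / 4.0).ceil
  [chars, bytes].max
end

ascii = "hello world!"        # 12 characters, 12 bytes
cjk = "\u65E5\u672C\u8A9E"    # 3 CJK characters, 9 bytes in UTF-8

estimate_tokens_sketch(ascii) # => 3
estimate_tokens_sketch(cjk)   # => 3 (chars / 4 alone would give 1)
estimate_tokens_sketch(nil)   # => 0 (nil.to_s is the empty string)
```

Because a Ruby string's bytesize is never smaller than its length, the byte-based figure is the one that decides the result in practice; for pure ASCII the two estimates coincide.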

.precharged? ⇒ Boolean

Returns whether the current call stack is inside with_precharged.

Returns:

  • (Boolean)


# File 'lib/parse/embeddings/spend_cap.rb', line 220

def precharged?
  !!Thread.current[PRECHARGED_KEY]
end

.reset!(tenant_id = nil) ⇒ Object

Clear recorded usage (all tenants, or one). Limits are retained.

Parameters:

  • tenant_id (Object, nil) (defaults to: nil)


# File 'lib/parse/embeddings/spend_cap.rb', line 275

def reset!(tenant_id = nil)
  mutex.synchronize do
    if tenant_id.nil?
      @buckets = {}
    else
      buckets.delete(tenant_id)
    end
  end
  nil
end

.reset_all! ⇒ Object

Remove all configured limits AND recorded usage. Mainly for tests — returns the cap to its disabled-by-default state.



# File 'lib/parse/embeddings/spend_cap.rb', line 288

def reset_all!
  mutex.synchronize do
    @limits = {}
    @buckets = {}
  end
  nil
end

.usage(tenant_id: nil) ⇒ Integer

Current in-window token usage for a tenant (0 when uncapped or idle). Does not mutate.

Parameters:

  • tenant_id (Object, nil) (defaults to: nil)

Returns:

  • (Integer)


# File 'lib/parse/embeddings/spend_cap.rb', line 192

def usage(tenant_id: nil)
  key = tenant_id.nil? ? DEFAULT_KEY : tenant_id
  mutex.synchronize do
    cfg = limit_for(key)
    return 0 if cfg.nil?
    prune(key, monotonic, cfg[:window]).sum { |e| e[1] }
  end
end

.with_precharged ⇒ Object

Run a block with the inner query-embed charge suppressed. Callers that have ALREADY charged the cap with better tenant identity (the semantic_search agent tool charges per-tenant before calling retrieve) — or that deliberately exempt the call (trusted admin agents) — wrap their downstream embed in this so charge_query! inside find_similar / retrieve is a no-op. Restores the prior flag on exit (nesting-safe).

Returns:

  • (Object)

    the block's return value.



# File 'lib/parse/embeddings/spend_cap.rb', line 210

def with_precharged
  prev = Thread.current[PRECHARGED_KEY]
  Thread.current[PRECHARGED_KEY] = true
  yield
ensure
  Thread.current[PRECHARGED_KEY] = prev
end
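The save-and-restore pattern above is what makes nesting safe. The sketch below reproduces it in a hypothetical PrechargeSketch module (with its own key, so it does not touch the library's flag) and records the flag's state at each point.

```ruby
# Hypothetical stand-in that mirrors the save/restore pattern shown above
# without depending on the library.
module PrechargeSketch
  KEY = :precharge_sketch_flag

  def self.precharged?
    !!Thread.current[KEY]
  end

  def self.with_precharged
    prev = Thread.current[KEY]
    Thread.current[KEY] = true
    yield
  ensure
    Thread.current[KEY] = prev # restore whatever was there, even on error
  end
end

states = []
states << PrechargeSketch.precharged?     # false: outside any block
PrechargeSketch.with_precharged do
  PrechargeSketch.with_precharged { states << PrechargeSketch.precharged? } # true
  states << PrechargeSketch.precharged?   # still true after the inner block exits
end
states << PrechargeSketch.precharged?     # false again
PrechargeSketch.with_precharged do
  # A new thread starts without the flag.
  states << Thread.new { PrechargeSketch.precharged? }.value
end
states # => [false, true, true, false, false]
```

Restoring prev rather than clearing the key is what keeps the outer block's flag set after an inner block returns. Note that Thread#[] storage in Ruby is fiber-local, so the flag does not follow work handed to another thread or fiber; an embed performed there would be charged again unless it is wrapped separately.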