Module: Parse::Embeddings::SpendCap
- Defined in:
- lib/parse/embeddings/spend_cap.rb
Overview
Per-tenant cumulative embedding spend cap.
The agent semantic_search tool embeds attacker-controlled text
(chat queries) on every call. Without a cap, a tenant — or an
adversary driving an agent — can run up unbounded embedding-provider
cost. SpendCap tracks the cumulative number of tokens embedded
per tenant inside a sliding time window and HARD-REFUSES (raises
Exceeded) once a tenant would exceed its limit. This is distinct
from Agent::RateLimiter, which bounds request count per
window; the spend cap bounds embedding volume (a proxy for cost).
== Disabled by default
With no configured limit the cap is a no-op — SpendCap.charge! records nothing and never raises. Operators opt in:
  Parse::Embeddings::SpendCap.configure(limit_tokens: 1_000_000, window: 3600)
  Parse::Embeddings::SpendCap.configure(:acme_tenant, limit_tokens: 50_000)
A per-tenant limit (second form) overrides the default for that tenant. The reserved key DEFAULT_KEY sets the fallback applied to every tenant without an explicit limit.
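The override rule above can be pictured as a two-step lookup. The following is a minimal sketch, not the library's code: `sketch_limit_for`, `SKETCH_DEFAULT_KEY`, and the plain Hash are hypothetical stand-ins, and how the library treats an explicit `nil` override is not shown on this page, so that case is left out.

```ruby
# Hypothetical stand-in for the reserved default key (the real constant is
# Parse::Embeddings::SpendCap::DEFAULT_KEY).
SKETCH_DEFAULT_KEY = :__default__

# Assumed lookup order: an explicit per-tenant entry wins; otherwise the
# default entry applies; nil means no limit applies to the tenant.
def sketch_limit_for(limits, tenant_id)
  limits[tenant_id] || limits[SKETCH_DEFAULT_KEY]
end

# Mirrors the two configure calls shown above (3600 is DEFAULT_WINDOW).
limits = {
  SKETCH_DEFAULT_KEY => { limit: 1_000_000, window: 3600 },
  :acme_tenant       => { limit: 50_000, window: 3600 },
}

puts sketch_limit_for(limits, :acme_tenant)[:limit]  # 50000
puts sketch_limit_for(limits, :other_tenant)[:limit] # 1000000
puts sketch_limit_for({}, :other_tenant).inspect     # nil
```

With an empty table the lookup yields nil, which corresponds to the disabled-by-default state described above.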
== Token estimation
Callers pass an explicit token count, or use SpendCap.estimate_tokens (a chars/4 heuristic — the same approximation the agent layer uses for its context-token budgets). The cap is intentionally an estimate: it exists to bound runaway cost, not to bill precisely.
Thread-safe: all state lives behind a single mutex.
Defined Under Namespace
Classes: Exceeded
Constant Summary
- DEFAULT_KEY = :__default__
  Fallback bucket key for charges with no tenant identity, and the key under which configure (with no explicit tenant) sets the default limit applied to every tenant lacking an override.
- DEFAULT_WINDOW = 3600
  Default sliding window (seconds) when none is configured.
Class Method Summary
- .charge!(tenant_id:, tokens:) ⇒ Integer?
  Charge tokens against tenant_id's budget.
- .configure(tenant_id = nil, limit_tokens:, window: DEFAULT_WINDOW)
  Configure the cap.
- .estimate_tokens(text) ⇒ Integer
  Estimate token count from a String.
- .reset!(tenant_id = nil) ⇒ Object
  Clear recorded usage (all tenants, or one).
- .reset_all! ⇒ Object
  Remove all configured limits AND recorded usage.
- .usage(tenant_id: nil) ⇒ Integer
  Current in-window token usage for a tenant (0 when uncapped or idle).
Class Method Details
.charge!(tenant_id:, tokens:) ⇒ Integer?
Charge tokens against tenant_id's budget. HARD-REFUSES by
raising Exceeded when the charge would push the tenant over
its limit within the window; otherwise records the charge and
returns the new in-window total.
No-op (returns nil) when no limit applies to the tenant.
  # File 'lib/parse/embeddings/spend_cap.rb', line 109
  def charge!(tenant_id:, tokens:)
    t = Integer(tokens)
    raise ArgumentError, "SpendCap: tokens must be >= 0 (got #{t})." if t.negative?
    key = tenant_id.nil? ? DEFAULT_KEY : tenant_id
    mutex.synchronize do
      cfg = limit_for(key)
      return nil if cfg.nil? # uncapped
      window = cfg[:window]
      limit = cfg[:limit]
      now = monotonic
      entries = prune(key, now, window)
      used = entries.sum { |e| e[1] }
      if used + t > limit
        raise Exceeded.new(
          tenant_id: key, limit: limit, used: used, requested: t,
          window: window, retry_after: retry_after_for(entries, t, limit, window, now),
        )
      end
      entries << [now, t] if t.positive?
      used + t
    end
  end
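The refuse-before-record behavior of charge! can be illustrated with a small stand-in. This is a sketch under stated assumptions, not the library class: `TinySpendCap` is hypothetical, handles a single tenant, and its pruning rule (drop entries a full window old or older) is a guess, since the library's private `prune` and `retry_after_for` helpers are not shown on this page. The clock is injectable so window expiry can be shown without sleeping.

```ruby
# Hypothetical single-tenant stand-in for the sliding-window budget.
class TinySpendCap
  class Exceeded < StandardError; end

  def initialize(limit:, window:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @limit = limit
    @window = window
    @clock = clock
    @entries = [] # [timestamp, tokens] pairs, oldest first
    @mutex = Mutex.new
  end

  # Returns the new in-window total, or raises Exceeded without recording.
  def charge!(tokens)
    @mutex.synchronize do
      now = @clock.call
      # Assumed pruning rule: forget charges that have aged out of the window.
      @entries.reject! { |(at, _)| now - at >= @window }
      used = @entries.sum { |(_, n)| n }
      # Check first, record second: a refused charge consumes no budget.
      raise Exceeded, "would use #{used + tokens} of #{@limit}" if used + tokens > @limit
      @entries << [now, tokens] if tokens.positive?
      used + tokens
    end
  end
end

now = 0.0
cap = TinySpendCap.new(limit: 100, window: 60, clock: -> { now })

puts cap.charge!(60) # 60
refused = begin
  cap.charge!(50)    # 60 + 50 > 100, so this raises
  false
rescue TinySpendCap::Exceeded
  true
end
puts refused         # true

now = 61.0           # the first charge has aged out of the 60 s window
puts cap.charge!(50) # 50
```

Because the check happens before the entry is appended, a caller that rescues the error and retries later is not penalized for the refused attempt.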
.configure(tenant_id = nil, limit_tokens:, window: DEFAULT_WINDOW)
This method returns an undefined value.
Configure the cap. Two forms:
  configure(limit_tokens:, window:)            # default for all tenants
  configure(tenant_id, limit_tokens:, window:) # override one tenant
limit_tokens: nil disables the cap for that scope (the default
scope when no tenant is given).
  # File 'lib/parse/embeddings/spend_cap.rb', line 84
  def configure(tenant_id = nil, limit_tokens:, window: DEFAULT_WINDOW)
    key = tenant_id.nil? ? DEFAULT_KEY : tenant_id
    unless limit_tokens.nil?
      li = Integer(limit_tokens)
      raise ArgumentError, "SpendCap: limit_tokens must be positive (got #{li})." if li <= 0
    end
    w = Integer(window)
    raise ArgumentError, "SpendCap: window must be positive (got #{w})." if w <= 0
    mutex.synchronize do
      limits[key] = limit_tokens.nil? ? nil : { limit: Integer(limit_tokens), window: w }
    end
    nil
  end
.estimate_tokens(text) ⇒ Integer
Estimate token count from a String.
The familiar "~4 characters per token" ratio only holds for
ASCII. CJK, emoji, and other multibyte text run closer to one
token per codepoint in a real tokenizer, so a pure
chars / 4 estimate undercounts such input by up to ~4x — and
since this estimate is the sole basis for the hard-refuse
decision, that lets a caller feeding multibyte text reach ~4x
the real embedding volume before the cap trips. Take the larger
of the char-based and byte-based estimates so multibyte input
bills at least as much as its UTF-8 byte length implies.
  # File 'lib/parse/embeddings/spend_cap.rb', line 163
  def estimate_tokens(text)
    str = text.to_s
    chars = (str.length / 4.0).ceil
    bytes = (str.bytesize / 4.0).ceil
    [chars, bytes].max
  end
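A worked example may make the char-versus-byte difference concrete. The method below is a standalone copy of the documented heuristic, renamed `sketch_estimate_tokens` (a hypothetical name) so it runs without the library; the sample strings are illustrative only.

```ruby
# Standalone copy of the documented heuristic, for illustration only.
def sketch_estimate_tokens(text)
  str = text.to_s
  chars = (str.length / 4.0).ceil   # codepoints / 4
  bytes = (str.bytesize / 4.0).ceil # UTF-8 bytes / 4
  [chars, bytes].max
end

ascii = "hello world!"                   # 12 characters, 12 bytes
cjk   = "\u3053\u3093\u306B\u3061\u306F" # "こんにちは": 5 characters, 15 UTF-8 bytes

puts sketch_estimate_tokens(ascii) # 3
puts (cjk.length / 4.0).ceil       # 2 (what a pure chars / 4 estimate would give)
puts sketch_estimate_tokens(cjk)   # 4
```

For ASCII input the two estimates coincide, so taking the maximum only changes the result when the text contains multibyte characters.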
.reset!(tenant_id = nil) ⇒ Object
Clear recorded usage (all tenants, or one). Limits are retained.
  # File 'lib/parse/embeddings/spend_cap.rb', line 173
  def reset!(tenant_id = nil)
    mutex.synchronize do
      if tenant_id.nil?
        @buckets = {}
      else
        buckets.delete(tenant_id)
      end
    end
    nil
  end
.reset_all! ⇒ Object
Remove all configured limits AND recorded usage. Mainly for tests — returns the cap to its disabled-by-default state.
  # File 'lib/parse/embeddings/spend_cap.rb', line 186
  def reset_all!
    mutex.synchronize do
      @limits = {}
      @buckets = {}
    end
    nil
  end
.usage(tenant_id: nil) ⇒ Integer
Current in-window token usage for a tenant (0 when uncapped or idle). Does not mutate.
  # File 'lib/parse/embeddings/spend_cap.rb', line 140
  def usage(tenant_id: nil)
    key = tenant_id.nil? ? DEFAULT_KEY : tenant_id
    mutex.synchronize do
      cfg = limit_for(key)
      return 0 if cfg.nil?
      prune(key, monotonic, cfg[:window]).sum { |e| e[1] }
    end
  end