Module: Parse::Embeddings::SpendCap
- Defined in:
- lib/parse/embeddings/spend_cap.rb
Overview
Per-tenant cumulative embedding spend cap.
The agent semantic_search tool embeds attacker-controlled text
(chat queries) on every call. Without a cap, a tenant — or an
adversary driving an agent — can run up unbounded embedding-provider
cost. SpendCap tracks the cumulative number of tokens embedded
per tenant inside a sliding time window and HARD-REFUSES (raises
Exceeded) once a tenant would exceed its limit. This is distinct
from Agent::RateLimiter, which bounds request count per
window; the spend cap bounds embedding volume (a proxy for cost).
== Disabled by default
With no configured limit the cap is a no-op — SpendCap.charge! records nothing and never raises. Operators opt in:
    Parse::Embeddings::SpendCap.configure(limit_tokens: 1_000_000, window: 3600)
    Parse::Embeddings::SpendCap.configure(:acme_tenant, limit_tokens: 50_000)
A per-tenant limit (second form) overrides the default for that tenant. The reserved key DEFAULT_KEY sets the fallback applied to every tenant without an explicit limit.
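The override-then-fallback rule can be pictured with a small stand-alone sketch. The module name LimitLookupSketch, the LIMITS table, and this limit_for are hypothetical stand-ins; the library's own limit_for helper is not shown on this page and may behave differently (for example in how it treats a stored nil).

```ruby
# Hypothetical stand-in for the module's limit table; not the library's code.
module LimitLookupSketch
  DEFAULT_KEY = :__default__

  # Assumed shape of what the two configure calls above would leave behind.
  LIMITS = {
    DEFAULT_KEY  => { limit: 1_000_000, window: 3600 },
    :acme_tenant => { limit: 50_000,    window: 3600 },
  }.freeze

  # Assumed resolution: the tenant's own entry wins, else the default entry.
  def self.limit_for(tenant_id)
    LIMITS[tenant_id] || LIMITS[DEFAULT_KEY]
  end
end

LimitLookupSketch.limit_for(:acme_tenant)[:limit]  # 50_000 (explicit override)
LimitLookupSketch.limit_for(:other_tenant)[:limit] # 1_000_000 (default fallback)
```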
== Token estimation
Callers pass an explicit token count, or use SpendCap.estimate_tokens (a chars/4 heuristic — the same approximation the agent layer uses for its context-token budgets). The cap is intentionally an estimate: it exists to bound runaway cost, not to bill precisely.
Thread-safe: all state lives behind a single mutex.
Defined Under Namespace
Classes: Exceeded
Constant Summary
- DEFAULT_KEY =
  Fallback bucket key for charges with no tenant identity, and the key under which configure (with no explicit tenant) sets the default limit applied to every tenant lacking an override.
  :__default__
- PRECHARGED_KEY =
  Thread-local key marking that the current call stack has already charged the spend cap (or deliberately exempted itself). Set by with_precharged; read by charge_query! so the inner query-embed paths (find_similar(text:), hybrid_search, Parse::Retrieval.retrieve) don't double-bill a query the agent tool already charged with proper tenant identity.
  :parse_embed_spend_precharged
- DEFAULT_WINDOW =
  Default sliding window (seconds) when none is configured.
  3600
- AS_NOTIFICATION_NAME =
  ActiveSupport::Notifications (AS::N) event emitted when a tenant's in-window usage crosses the configured warn_at: fraction of its hard limit. Payload: { tenant_id:, used:, limit:, window:, warn_at:, threshold: }. Emitted once per window-crossing (re-arms as usage rolls off), never on the hard-refuse itself (that raises Exceeded).
  "parse.embeddings.spend_cap_warning"
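To make the "once per window-crossing" rule concrete, here is a stand-alone restatement of the crossing test that charge! applies. The module name WarnCrossingSketch and the method crosses? are illustrative names, not part of the library.

```ruby
# Stand-alone restatement of the soft-cap crossing test; not the library itself.
module WarnCrossingSketch
  # True only when this charge moves usage from below the threshold
  # to at or above it.
  def self.crosses?(used:, tokens:, limit:, warn_at:)
    threshold = limit * warn_at
    used < threshold && used + tokens >= threshold
  end
end

# limit 1000 with warn_at 0.8 gives a threshold of 800.0
WarnCrossingSketch.crosses?(used: 500, tokens: 250, limit: 1000, warn_at: 0.8) # false: 750 stays below
WarnCrossingSketch.crosses?(used: 750, tokens: 100, limit: 1000, warn_at: 0.8) # true: 750 -> 850
WarnCrossingSketch.crosses?(used: 850, tokens: 50,  limit: 1000, warn_at: 0.8) # false: already above
```

A tenant that stays above the threshold produces no further events until enough old charges roll out of the window to drop usage back below it.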
Class Method Summary
- .charge!(tenant_id:, tokens:) ⇒ Integer?
  Charge tokens against tenant_id's budget.
- .charge_query!(text, tenant_id: nil) ⇒ Integer?
  Charge a query-embed against the cap from a non-agent path.
- .configure(tenant_id = nil, limit_tokens:, window: DEFAULT_WINDOW, warn_at: nil)
  Configure the cap.
- .estimate_tokens(text) ⇒ Integer
  Estimate token count from a String.
- .precharged? ⇒ Boolean
  Whether the current call stack is inside SpendCap.with_precharged.
- .reset!(tenant_id = nil) ⇒ Object
  Clear recorded usage (all tenants, or one).
- .reset_all! ⇒ Object
  Remove all configured limits AND recorded usage.
- .usage(tenant_id: nil) ⇒ Integer
  Current in-window token usage for a tenant (0 when uncapped or idle).
- .with_precharged ⇒ Object
  Run a block with the inner query-embed charge suppressed.
Class Method Details
.charge!(tenant_id:, tokens:) ⇒ Integer?
Charge tokens against tenant_id's budget. HARD-REFUSES by
raising Exceeded when the charge would push the tenant over
its limit within the window; otherwise records the charge and
returns the new in-window total.
No-op (returns nil) when no limit applies to the tenant.
    # File 'lib/parse/embeddings/spend_cap.rb', line 144

    def charge!(tenant_id:, tokens:)
      t = Integer(tokens)
      raise ArgumentError, "SpendCap: tokens must be >= 0 (got #{t})." if t.negative?
      key = tenant_id.nil? ? DEFAULT_KEY : tenant_id
      warn_payload = nil
      total = mutex.synchronize do
        cfg = limit_for(key)
        return nil if cfg.nil? # uncapped
        window = cfg[:window]
        limit = cfg[:limit]
        now = monotonic
        entries = prune(key, now, window)
        used = entries.sum { |e| e[1] }
        if used + t > limit
          raise Exceeded.new(
            tenant_id: key, limit: limit, used: used, requested: t,
            window: window,
            retry_after: retry_after_for(entries, t, limit, window, now),
          )
        end
        entries << [now, t] if t.positive?
        # Soft-cap crossing: fire only when THIS charge moves usage
        # from below the threshold to at-or-above it, so a tenant
        # hovering over the line doesn't spam an event per charge.
        # Pruned entries re-arm the warning naturally as the window
        # rolls off.
        if (wa = cfg[:warn_at])
          threshold = limit * wa
          if used < threshold && used + t >= threshold
            warn_payload = {
              tenant_id: key, used: used + t, limit: limit,
              window: window, warn_at: wa, threshold: threshold,
            }
          end
        end
        used + t
      end
      emit_soft_cap_warning(warn_payload) if warn_payload
      total
    end
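The helpers prune and retry_after_for are not shown on this page. The following single-tenant sketch illustrates one way a sliding-window budget with a hard refuse could work. The class SlidingBudgetSketch, its injected clock, and the retry-after policy (wait until enough of the oldest charges expire) are all assumptions made for illustration, not the library's implementation.

```ruby
# Minimal single-tenant sliding-window budget; a sketch, not the library's code.
class SlidingBudgetSketch
  class Exceeded < StandardError
    attr_reader :retry_after

    def initialize(retry_after)
      @retry_after = retry_after
      super("token budget exceeded")
    end
  end

  # clock: callable returning seconds; injected so the example is deterministic.
  def initialize(limit:, window:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @limit = limit
    @window = window
    @clock = clock
    @entries = [] # [timestamp, tokens] pairs, oldest first
    @mutex = Mutex.new
  end

  def charge!(tokens)
    @mutex.synchronize do
      now = @clock.call
      # Drop charges that have aged out of the window.
      @entries.reject! { |at, _| now - at >= @window }
      used = @entries.sum { |_, n| n }
      raise Exceeded.new(retry_after(tokens, used, now)) if used + tokens > @limit
      @entries << [now, tokens] if tokens.positive?
      used + tokens
    end
  end

  private

  # Assumed policy: wait until enough of the oldest charges expire to fit the request.
  def retry_after(tokens, used, now)
    @entries.each do |at, n|
      used -= n
      return (at + @window) - now if used + tokens <= @limit
    end
    nil # the request alone exceeds the limit; waiting will not help
  end
end

now = 0.0
budget = SlidingBudgetSketch.new(limit: 100, window: 60, clock: -> { now })
budget.charge!(60) # 60
now = 30.0
budget.charge!(30) # 90
begin
  budget.charge!(20) # refused: 90 + 20 > 100
rescue SlidingBudgetSketch::Exceeded => e
  e.retry_after # 30.0, because the 60-token charge expires at t = 60
end
now = 60.0
budget.charge!(20) # 50, the first charge has rolled off
```

A refused charge is not recorded, so a rejected request does not consume budget.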
.charge_query!(text, tenant_id: nil) ⇒ Integer?
Charge a query-embed against the cap from a non-agent path.
This is the v5.5 closure of "spend-cap coverage on all embed
paths": find_similar(text:), hybrid_search(text:), and
Parse::Retrieval.retrieve route their query text through
here before embedding.
- No-op inside with_precharged (the agent tool charged already, with per-tenant identity).
- Tenant identity falls back to the ambient cache-tenant scope (Parse.with_cache_tenant) when set, else the shared DEFAULT_KEY bucket.
- No-op (like charge!) when no limit is configured.
    # File 'lib/parse/embeddings/spend_cap.rb', line 243

    def charge_query!(text, tenant_id: nil)
      return nil if precharged?
      if tenant_id.nil? && defined?(Parse) && Parse.respond_to?(:current_cache_tenant)
        tenant_id = Parse.current_cache_tenant
      end
      charge!(tenant_id: tenant_id, tokens: estimate_tokens(text))
    end
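The tenant-resolution order (explicit argument, then ambient scope, then the shared default bucket) can be restated on its own. In this sketch the module TenantResolutionSketch and the ambient: parameter are hypothetical; ambient: stands in for whatever Parse.current_cache_tenant would return.

```ruby
# Sketch of the tenant-resolution order, with a stand-in for the ambient scope.
module TenantResolutionSketch
  DEFAULT_KEY = :__default__

  # ambient stands in for Parse.current_cache_tenant (nil when no scope is active).
  def self.resolve(explicit:, ambient:)
    tenant = explicit.nil? ? ambient : explicit
    tenant.nil? ? DEFAULT_KEY : tenant
  end
end

TenantResolutionSketch.resolve(explicit: :acme, ambient: :globex) # :acme
TenantResolutionSketch.resolve(explicit: nil,   ambient: :globex) # :globex
TenantResolutionSketch.resolve(explicit: nil,   ambient: nil)     # :__default__
```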
.configure(tenant_id = nil, limit_tokens:, window: DEFAULT_WINDOW, warn_at: nil)
This method returns an undefined value.
Configure the cap. Two forms:
    configure(limit_tokens:, window:)             # default for all tenants
    configure(tenant_id, limit_tokens:, window:)  # override one tenant
limit_tokens: nil disables the cap for that scope (the default
scope when no tenant is given).
    # File 'lib/parse/embeddings/spend_cap.rb', line 106

    def configure(tenant_id = nil, limit_tokens:, window: DEFAULT_WINDOW, warn_at: nil)
      key = tenant_id.nil? ? DEFAULT_KEY : tenant_id
      unless limit_tokens.nil?
        li = Integer(limit_tokens)
        raise ArgumentError, "SpendCap: limit_tokens must be positive (got #{li})." if li <= 0
      end
      w = Integer(window)
      raise ArgumentError, "SpendCap: window must be positive (got #{w})." if w <= 0
      unless warn_at.nil?
        wa = Float(warn_at)
        unless wa > 0.0 && wa < 1.0
          raise ArgumentError, "SpendCap: warn_at must be between 0 and 1 exclusive (got #{warn_at})."
        end
      end
      mutex.synchronize do
        limits[key] = if limit_tokens.nil?
            nil
          else
            cfg = { limit: Integer(limit_tokens), window: w }
            cfg[:warn_at] = Float(warn_at) unless warn_at.nil?
            cfg
          end
      end
      nil
    end
.estimate_tokens(text) ⇒ Integer
Estimate token count from a String.
The familiar "~4 characters per token" ratio only holds for
ASCII. CJK, emoji, and other multibyte text run closer to one
token per codepoint in a real tokenizer, so a pure
chars / 4 estimate undercounts such input by up to ~4x — and
since this estimate is the sole basis for the hard-refuse
decision, that lets a caller feeding multibyte text reach ~4x
the real embedding volume before the cap trips. Take the larger
of the char-based and byte-based estimates so multibyte input
bills at least as much as its UTF-8 byte length implies.
    # File 'lib/parse/embeddings/spend_cap.rb', line 265

    def estimate_tokens(text)
      str = text.to_s
      chars = (str.length / 4.0).ceil
      bytes = (str.bytesize / 4.0).ceil
      [chars, bytes].max
    end
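A quick way to see the effect of the byte-based term is to run the heuristic on ASCII and multibyte input side by side. The module TokenEstimateSketch below is a free-standing copy of the method above for experimentation; the sample strings are arbitrary.

```ruby
# Free-standing copy of the heuristic, for experimentation only.
module TokenEstimateSketch
  def self.estimate_tokens(text)
    str = text.to_s
    chars = (str.length / 4.0).ceil
    bytes = (str.bytesize / 4.0).ceil
    [chars, bytes].max
  end
end

ascii = "hello world!"     # 12 characters, 12 bytes
cjk   = "日本語のテキスト"   # 8 characters, 24 bytes in UTF-8

TokenEstimateSketch.estimate_tokens(ascii) # 3
TokenEstimateSketch.estimate_tokens(cjk)   # 6 (chars / 4 alone would give 2)
TokenEstimateSketch.estimate_tokens(nil)   # 0 (nil.to_s is the empty string)
```

Because a string's bytesize is never smaller than its length, the byte-based figure always decides the result; for pure ASCII the two terms are equal.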
.precharged? ⇒ Boolean
Returns whether the current call stack is inside with_precharged.
    # File 'lib/parse/embeddings/spend_cap.rb', line 220

    def precharged?
      !!Thread.current[PRECHARGED_KEY]
    end
.reset!(tenant_id = nil) ⇒ Object
Clear recorded usage (all tenants, or one). Limits are retained.
    # File 'lib/parse/embeddings/spend_cap.rb', line 275

    def reset!(tenant_id = nil)
      mutex.synchronize do
        if tenant_id.nil?
          @buckets = {}
        else
          buckets.delete(tenant_id)
        end
      end
      nil
    end
.reset_all! ⇒ Object
Remove all configured limits AND recorded usage. Mainly for tests — returns the cap to its disabled-by-default state.
    # File 'lib/parse/embeddings/spend_cap.rb', line 288

    def reset_all!
      mutex.synchronize do
        @limits = {}
        @buckets = {}
      end
      nil
    end
.usage(tenant_id: nil) ⇒ Integer
Current in-window token usage for a tenant (0 when uncapped or idle). Does not mutate.
    # File 'lib/parse/embeddings/spend_cap.rb', line 192

    def usage(tenant_id: nil)
      key = tenant_id.nil? ? DEFAULT_KEY : tenant_id
      mutex.synchronize do
        cfg = limit_for(key)
        return 0 if cfg.nil?
        prune(key, monotonic, cfg[:window]).sum { |e| e[1] }
      end
    end
.with_precharged ⇒ Object
Run a block with the inner query-embed charge suppressed.
Callers that have ALREADY charged the cap with better tenant
identity (the semantic_search agent tool charges per-tenant
before calling retrieve) — or that deliberately exempt the
call (trusted admin agents) — wrap their downstream embed in
this so charge_query! inside find_similar / retrieve
is a no-op. Restores the prior flag on exit (nesting-safe).
    # File 'lib/parse/embeddings/spend_cap.rb', line 210

    def with_precharged
      prev = Thread.current[PRECHARGED_KEY]
      Thread.current[PRECHARGED_KEY] = true
      yield
    ensure
      Thread.current[PRECHARGED_KEY] = prev
    end
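The save-and-restore pattern is what makes nesting safe, and the flag's scope is worth seeing in action. The sketch below copies the two methods into a hypothetical module, PrechargedSketch, with its own key so it does not touch the library's state.

```ruby
# Stand-alone copy of the flag logic under a hypothetical module and key.
module PrechargedSketch
  KEY = :precharged_sketch_flag # illustrative key, distinct from the library's

  def self.with_precharged
    prev = Thread.current[KEY]
    Thread.current[KEY] = true
    yield
  ensure
    # Restore whatever was there before, so an inner block cannot clear an outer flag.
    Thread.current[KEY] = prev
  end

  def self.precharged?
    !!Thread.current[KEY]
  end
end

PrechargedSketch.with_precharged do
  PrechargedSketch.with_precharged { PrechargedSketch.precharged? } # true
  PrechargedSketch.precharged?                      # still true after the inner block exits
  Thread.new { PrechargedSketch.precharged? }.value # false: a new thread starts unflagged
end
PrechargedSketch.precharged? # false once the outer block has returned
```

One consequence to keep in mind: Thread.current[] storage in Ruby is fiber-local, so work handed off to another thread or fiber inside the block does not inherit the flag and would be charged through charge_query! as usual.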