Class: OllamaAgent::Providers::QuotaTracker

Inherits:
Object
Defined in:
lib/ollama_agent/providers/quota_tracker.rb

Overview

Per-credential usage accounting against declared provider limits.

Tracks daily token/request consumption and live RPM/TPM windows. Since most providers don’t expose a real-time quota API, usage is estimated locally from response metadata: the total_tokens figure (prompt_tokens + completion_tokens) returned by each successful call.

Daily counters auto-reset at midnight UTC.

Examples:

tracker = QuotaTracker.new(limits: { rpm: 60, tpm: 90_000, daily_tokens: 10_000_000 })
tracker.record({ prompt_tokens: 500, completion_tokens: 200, total_tokens: 700 })
tracker.exhausted?          # => false
tracker.near_exhaustion?    # => false
tracker.summary             # => { daily_tokens: 700, rpm: 1, tpm: 700, ... }
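
The midnight-UTC reset mentioned in the overview relies on a private helper (next_midnight) whose source is not shown on this page. The following is a minimal sketch of how such a boundary could be computed, assuming plain Time arithmetic; the names next_midnight_utc and reset_due? are hypothetical and not part of the documented class.

```ruby
# Hypothetical helper: the next 00:00:00 UTC strictly after `now`.
# UTC has no daylight-saving shifts, so adding 86_400 seconds to the
# start of the current UTC day always lands on the next midnight.
def next_midnight_utc(now = Time.now.utc)
  start_of_day = Time.utc(now.year, now.month, now.day)
  start_of_day + 86_400
end

# Hypothetical check: true once the stored reset time has passed.
def reset_due?(reset_at, now = Time.now.utc)
  now >= reset_at
end
```

A tracker built this way would zero its daily counters and compute a fresh boundary whenever reset_due? returns true.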

Constant Summary

NEAR_EXHAUSTION_PCT = 0.90

Constructor Details

#initialize(limits: {}) ⇒ QuotaTracker

Returns a new instance of QuotaTracker.

Parameters:

  • limits (Hash) (defaults to: {})

    declared provider limits:

      • :rpm (Integer): max requests per minute
      • :tpm (Integer): max tokens per minute
      • :daily_tokens (Integer): max tokens per day
      • :daily_requests (Integer): max requests per day



# File 'lib/ollama_agent/providers/quota_tracker.rb', line 30

def initialize(limits: {})
  @limits          = limits.transform_keys(&:to_sym)
  @daily_tokens    = 0
  @daily_requests  = 0
  @daily_reset_at  = next_midnight
  @rpm_window      = RateWindow.new(window_seconds: 60)
  @tpm_window      = RateWindow.new(window_seconds: 60)
  @mutex           = Mutex.new
end
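
The constructor creates two RateWindow objects, but that class is documented elsewhere. As an illustration only, here is one way a 60-second sliding window with its own mutex might work. SketchRateWindow and its injectable clock parameter are assumptions made for this sketch and may differ from the real RateWindow.

```ruby
# Hypothetical sliding-window counter; not the library's RateWindow.
class SketchRateWindow
  def initialize(window_seconds:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @window_seconds = window_seconds
    @clock = clock          # injectable so tests can control time
    @events = []            # [timestamp, amount] pairs, oldest first
    @mutex = Mutex.new
  end

  # Add `amount` (1 for a request, N for tokens) at the current time.
  def record(amount)
    @mutex.synchronize do
      now = @clock.call
      prune!(now)
      @events << [now, amount]
    end
  end

  # Sum of all amounts recorded within the last window_seconds.
  def current_rate
    @mutex.synchronize do
      prune!(@clock.call)
      @events.sum { |_timestamp, amount| amount }
    end
  end

  private

  # Drop events that have aged out of the window.
  def prune!(now)
    cutoff = now - @window_seconds
    @events.shift while @events.any? && @events.first[0] <= cutoff
  end
end
```

Using a monotonic clock by default keeps the window unaffected by wall-clock adjustments.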

Instance Method Details

#daily_utilisation ⇒ Float

Quota utilisation as a fraction [0.0, 1.0] of the daily token limit. Returns 0.0 if no daily token limit is configured.

Returns:

  • (Float)


# File 'lib/ollama_agent/providers/quota_tracker.rb', line 80

def daily_utilisation
  @mutex.synchronize do
    maybe_reset_daily!
    daily_pct
  end
end
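
The daily_pct helper called above is private and its source is not shown here. A minimal sketch consistent with the documented behaviour (a fraction in [0.0, 1.0], and 0.0 when no daily token limit is configured) could look like this. The function name daily_fraction and the clamp to 1.0 are assumptions.

```ruby
# Hypothetical stand-in for the private daily_pct helper.
# used:  tokens consumed so far today
# limit: configured daily token limit, or nil when none is set
def daily_fraction(used, limit)
  return 0.0 if limit.nil? || limit.to_i <= 0

  # Clamp so overshooting the limit still reports at most 1.0.
  [used.to_f / limit, 1.0].min
end
```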

#exhausted? ⇒ Boolean

True when either daily hard limit (tokens or requests) has been reached, at which point the provider is expected to reject further requests.

Returns:

  • (Boolean)


# File 'lib/ollama_agent/providers/quota_tracker.rb', line 60

def exhausted?
  @mutex.synchronize do
    maybe_reset_daily!
    daily_token_limit_hit? || daily_request_limit_hit?
  end
end

#near_exhaustion? ⇒ Boolean

True when usage is approaching exhaustion (>= 90% of daily token limit). Used for predictive rerouting before a hard failure occurs.

Returns:

  • (Boolean)


# File 'lib/ollama_agent/providers/quota_tracker.rb', line 70

def near_exhaustion?
  @mutex.synchronize do
    maybe_reset_daily!
    daily_pct >= NEAR_EXHAUSTION_PCT
  end
end
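
To illustrate the predictive rerouting mentioned above, the sketch below chooses a credential from a hash of trackers: it prefers one with headroom, falls back to one that is near exhaustion, and returns nil when all are exhausted. FakeTracker and pick_credential are hypothetical names used only for this example; any object responding to exhausted? and near_exhaustion? would work in place of the stub.

```ruby
# Hypothetical stand-in exposing the two predicates documented here.
FakeTracker = Struct.new(:exhausted, :near) do
  def exhausted?
    exhausted
  end

  def near_exhaustion?
    near
  end
end

# trackers: Hash of credential name => tracker.
# Returns the chosen credential name, or nil if none is usable.
def pick_credential(trackers)
  usable = trackers.reject { |_name, tracker| tracker.exhausted? }
  preferred = usable.find { |_name, tracker| !tracker.near_exhaustion? }
  (preferred || usable.first)&.first
end
```

Checking near_exhaustion? before exhausted? becomes true lets a router shift traffic while the current credential can still serve requests.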

#record(usage) ⇒ Object

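Record usage from a successful response. A nil usage is ignored. Only total_tokens (symbol or string key) is counted; a missing value is treated as 0 tokens, though the call still counts as one request.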

Parameters:

  • usage (Hash, nil)

    { prompt_tokens:, completion_tokens:, total_tokens: }



# File 'lib/ollama_agent/providers/quota_tracker.rb', line 42

def record(usage)
  return unless usage

  tokens = (usage[:total_tokens] || usage["total_tokens"] || 0).to_i

  @mutex.synchronize do
    maybe_reset_daily!
    @daily_tokens   += tokens
    @daily_requests += 1
  end

  # Rate windows have their own mutex
  @rpm_window.record(1)
  @tpm_window.record(tokens)
end
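
The method above reads only total_tokens. Some responses may carry prompt_tokens and completion_tokens without a total, so a caller might want to normalise the hash before recording it. The helper below is a sketch of that idea, not part of the documented class; total_tokens_from is a hypothetical name, and the fallback to prompt plus completion is an assumption.

```ruby
# Hypothetical normaliser: accepts symbol or string keys and falls back
# to prompt_tokens + completion_tokens when total_tokens is absent.
def total_tokens_from(usage)
  return 0 unless usage

  fetch = ->(key) { usage[key] || usage[key.to_s] }
  total = fetch.call(:total_tokens)
  return total.to_i if total

  # nil.to_i is 0, so missing fields simply contribute nothing.
  fetch.call(:prompt_tokens).to_i + fetch.call(:completion_tokens).to_i
end
```

A caller could then pass { total_tokens: total_tokens_from(raw_usage) } to record.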

#summary ⇒ Hash

Full usage snapshot for TUI and telemetry.

Returns:

  • (Hash)


# File 'lib/ollama_agent/providers/quota_tracker.rb', line 89

def summary
  @mutex.synchronize do
    maybe_reset_daily!
    {
      daily_tokens: @daily_tokens,
      daily_tokens_limit: @limits[:daily_tokens],
      daily_requests: @daily_requests,
      daily_requests_limit: @limits[:daily_requests],
      rpm: @rpm_window.current_rate,
      rpm_limit: @limits[:rpm],
      tpm: @tpm_window.current_rate,
      tpm_limit: @limits[:tpm],
      daily_pct: daily_pct,
      resets_at: @daily_reset_at
    }
  end
end
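
As a usage illustration, a consumer such as a TUI status bar could turn the snapshot into a single line of text. The formatter below is a hypothetical sketch that relies only on keys present in the hash returned above; format_quota_line is not part of the library.

```ruby
# Hypothetical formatter for a hash shaped like the #summary result.
def format_quota_line(summary)
  limit = summary[:daily_tokens_limit]
  daily = "#{summary[:daily_tokens]}/#{limit || 'unlimited'}"
  pct = (summary[:daily_pct].to_f * 100).round(1)
  "tokens #{daily} (#{pct}%) | rpm #{summary[:rpm]} | tpm #{summary[:tpm]}"
end
```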