Module: Octo::Agent::LlmCaller
- Included in:
- Octo::Agent
- Defined in:
- lib/octo/agent/llm_caller.rb
Overview
LLM API call management. Handles API calls with retry logic, fallback model support, and progress indication.
Constant Summary
- RETRIES_BEFORE_FALLBACK =
Number of consecutive RetryableError failures (HTTP 429 or 5xx responses such as 503) before switching to the fallback model. Network-level errors (connection failures, timeouts) do NOT trigger fallback; they are retried on the primary model for the full max_retries budget, since they are more likely to be transient infrastructure blips than a model-level outage.
3
- MAX_RETRIES_ON_FALLBACK =
After switching to the fallback model, allow this many retries before giving up. Kept lower than max_retries (10) because the primary model has already been exhausted by that point.
5
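The way these two constants interact can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's actual implementation: the `NetworkError` class, the `call_with_fallback` method, its block-based interface, and the `MAX_RETRIES` constant are hypothetical names introduced for the example, and the sketch omits any backoff delay. Only the two constant names and values, the RetryableError name, and the max_retries value of 10 come from the descriptions above.

```ruby
# Hypothetical stand-ins for the real error classes (illustration only).
class RetryableError < StandardError; end # e.g. HTTP 429 / 5xx responses
class NetworkError < StandardError; end   # e.g. connection failure, timeout

RETRIES_BEFORE_FALLBACK = 3
MAX_RETRIES_ON_FALLBACK = 5
MAX_RETRIES = 10 # assumed to mirror the max_retries value mentioned above

# Yields the model to use for each attempt and returns the block's result.
def call_with_fallback(primary:, fallback:)
  model = primary
  retries = 0                # retries spent on the current model
  consecutive_retryable = 0  # RetryableError failures in a row

  begin
    yield model
  rescue RetryableError
    retries += 1
    consecutive_retryable += 1
    if model == primary && fallback && consecutive_retryable >= RETRIES_BEFORE_FALLBACK
      # Model-level outage suspected: switch models and start a fresh budget.
      model = fallback
      retries = 0
      retry
    end
    limit = model == primary ? MAX_RETRIES : MAX_RETRIES_ON_FALLBACK
    raise if retries > limit
    retry
  rescue NetworkError
    # Network errors never trigger fallback; they only consume the retry budget.
    retries += 1
    consecutive_retryable = 0
    limit = model == primary ? MAX_RETRIES : MAX_RETRIES_ON_FALLBACK
    raise if retries > limit
    retry
  end
end
```

In this sketch a caller would wrap the API request in the block, for example `call_with_fallback(primary: "model-a", fallback: "model-b") { |model| send_request(model) }`, where `send_request` and the model names are again hypothetical.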
Instance Method Summary
- #collect_iteration_tokens(usage) ⇒ Hash
Collect token usage data for current iteration and return it.
Instance Method Details
#collect_iteration_tokens(usage) ⇒ Hash
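Collect token usage data for the current iteration and return it. The returned delta_tokens is the token count attributable to this iteration: when usage[:total_is_per_turn] is set, total_tokens is used directly; otherwise the total recorded by the previous call is subtracted. Does NOT calculate cost; cost tracking has been removed.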
# File 'lib/octo/agent/llm_caller.rb', line 774

def collect_iteration_tokens(usage)
  prompt_tokens = usage[:prompt_tokens] || 0
  completion_tokens = usage[:completion_tokens] || 0
  total_tokens = usage[:total_tokens] || (prompt_tokens + completion_tokens)
  cache_write = usage[:cache_creation_input_tokens] || 0
  cache_read = usage[:cache_read_input_tokens] || 0

  delta_tokens = if usage[:total_is_per_turn]
    total_tokens
  else
    total_tokens - @previous_total_tokens
  end
  @previous_total_tokens = total_tokens

  {
    delta_tokens: delta_tokens,
    prompt_tokens: prompt_tokens,
    completion_tokens: completion_tokens,
    total_tokens: total_tokens,
    cache_write: cache_write,
    cache_read: cache_read
  }
end
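To illustrate how delta_tokens behaves across calls, here is a small self-contained sketch. The `TokenTracker` class is a hypothetical host written for this example (the real method lives in the Octo::Agent::LlmCaller module), and initializing `@previous_total_tokens` to 0 is an assumption, since the listing does not show where that variable is set. The method body is condensed from the listing above, with the cache fields left out for brevity.

```ruby
# Hypothetical host class; stands in for an object that includes the module.
class TokenTracker
  def initialize
    @previous_total_tokens = 0 # assumed starting value
  end

  # Condensed from the documented method (cache fields omitted).
  def collect_iteration_tokens(usage)
    prompt_tokens = usage[:prompt_tokens] || 0
    completion_tokens = usage[:completion_tokens] || 0
    total_tokens = usage[:total_tokens] || (prompt_tokens + completion_tokens)

    delta_tokens = if usage[:total_is_per_turn]
      total_tokens                          # provider already reports per-turn totals
    else
      total_tokens - @previous_total_tokens # provider reports a running total
    end
    @previous_total_tokens = total_tokens

    { delta_tokens: delta_tokens, prompt_tokens: prompt_tokens,
      completion_tokens: completion_tokens, total_tokens: total_tokens }
  end
end

tracker = TokenTracker.new

# Cumulative reporting: total_tokens falls back to prompt + completion.
first = tracker.collect_iteration_tokens({ prompt_tokens: 100, completion_tokens: 20 })
second = tracker.collect_iteration_tokens({ prompt_tokens: 150, completion_tokens: 30 })

# Per-turn reporting: the total is used as the delta without subtraction.
per_turn = tracker.collect_iteration_tokens({ total_tokens: 40, total_is_per_turn: true })

puts first[:delta_tokens]    # 120
puts second[:delta_tokens]   # 60
puts per_turn[:delta_tokens] # 40
```

Note that the previous total is updated on every call, including per-turn ones, so mixing the two reporting styles within one session would skew later deltas.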