Module: Octo::Agent::LlmCaller

Included in:
Octo::Agent
Defined in:
lib/octo/agent/llm_caller.rb

Overview

LLM API call management. Handles API calls with retry logic, fallback model support, and progress indication.

Constant Summary

RETRIES_BEFORE_FALLBACK = 3

Number of consecutive RetryableError failures (503/429/5xx) before switching to the fallback model. Network-level errors (connection failures, timeouts) do NOT trigger fallback: they are retried on the primary model for the full max_retries budget, since they are likely transient infrastructure blips rather than a model-level outage.
MAX_RETRIES_ON_FALLBACK = 5

Number of retries allowed after switching to the fallback model before giving up. Kept lower than max_retries (10) because the primary model has already been exhausted.
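
The interaction between the two constants and max_retries can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's actual implementation: the error classes, the call_with_fallback method, and its keyword arguments are hypothetical names introduced here, and backoff delays and progress indication are omitted. It also assumes that a network-level error resets the consecutive-failure count, which this page does not confirm.

```ruby
# Hypothetical error classes; the real error hierarchy is not shown on this page.
class RetryableError < StandardError; end # stands in for 503/429/5xx responses
class NetworkError < StandardError; end   # stands in for connection failures and timeouts

RETRIES_BEFORE_FALLBACK = 3
MAX_RETRIES_ON_FALLBACK = 5

# Hypothetical helper: yields the model name to the block until the block
# succeeds or the failure budget for the current model is used up.
def call_with_fallback(primary:, fallback:, max_retries: 10)
  model = primary
  attempts = 0     # failures counted against the current model
  consecutive = 0  # consecutive RetryableError failures on the primary

  begin
    yield model
  rescue RetryableError
    attempts += 1
    if model == primary
      consecutive += 1
      if fallback && consecutive >= RETRIES_BEFORE_FALLBACK
        # Model-level outage suspected: switch models and start a fresh budget.
        model = fallback
        attempts = 0
        retry
      end
      raise if attempts >= max_retries
    elsif attempts >= MAX_RETRIES_ON_FALLBACK
      raise
    end
    retry
  rescue NetworkError
    # Network errors never trigger the switch; they only consume the budget.
    attempts += 1
    consecutive = 0 # assumption: a network error breaks the "consecutive" streak
    limit = model == primary ? max_retries : MAX_RETRIES_ON_FALLBACK
    raise if attempts >= limit
    retry
  end
end

# Usage: the block receives whichever model is currently active.
tried = []
call_with_fallback(primary: "primary-model", fallback: "fallback-model") do |model|
  tried << model
  raise RetryableError, "503" if model == "primary-model"
  :ok
end
```

In this sketch a failing primary is abandoned after three consecutive RetryableError failures, while a run of NetworkError failures stays on the primary until the max_retries budget is spent.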

Instance Method Summary

Instance Method Details

#collect_iteration_tokens(usage) ⇒ Hash

Collect token usage data for current iteration and return it. Does NOT calculate cost — cost tracking has been removed.

Parameters:

  • usage (Hash)

    Usage data from API

Returns:

  • (Hash)

    token_data ready for show_token_usage



# File 'lib/octo/agent/llm_caller.rb', line 774

def collect_iteration_tokens(usage)
  prompt_tokens = usage[:prompt_tokens] || 0
  completion_tokens = usage[:completion_tokens] || 0
  total_tokens = usage[:total_tokens] || (prompt_tokens + completion_tokens)
  cache_write = usage[:cache_creation_input_tokens] || 0
  cache_read = usage[:cache_read_input_tokens] || 0

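  # When the provider reports per-turn totals, the total is already the delta;
  # otherwise totals are cumulative and are diffed against the previous iteration.
  delta_tokens =
    if usage[:total_is_per_turn]
      total_tokens
    else
      total_tokens - @previous_total_tokens
    end
  @previous_total_tokens = total_tokens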

  {
    delta_tokens: delta_tokens,
    prompt_tokens: prompt_tokens,
    completion_tokens: completion_tokens,
    total_tokens: total_tokens,
    cache_write: cache_write,
    cache_read: cache_read
  }
end
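
A short usage sketch may help show how delta_tokens behaves across iterations. The TokenTracker class below is a hypothetical harness, not part of Octo: it copies only the total and delta logic from the method above and assumes that the including agent initialises @previous_total_tokens to 0, which this page does not show.

```ruby
# Hypothetical harness that reproduces only the total/delta logic.
class TokenTracker
  def initialize
    @previous_total_tokens = 0 # assumption: the host agent starts the counter at 0
  end

  def collect_iteration_tokens(usage)
    prompt_tokens = usage[:prompt_tokens] || 0
    completion_tokens = usage[:completion_tokens] || 0
    total_tokens = usage[:total_tokens] || (prompt_tokens + completion_tokens)

    delta_tokens =
      if usage[:total_is_per_turn]
        total_tokens
      else
        total_tokens - @previous_total_tokens
      end
    @previous_total_tokens = total_tokens

    { delta_tokens: delta_tokens, total_tokens: total_tokens }
  end
end

tracker = TokenTracker.new

# Cumulative totals: the delta is the growth since the previous iteration.
first  = tracker.collect_iteration_tokens({ prompt_tokens: 80, completion_tokens: 20 })
second = tracker.collect_iteration_tokens({ prompt_tokens: 200, completion_tokens: 50 })

# Per-turn totals: the reported total is used as the delta directly.
third = tracker.collect_iteration_tokens({ total_tokens: 40, total_is_per_turn: true })

puts first[:delta_tokens]  # 100
puts second[:delta_tokens] # 150
puts third[:delta_tokens]  # 40
```

Note that when total_tokens is absent it is derived from prompt_tokens + completion_tokens, so the first two calls yield totals of 100 and 250 before the delta is taken.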