Class: Pikuri::Agent::ContextWindowDetector
- Inherits:
- Object
  - Object
  - Pikuri::Agent::ContextWindowDetector
- Defined in:
- lib/pikuri/agent/context_window_detector.rb
Overview
Resolves the model’s context-window cap by asking the server that actually serves it. The only authoritative runtime source pikuri has is llama.cpp’s non-standard /props endpoint, which reports the server’s launched n_ctx (the real window — possibly smaller than the model’s theoretical max, e.g. llama-server -c 8192 on a 128k model). Returns nil — an honest “we don’t know” — for anything else.
Used by #detect_and_emit_context_cap! at construction and after every model switch to feed Listener::TokenLog a cap it can render alongside the running context size (so the ctx=12.2k/32.0k line tells the operator how close the conversation is to the limit). The caller prefers an explicit/inherited Pikuri::Agent::ChatTransport#context_window over this probe; this runs only when the transport carries none.
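The precedence described above can be sketched as follows. This is a minimal illustration, not pikuri's implementation: the `Transport` struct is a hypothetical stand-in for Pikuri::Agent::ChatTransport, `resolve_context_cap` is an invented name, and the probe is passed in as a callable so the sketch needs no network.

```ruby
# Hypothetical stand-in for Pikuri::Agent::ChatTransport; only the
# context_window reader matters for this sketch.
Transport = Struct.new(:provider, :model, :context_window, keyword_init: true)

# Return the explicit/inherited cap when the transport carries one;
# otherwise fall back to the probe. nil means "we don't know".
def resolve_context_cap(transport, probe:)
  transport.context_window || probe.call(transport)
end

explicit = Transport.new(provider: :openai, model: 'local', context_window: 32_768)
unknown  = Transport.new(provider: :openai, model: 'local', context_window: nil)

# The probe is never called when an explicit cap is present.
resolve_context_cap(explicit, probe: ->(_t) { raise 'probe must not run' })
# With no explicit cap, whatever the probe returns (possibly nil) is the cap.
resolve_context_cap(unknown, probe: ->(_t) { 8192 })
```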
Why no ruby_llm registry source
RubyLLM::Model::Info#context_window is a static lookup in a bundled models.json snapshot: nil for every assume_exists local model id, nil for anything newer than the snapshot, and, worst of all, a frozen value for known models, so a window the provider later bumped (256k → 1M) still reports the old number. A cap you have to caveat defeats the cap’s only job (a number trustworthy enough to act on before RubyLLM::ContextLengthExceededError), so pikuri deliberately does not consult it. The probe (server truth) and an explicit Pikuri::Agent::ChatTransport#context_window (operator/parent truth) are the only two sources; absent both, the cap is nil.
The openai-provider gate + auto-derived URL
The probe only makes sense against an OpenAI-compatible local server (llama.cpp), reached through ruby_llm’s :openai provider with a custom base. So ContextWindowDetector.detect runs only when transport.provider == :openai and derives the probe URL from the same RubyLLM.config.openai_api_base the chat itself uses. /props lives at the host root, NOT under /v1, so the /v1 suffix is stripped. Deriving from the live config (rather than a URL passed in) means the probe can’t target a different server than the chat. A bare :openai pointed at real api.openai.com gets one fast /props 404 that degrades to nil (the gate is kept simple and is not worth narrowing: you’re already sending that server the whole conversation).
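A small sketch of the gate and the URL derivation, for illustration. The helper names `derive_props_url` and `probe_target` are hypothetical, and `http://localhost:8080` is an assumed address; the string handling follows the .props_url source shown in this page.

```ruby
# Trim whitespace and one trailing slash, drop a trailing /v1, then
# append /props. Returns nil for a blank or nil base.
def derive_props_url(openai_base)
  base = openai_base.to_s.strip.chomp('/')
  return nil if base.empty?

  "#{base.delete_suffix('/v1')}/props"
end

# The gate: only an :openai transport is worth probing.
def probe_target(provider, openai_base)
  return nil unless provider == :openai

  derive_props_url(openai_base)
end

probe_target(:openai, 'http://localhost:8080/v1')    # => "http://localhost:8080/props"
probe_target(:openai, 'http://localhost:8080/v1/')   # => "http://localhost:8080/props"
probe_target(:anthropic, 'http://localhost:8080/v1') # => nil
probe_target(:openai, nil)                           # => nil
```

A base without a /v1 suffix is left alone and simply gets /props appended.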
llama.cpp router mode
A llama.cpp router (the multi-instance front that proxies to N on-demand model servers) answers a bare /props with {"role":"router", …, "n_ctx":0}. There is no single loaded model at the router itself, so its top-level n_ctx is 0. The real per-model cap is one proxied hop away: GET /props?model=<id> routes the probe to that model’s instance, whose /props carries the launched n_ctx. So when the bare probe reports role: router and a model_id is known, this re-probes with the model id before giving up. A plain single-model server is untouched: its bare /props already carries a positive n_ctx, so the router branch never runs.
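The router fallback can be sketched roughly as below. This is an illustration under stated assumptions, not the class's private implementation: all method names are hypothetical, `fetch` is a callable standing in for the HTTP GET, the query-string escaping is an assumption, and error handling is left out.

```ruby
require 'json'
require 'uri'

# Extract a usable cap from a parsed /props body: a positive Integer, else nil.
def n_ctx_from(props)
  n = props['n_ctx']
  n.is_a?(Integer) && n.positive? ? n : nil
end

# True when the body is a router answer and a model id is available.
def reprobe_needed?(props, model_id)
  props['role'] == 'router' && !model_id.to_s.empty?
end

# Build the per-model probe URL (escaping the id is an assumption here).
def model_props_url(props_url, model_id)
  "#{props_url}?model=#{URI.encode_www_form_component(model_id)}"
end

# fetch is a callable (url -> body String) standing in for the HTTP GET.
def resolve_cap(props_url, model_id, fetch:)
  props = JSON.parse(fetch.call(props_url))
  cap = n_ctx_from(props)
  return cap if cap # single-model server: the router branch never runs
  return nil unless reprobe_needed?(props, model_id)

  n_ctx_from(JSON.parse(fetch.call(model_props_url(props_url, model_id))))
end

# Canned responses in place of a real server.
bodies = {
  'http://localhost:8080/props' => '{"role":"router","n_ctx":0}',
  'http://localhost:8080/props?model=qwen' => '{"n_ctx":8192}'
}
resolve_cap('http://localhost:8080/props', 'qwen', fetch: ->(url) { bodies.fetch(url) })
```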
Failure handling
The probe is best-effort. HTTP error, timeout, non-JSON body, or a missing/invalid n_ctx field all return nil and log one warn line via Pikuri.logger_for('ContextWindowDetector'). This is the CLAUDE.md “secondary to the loop” carve-out: a wedged or non-llama.cpp server should not abort agent construction over a cosmetic readout.
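One way to express this best-effort contract is sketched below. It assumes a standard-library Logger in place of Pikuri.logger_for, a hypothetical method name, and a `fetch` callable standing in for the HTTP GET; the log wording is invented for the example.

```ruby
require 'json'
require 'logger'

# Any failure yields nil plus a single warn line; nothing is re-raised.
# fetch is a callable (-> body String) standing in for the HTTP GET.
def best_effort_n_ctx(logger:, fetch:)
  props = JSON.parse(fetch.call)
  n_ctx = props.is_a?(Hash) ? props['n_ctx'] : nil
  return n_ctx if n_ctx.is_a?(Integer) && n_ctx.positive?

  logger.warn("props probe: missing or invalid n_ctx (#{n_ctx.inspect})")
  nil
rescue StandardError => e
  # Covers transport errors, timeouts, and JSON::ParserError alike.
  logger.warn("props probe failed: #{e.class}: #{e.message}")
  nil
end

log = Logger.new($stderr)
best_effort_n_ctx(logger: log, fetch: -> { '{"n_ctx":4096}' })    # valid body
best_effort_n_ctx(logger: log, fetch: -> { 'not json' })          # parse failure
best_effort_n_ctx(logger: log, fetch: -> { raise IOError, 'x' })  # transport failure
```

Rescuing StandardError rather than a list of specific exceptions keeps the sketch short; whether the real class narrows the rescue is not stated in this page.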
Constant Summary
- LOGGER =
Subsystem logger; set its level with PIKURI_LOG_CONTEXTWINDOWDETECTOR or the global PIKURI_LOG.
Pikuri.logger_for('ContextWindowDetector')
- OPEN_TIMEOUT =
Connect timeout in seconds for the llama.cpp /props probe. Short on purpose: a server that isn’t even listening should fail fast rather than stall Agent construction.
2
- READ_TIMEOUT =
Read timeout in seconds for the llama.cpp /props probe. Generous on purpose, and the reason it differs from OPEN_TIMEOUT: a llama.cpp router answers /props?model=<id> only after spinning up that model’s instance, and a cold model load can take 10+ seconds, which the next chat turn must wait for anyway. A read timeout shorter than the load would abandon the probe (and lose the cap) precisely when switching to a cold model. A server that accepts the connection but then hangs would stall the actual chat identically, so tolerating the wait here costs nothing extra.
30
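For illustration, the two timeouts could be applied to a standard-library Net::HTTP client as sketched here. Whether the class uses Net::HTTP is an assumption, `build_probe_client` is a hypothetical helper, and the URL is an assumed local address; the client is built but never opened, so the sketch makes no network call.

```ruby
require 'net/http'
require 'uri'

OPEN_TIMEOUT = 2   # seconds; fail fast when nothing is listening
READ_TIMEOUT = 30  # seconds; leave room for a cold model load behind a router

# Build (but do not open) an HTTP client with the two timeouts applied.
def build_probe_client(url)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = uri.scheme == 'https'
  http.open_timeout = OPEN_TIMEOUT
  http.read_timeout = READ_TIMEOUT
  http
end

client = build_probe_client('http://localhost:8080/props')
client.open_timeout # => 2
client.read_timeout # => 30
```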
Class Method Summary
- .detect(transport, openai_base: RubyLLM.config.openai_api_base) ⇒ Integer?
Resolve the context-window cap for transport by probing the server that serves it.
- .props_url(openai_base) ⇒ String?
Derive the llama.cpp /props URL from the OpenAI-compatible base.
Instance Method Summary
- #initialize(probe_url:, model_id:) ⇒ ContextWindowDetector
constructor
A new instance of ContextWindowDetector.
- #probe ⇒ Integer?
Resolved cap, or nil if the probe produced none.
Constructor Details
#initialize(probe_url:, model_id:) ⇒ ContextWindowDetector
Returns a new instance of ContextWindowDetector.
# File 'lib/pikuri/agent/context_window_detector.rb', line 143
def initialize(probe_url:, model_id:)
  @probe_url = probe_url
  @model_id = model_id
end
Class Method Details
.detect(transport, openai_base: RubyLLM.config.openai_api_base) ⇒ Integer?
Resolve the context-window cap for transport by probing the server that serves it.
# File 'lib/pikuri/agent/context_window_detector.rb', line 116
def self.detect(transport, openai_base: RubyLLM.config.openai_api_base)
  return nil unless transport.provider == :openai

  url = props_url(openai_base)
  return nil if url.nil?

  new(probe_url: url, model_id: transport.model).probe
end
.props_url(openai_base) ⇒ String?
Derive the llama.cpp /props URL from the OpenAI-compatible base. /props sits at the host root, so a trailing /v1 is stripped before appending.
# File 'lib/pikuri/agent/context_window_detector.rb', line 132
def self.props_url(openai_base)
  base = openai_base.to_s.strip.chomp('/')
  return nil if base.empty?

  "#{base.delete_suffix('/v1')}/props"
end
Instance Method Details
#probe ⇒ Integer?
Returns the resolved cap, or nil if the probe produced none.
# File 'lib/pikuri/agent/context_window_detector.rb', line 150
def probe
  probe_llama_cpp
end