Module: TokenEstimation
- Extended by: ActiveSupport::Concern
- Included in: Message, PinnedMessage, Snapshot
- Defined in: app/models/concerns/token_estimation.rb
Overview
Shared token-count lifecycle for records that ride in the LLM context window. On create, including models seed #token_count with a local heuristic and schedule CountTokensJob to refine it with the real Anthropic tokenizer count.
Non-ActiveRecord callers (the TUI debug display, phantom-pair sizing, and byte-cap calculations) use TokenEstimation.estimate_token_count and BYTES_PER_TOKEN as module-level helpers without including the concern.
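For example, a caller converting between a token budget and a byte cap might use the helpers directly (a minimal sketch; token_budget and buffer are illustrative names, not part of the module):

  approx_tokens = TokenEstimation.estimate_token_count(buffer)
  byte_cap      = token_budget * TokenEstimation::BYTES_PER_TOKEN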
Including models must implement #tokenization_text, which returns the string whose token count should be estimated and later refined.
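A minimal sketch of an including model, assuming a body text column; only the #tokenization_text contract is shown, since the concern supplies the callbacks:

  class Message < ApplicationRecord
    include TokenEstimation

    # Required by the concern: the string whose token count is seeded
    # on create and later refined by CountTokensJob.
    def tokenization_text
      body.to_s
    end
  end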
Constant Summary
- BYTES_PER_TOKEN = 4
  Heuristic: average bytes per token for English prose.
Class Method Summary
- .estimate_token_count(text) ⇒ Integer
  Estimates token count from a string using the BYTES_PER_TOKEN heuristic.
Instance Method Summary
- #estimate_tokens ⇒ Integer
  Heuristic token estimate for this record’s #tokenization_text.
Class Method Details
.estimate_token_count(text) ⇒ Integer
Estimates token count from a string using the BYTES_PER_TOKEN heuristic.
# File 'app/models/concerns/token_estimation.rb', line 24

def self.estimate_token_count(text)
  (text.to_s.bytesize / BYTES_PER_TOKEN.to_f).ceil
end
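A worked example of the heuristic: "hello world" is 11 bytes, and 11 / 4.0 = 2.75, which ceils to 3. Nil-safety comes from text.to_s:

  TokenEstimation.estimate_token_count("hello world") # => 3
  TokenEstimation.estimate_token_count(nil)           # => 0 (nil.to_s is "")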
Instance Method Details
#estimate_tokens ⇒ Integer
Heuristic token estimate for this record’s #tokenization_text.
# File 'app/models/concerns/token_estimation.rb', line 36

def estimate_tokens
  TokenEstimation.estimate_token_count(tokenization_text)
end