Module: TokenEstimation

Extended by:
ActiveSupport::Concern
Included in:
Message, PinnedMessage, Snapshot
Defined in:
app/models/concerns/token_estimation.rb

Overview

Shared token-count lifecycle for records that ride in the LLM context window. Including models seed #token_count with a local heuristic on create and schedule CountTokensJob to refine it with an exact count from the Anthropic tokenizer.

Non-AR callers (TUI debug display, phantom-pair sizing, byte-cap calculations) use TokenEstimation.estimate_token_count and BYTES_PER_TOKEN as module-level helpers without including the concern.
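A minimal sketch of a module-level call (the token budget and variable names here are illustrative, not taken from the codebase):

token_budget = 1_000
byte_cap     = token_budget * TokenEstimation::BYTES_PER_TOKEN   # => 4_000 bytes allowed under a 1_000-token budget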

Including models must implement #tokenization_text returning the string whose token count should be estimated and later refined.
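A minimal sketch of an including model, assuming a hypothetical Note record with body and token_count columns (names are illustrative, not from the codebase):

class Note < ApplicationRecord
  include TokenEstimation

  # Required by the concern: the string whose token count is estimated on create
  # and later refined by CountTokensJob.
  def tokenization_text
    body
  end
end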

Constant Summary

BYTES_PER_TOKEN = 4

Heuristic: average bytes per token for English prose.
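For example, 400 bytes of English prose estimates to 400 / 4 = 100 tokens.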

Class Method Summary

  • .estimate_token_count(text) ⇒ Integer — Estimates token count from a string using the BYTES_PER_TOKEN heuristic.

Instance Method Summary

  • #estimate_tokens ⇒ Integer — Heuristic token estimate for this record’s #tokenization_text.

Class Method Details

.estimate_token_count(text) ⇒ Integer

Estimates token count from a string using the BYTES_PER_TOKEN heuristic.

Parameters:

  • text (String, nil)

Returns:

  • (Integer)

    estimated token count (0 for blank input)



# File 'app/models/concerns/token_estimation.rb', line 24

def self.estimate_token_count(text)
  (text.to_s.bytesize / BYTES_PER_TOKEN.to_f).ceil
end
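For example:

TokenEstimation.estimate_token_count(nil)              # => 0
TokenEstimation.estimate_token_count("Hello, world!")  # => 4 (13 bytes / 4.0, rounded up)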

Instance Method Details

#estimate_tokens ⇒ Integer

Heuristic token estimate for this record’s #tokenization_text.

Returns:

  • (Integer)


# File 'app/models/concerns/token_estimation.rb', line 36

def estimate_tokens
  TokenEstimation.estimate_token_count(tokenization_text)
end
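Usage, assuming the hypothetical Note model sketched in the overview:

note = Note.new(body: "A" * 400)
note.estimate_tokens   # => 100 (400 bytes / BYTES_PER_TOKEN, rounded up)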