Module: Phronomy::Context::TokenEstimator
Defined in: lib/phronomy/context/token_estimator.rb
Overview
Central token estimation utility. It is stateless apart from a single process-wide tokenizer setting.
All token counting in the framework passes through this module so that the approximation logic lives in one place and can be upgraded without touching any other class.
Default approximation: ceil(char_count / 4). English text averages about 4 characters per token, so the heuristic is a reasonable fit for English. Japanese text averages about 2 characters per token, so the heuristic undercounts Japanese by roughly half.
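The arithmetic behind that caveat can be checked directly. This is a minimal sketch that re-implements only the default formula as a standalone method; heuristic_tokens is a hypothetical name, and the repeated "あ" string is just a placeholder with the length of a 1,000-character Japanese passage.

```ruby
# Default heuristic described above: ceil(char_count / 4).
def heuristic_tokens(text)
  (text.length / 4.0).ceil
end

english  = "The quick brown fox jumps over the lazy dog." # 44 characters
japanese = "あ" * 1_000 # placeholder for a 1,000-character Japanese passage

heuristic_tokens(english)  # => 11
heuristic_tokens(japanese) # => 250
# At ~2 chars/token the Japanese passage would be closer to 1,000 / 2 = 500
# tokens, so the heuristic reports roughly half the expected count.
```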
Replace the built-in heuristic with any callable via .tokenizer=.
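A custom tokenizer only needs to respond to #call with a String and return an Integer. As a sketch, the lambda below is a hypothetical script-aware heuristic (not part of Phronomy) that counts Han, Hiragana and Katakana characters at ~2 chars/token and everything else at ~4. The assignment is guarded with defined? so the snippet also runs when the gem is not loaded; a real tokenizer library could be wrapped in a lambda the same way.

```ruby
# Hypothetical script-aware heuristic, not part of Phronomy:
# ~2 chars/token for CJK script characters, ~4 chars/token for everything else.
CJK_CHAR = /[\p{Han}\p{Hiragana}\p{Katakana}]/

script_aware = lambda do |text|
  cjk   = text.scan(CJK_CHAR).length
  other = text.length - cjk
  (cjk / 2.0 + other / 4.0).ceil
end

# Install it process-wide only if the Phronomy module is actually loaded.
if defined?(Phronomy::Context::TokenEstimator)
  Phronomy::Context::TokenEstimator.tokenizer = script_aware
end

script_aware.call("hello world!") # => 3
script_aware.call("こんにちは")    # => 3
```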
Class Method Summary
- .estimate(input) ⇒ Integer
  Estimate the number of tokens for the given input.
- .reset_tokenizer! ⇒ Object
  Resets the tokenizer to the built-in heuristic.
- .tokenizer ⇒ #call?
  Returns the current custom tokenizer, or nil when the built-in heuristic is in use.
- .tokenizer=(callable) ⇒ Object
  Replace the built-in heuristic with a callable that takes a String and returns an Integer token count.
Class Method Details
.estimate(input) ⇒ Integer
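Estimate the number of tokens for the given input: a String, an Array of message objects (each responding to #content), or a single message object. For an Array, each message is estimated separately and the results are summed.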
```ruby
# File 'lib/phronomy/context/token_estimator.rb', line 55

def estimate(input)
  tok = @tokenizer_mutex.synchronize { @tokenizer }
  case input
  when String
    tok ? tok.call(input) : (input.length / 4.0).ceil
  when Array
    input.sum { |m| estimate(m.content.to_s) }
  else
    estimate(input.content.to_s)
  end
end
```
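The case expression treats any Array as a list of message objects and estimates each one on its own. The sketch below mirrors only the default branches (no custom tokenizer) and uses a hypothetical Message struct in place of the framework's message class; any object responding to #content would do. Two consequences are visible: every non-empty message rounds up to at least one token, so a list is never counted lower than its concatenated text and is often counted higher, and nil content contributes zero because of the .to_s call. An Array of plain Strings is not accepted, because String does not respond to #content.

```ruby
# Hypothetical message type: estimate only needs objects that respond to #content.
Message = Struct.new(:role, :content)

# Mirrors the default (no custom tokenizer) branches of estimate above.
def estimate(input)
  case input
  when String then (input.length / 4.0).ceil
  when Array  then input.sum { |m| estimate(m.content.to_s) }
  else estimate(input.content.to_s)
  end
end

msgs = [Message.new(:user, "a"), Message.new(:assistant, "b"), Message.new(:tool, nil)]

estimate("ab")                           # => 1 (concatenated text)
estimate(msgs)                           # => 2 ("a" and "b" each round up; nil counts as 0)
estimate(Message.new(:user, "abcdefgh")) # => 2
```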
.reset_tokenizer! ⇒ Object
Resets the tokenizer to the built-in heuristic. Intended for test isolation.
```ruby
# File 'lib/phronomy/context/token_estimator.rb', line 46

def reset_tokenizer!
  @tokenizer_mutex.synchronize { @tokenizer = nil }
end
```
.tokenizer ⇒ #call?
Returns the current custom tokenizer, or nil when the built-in heuristic is in use.
```ruby
# File 'lib/phronomy/context/token_estimator.rb', line 41

def tokenizer
  @tokenizer_mutex.synchronize { @tokenizer }
end
```
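These methods read and write @tokenizer inside @tokenizer_mutex.synchronize, but the listing does not show where that state is created. The following is a minimal sketch of the pattern under that gap: a hypothetical SketchEstimator module (not Phronomy's code) initializes both instance variables when the module body runs, then exposes the same three accessors plus a String-only estimate. As in the real estimate, the tokenizer is read under the lock and called outside it, so a slow tokenizer never holds the mutex.

```ruby
# Hypothetical reconstruction, not Phronomy's code: only the method bodies
# follow the listing; the initialization of the two ivars is an assumption.
module SketchEstimator
  @tokenizer = nil # nil means "use the built-in heuristic"
  @tokenizer_mutex = Mutex.new

  class << self
    def tokenizer
      @tokenizer_mutex.synchronize { @tokenizer }
    end

    def tokenizer=(callable)
      @tokenizer_mutex.synchronize { @tokenizer = callable }
    end

    def reset_tokenizer!
      @tokenizer_mutex.synchronize { @tokenizer = nil }
    end

    # String-only estimate: read under the lock, call the tokenizer outside it.
    def estimate(text)
      tok = tokenizer
      tok ? tok.call(text) : (text.length / 4.0).ceil
    end
  end
end

SketchEstimator.estimate("hello world!") # => 3 (built-in heuristic)
SketchEstimator.tokenizer = ->(s) { s.split.size }
SketchEstimator.estimate("hello world!") # => 2 (custom tokenizer)
SketchEstimator.reset_tokenizer!
```

The mutex keeps the read and the write well-defined even on Ruby implementations without a global VM lock, such as JRuby.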
.tokenizer=(callable) ⇒ Object
Replace the built-in heuristic with a callable that takes a String and returns an Integer token count. Set to nil to restore the default.
Note: This is a process-wide setting. Set it once at application startup. In tests, call TokenEstimator.reset_tokenizer! after each test to prevent cross-test contamination.
```ruby
# File 'lib/phronomy/context/token_estimator.rb', line 36

def tokenizer=(callable)
  @tokenizer_mutex.synchronize { @tokenizer = callable }
end
```
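Because the setting is process-wide, a test that installs its own tokenizer has to undo it. Calling reset_tokenizer! always falls back to the heuristic; if an application-wide tokenizer should survive the test instead, the previous value can be saved and restored. The sketch below is a hypothetical with_tokenizer helper (not part of Phronomy) that does this with .tokenizer, .tokenizer= and an ensure clause. Other threads see the temporary tokenizer while the block runs, so it suits single-threaded tests. A Struct stands in for TokenEstimator so the snippet runs without the gem; with the gem loaded, Phronomy::Context::TokenEstimator itself can be passed as the first argument.

```ruby
# Hypothetical helper, not part of Phronomy. The estimator argument is any
# object with #tokenizer and #tokenizer= methods.
def with_tokenizer(estimator, callable)
  previous = estimator.tokenizer
  begin
    estimator.tokenizer = callable
    yield
  ensure
    # Restore whatever was installed before, even if the block raised.
    estimator.tokenizer = previous
  end
end

# Stand-in so the snippet runs on its own: Struct generates
# #tokenizer and #tokenizer= accessors.
FakeEstimator = Struct.new(:tokenizer)
est = FakeEstimator.new(nil)

inside = with_tokenizer(est, ->(s) { s.length }) { est.tokenizer.call("abc") }
inside        # => 3
est.tokenizer # => nil (restored)
```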