Module: Ace::Review::Atoms::TokenEstimator
- Defined in:
- lib/ace/review/atoms/token_estimator.rb
Overview
Pure function for estimating token counts from text
Uses a simple chars/4 heuristic, which provides reasonable accuracy (typically within 20% of actual token counts) for most text types. This is intentionally simple and fast: actual tokenization would require model-specific tokenizers, which would add complexity and latency.
Constant Summary
- CHARS_PER_TOKEN =
Average characters per token for the heuristic. Most modern tokenizers average around 3-5 chars/token; 4 is a reasonable middle ground.
4
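To give a feel for how much the choice of divisor matters, the sketch below runs the same chars/N calculation with 3, 4, and 5 characters per token. The `estimate_with` helper is hypothetical and exists only for this illustration; the module itself always divides by `CHARS_PER_TOKEN`.

```ruby
# Hypothetical helper (not part of TokenEstimator): the same heuristic,
# but with the chars-per-token ratio passed in as a parameter.
def estimate_with(text, chars_per_token)
  return 0 if text.nil? || text.empty?
  (text.length.to_f / chars_per_token).ceil
end

sample = "x" * 1000

# Compare the estimate across the 3-5 chars/token range mentioned above.
[3, 4, 5].each do |ratio|
  puts "#{ratio} chars/token: #{estimate_with(sample, ratio)} tokens"
end
# 3 chars/token: 334 tokens
# 4 chars/token: 250 tokens
# 5 chars/token: 200 tokens
```

For a 1000-character input, the estimate at the two ends of the range spans 200 to 334 tokens, with the default of 4 landing at 250.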
Class Method Summary
- .estimate(text) ⇒ Integer
  Estimate token count from a string using chars/4 heuristic.
- .estimate_file(path) ⇒ Integer
  Estimate token count from a file.
- .estimate_files(paths) ⇒ Integer
  Estimate token count from multiple files.
- .estimate_many(texts) ⇒ Integer
  Estimate token count from multiple strings.
Class Method Details
.estimate(text) ⇒ Integer
Estimate token count from a string using chars/4 heuristic
# File 'lib/ace/review/atoms/token_estimator.rb', line 34
def self.estimate(text)
  return 0 if text.nil? || text.empty?
  (text.length.to_f / CHARS_PER_TOKEN).ceil
end
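The edge cases of `.estimate` are easier to see with concrete inputs. The following is a standalone sketch that copies the method body shown above into a module named `TokenEstimatorSketch` (a name chosen for this example, so it runs without the gem loaded); it is not the library itself.

```ruby
# Standalone copy of the documented method, for illustration only.
module TokenEstimatorSketch
  CHARS_PER_TOKEN = 4

  def self.estimate(text)
    return 0 if text.nil? || text.empty?
    (text.length.to_f / CHARS_PER_TOKEN).ceil
  end
end

# nil and empty input short-circuit to 0.
p TokenEstimatorSketch.estimate(nil)           # => 0
p TokenEstimatorSketch.estimate("")            # => 0

# The result is rounded up, so any non-empty string costs at least 1 token.
p TokenEstimatorSketch.estimate("a")           # => 1
p TokenEstimatorSketch.estimate("abcd")        # => 1
p TokenEstimatorSketch.estimate("abcde")       # => 2
p TokenEstimatorSketch.estimate("hello world") # => 3 (11 chars)

# String#length counts characters, not bytes, so multibyte text is
# measured by character count: 6 characters here, 18 bytes in UTF-8.
p TokenEstimatorSketch.estimate("日本語の文章")  # => 2
```

Because the division is by character count, the heuristic treats a multibyte character the same as an ASCII one, whatever a real tokenizer would do with it.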
.estimate_file(path) ⇒ Integer
Estimate token count from a file
# File 'lib/ace/review/atoms/token_estimator.rb', line 50
def self.estimate_file(path)
  content = File.read(path)
  estimate(content)
end
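As written, `.estimate_file` passes the path straight to `File.read`, so a missing or unreadable file raises a `SystemCallError` subclass such as `Errno::ENOENT` rather than returning 0. The sketch below shows one way a caller might handle that, assuming a fallback of 0 is acceptable. `TokenEstimatorSketch` and `estimate_file_or_zero` are hypothetical names used only for this example and are not part of the documented module.

```ruby
require "tempfile"

# Standalone copy of the documented methods, for illustration only.
module TokenEstimatorSketch
  CHARS_PER_TOKEN = 4

  def self.estimate(text)
    return 0 if text.nil? || text.empty?
    (text.length.to_f / CHARS_PER_TOKEN).ceil
  end

  def self.estimate_file(path)
    estimate(File.read(path))
  end

  # Hypothetical wrapper: treat unreadable files as contributing 0 tokens.
  def self.estimate_file_or_zero(path)
    estimate_file(path)
  rescue SystemCallError
    0
  end
end

# Write 10 characters to a temporary file and estimate it.
Tempfile.create("token_sketch") do |file|
  file.write("a" * 10)
  file.flush
  p TokenEstimatorSketch.estimate_file(file.path) # => 3
end

# A missing path raises from the plain method but not from the wrapper.
missing = File.join(Dir.tmpdir, "token_sketch_missing_#{Process.pid}.txt")
p TokenEstimatorSketch.estimate_file_or_zero(missing) # => 0
```

Whether swallowing the error is appropriate depends on the caller; silently counting a missing file as 0 tokens can hide a wrong path.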
.estimate_files(paths) ⇒ Integer
Estimate token count from multiple files
# File 'lib/ace/review/atoms/token_estimator.rb', line 78
def self.estimate_files(paths)
  return 0 if paths.nil? || paths.empty?
  paths.sum { |path| estimate_file(path) }
end
.estimate_many(texts) ⇒ Integer
Estimate token count from multiple strings
# File 'lib/ace/review/atoms/token_estimator.rb', line 63
def self.estimate_many(texts)
  return 0 if texts.nil? || texts.empty?
  texts.sum { |text| estimate(text) }
end
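One consequence of summing per-string estimates is that each string is rounded up on its own, so the total for many short strings can exceed the estimate for the same text concatenated. A small sketch of that effect, again using a standalone copy named `TokenEstimatorSketch` (an illustrative name, not the library module):

```ruby
# Standalone copy of the documented methods, for illustration only.
module TokenEstimatorSketch
  CHARS_PER_TOKEN = 4

  def self.estimate(text)
    return 0 if text.nil? || text.empty?
    (text.length.to_f / CHARS_PER_TOKEN).ceil
  end

  def self.estimate_many(texts)
    return 0 if texts.nil? || texts.empty?
    texts.sum { |text| estimate(text) }
  end
end

parts = %w[a b c d e]

# Five one-character strings each round up to 1 token.
p TokenEstimatorSketch.estimate_many(parts)        # => 5

# The same five characters as a single string round up once.
p TokenEstimatorSketch.estimate(parts.join)        # => 2

# nil entries are tolerated because estimate(nil) returns 0.
p TokenEstimatorSketch.estimate_many([nil, "abcd"]) # => 1
```

The same per-item rounding applies to `.estimate_files`, since it sums `.estimate_file` results in the same way.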