Module: Ace::Review::Atoms::TokenEstimator

Defined in:
lib/ace/review/atoms/token_estimator.rb

Overview

Pure functions for estimating token counts from text.

Uses a simple chars/4 heuristic (character count divided by 4, rounded up), which is reasonably accurate for most text types, typically within 20% of the actual token count. The estimator is intentionally simple and fast: exact tokenization would require model-specific tokenizers, which add complexity and latency.

Examples:

Basic usage

TokenEstimator.estimate("hello world")
#=> 3

File estimation

TokenEstimator.estimate_file("/path/to/code.rb")
#=> 1500

Constant Summary

CHARS_PER_TOKEN =

Average characters per token for the heuristic. Most modern tokenizers average around 3-5 chars/token, so 4 is a reasonable middle ground.

4
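To get a feel for how sensitive the estimate is to this constant, the sketch below recomputes the estimate for one string at 3, 4, and 5 characters per token. Note that `estimate_with` is a hypothetical helper written only for this illustration; it is not part of TokenEstimator, which always divides by CHARS_PER_TOKEN.

```ruby
# Hypothetical helper (not part of TokenEstimator): the same chars/N
# heuristic, with the divisor passed in instead of read from CHARS_PER_TOKEN.
def estimate_with(text, chars_per_token)
  return 0 if text.nil? || text.empty?

  (text.length.to_f / chars_per_token).ceil
end

sample = "def add(a, b) = a + b" # 21 characters

# Estimates at the low end, the chosen value, and the high end of the range:
# 21/3 = 7, 21/4 = 5.25 (rounded up to 6), 21/5 = 4.2 (rounded up to 5).
spread = [3, 4, 5].to_h { |n| [n, estimate_with(sample, n)] }
p spread
```

For this sample the three divisors give 7, 6, and 5 tokens, so the choice of constant moves the estimate by roughly one token in either direction.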

Class Method Summary

Class Method Details

.estimate(text) ⇒ Integer

Estimate token count from a string using the chars/4 heuristic, rounded up

Examples:

TokenEstimator.estimate("Hello, world!")
#=> 4

Parameters:

  • text (String, nil)

    The text to estimate tokens for

Returns:

  • (Integer)

    Estimated token count (0 for nil/empty)



# File 'lib/ace/review/atoms/token_estimator.rb', line 34

def self.estimate(text)
  return 0 if text.nil? || text.empty?

  (text.length.to_f / CHARS_PER_TOKEN).ceil
end
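The edge cases of .estimate can be checked with a small script. This is a sketch, not the library's test suite: the module below is a stand-in that copies the listing above under a top-level TokenEstimator name so the snippet runs on its own, and the sample strings are arbitrary.

```ruby
# Stand-in copy of the documented method so the snippet is self-contained;
# the real module lives under Ace::Review::Atoms.
module TokenEstimator
  CHARS_PER_TOKEN = 4

  def self.estimate(text)
    return 0 if text.nil? || text.empty?

    (text.length.to_f / CHARS_PER_TOKEN).ceil
  end
end

p TokenEstimator.estimate(nil)     # nil short-circuits to 0
p TokenEstimator.estimate("")      # so does the empty string
p TokenEstimator.estimate("abcd")  # exactly 4 chars: 4 / 4 = 1
p TokenEstimator.estimate("abcde") # 5 chars: 5 / 4 = 1.25, rounded up to 2

# String#length counts characters, not bytes, so multi-byte text is
# measured by its character count.
accented = "h\u00e9llo"
p accented.length                   # 5 characters
p accented.bytesize                 # 6 bytes in UTF-8
p TokenEstimator.estimate(accented) # 2
```

Because the result is rounded up, any non-empty string estimates to at least one token.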

.estimate_file(path) ⇒ Integer

Estimate token count from a file

Examples:

TokenEstimator.estimate_file("/path/to/code.rb")
#=> 1500

Parameters:

  • path (String)

    Path to the file

Returns:

  • (Integer)

    Estimated token count

Raises:

  • (Errno::ENOENT)

    if file does not exist

  • (Errno::EACCES)

    if file is not readable



# File 'lib/ace/review/atoms/token_estimator.rb', line 50

def self.estimate_file(path)
  content = File.read(path)
  estimate(content)
end
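Since .estimate_file lets Errno::ENOENT and Errno::EACCES propagate, callers decide how to handle unreadable paths. The sketch below shows one possible pattern, assuming a stand-in copy of the module and a temporary file created just for the demonstration; falling back to 0 for a missing file is an illustrative choice, not documented behavior.

```ruby
require "tempfile"
require "tmpdir"

# Stand-in copy of the documented methods so the snippet is self-contained.
module TokenEstimator
  CHARS_PER_TOKEN = 4

  def self.estimate(text)
    return 0 if text.nil? || text.empty?

    (text.length.to_f / CHARS_PER_TOKEN).ceil
  end

  def self.estimate_file(path)
    content = File.read(path)
    estimate(content)
  end
end

# Write a 10-character file and estimate it: 10 / 4 = 2.5, rounded up to 3.
file_tokens = Tempfile.create("estimate_file_demo") do |f|
  f.write("puts 'hi'\n")
  f.flush
  TokenEstimator.estimate_file(f.path)
end
p file_tokens

# A path that is never created, used to trigger Errno::ENOENT.
missing = File.join(Dir.tmpdir, "token_estimator_demo_missing_#{Process.pid}.rb")

missing_tokens =
  begin
    TokenEstimator.estimate_file(missing)
  rescue Errno::ENOENT
    0 # caller's choice here: count a missing file as zero tokens
  end
p missing_tokens
```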

.estimate_files(paths) ⇒ Integer

Estimate token count from multiple files

Examples:

TokenEstimator.estimate_files(["/path/to/a.rb", "/path/to/b.rb"])
#=> 3500

Parameters:

  • paths (Array<String>)

    Array of file paths

Returns:

  • (Integer)

    Total estimated token count (0 for nil/empty)

Raises:

  • (Errno::ENOENT)

    if any file does not exist

  • (Errno::EACCES)

    if any file is not readable



# File 'lib/ace/review/atoms/token_estimator.rb', line 78

def self.estimate_files(paths)
  return 0 if paths.nil? || paths.empty?

  paths.sum { |path| estimate_file(path) }
end
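One bad path aborts the whole sum, because the error from .estimate_file is not rescued here. Where that is undesirable, a caller could filter the list first. The following is a minimal sketch under that assumption: `estimate_readable_files` is a hypothetical wrapper, not part of TokenEstimator, and the module is a stand-in copy of the listings above.

```ruby
require "tmpdir"

# Stand-in copy of the documented methods so the snippet is self-contained.
module TokenEstimator
  CHARS_PER_TOKEN = 4

  def self.estimate(text)
    return 0 if text.nil? || text.empty?

    (text.length.to_f / CHARS_PER_TOKEN).ceil
  end

  def self.estimate_file(path)
    estimate(File.read(path))
  end

  def self.estimate_files(paths)
    return 0 if paths.nil? || paths.empty?

    paths.sum { |path| estimate_file(path) }
  end
end

# Hypothetical helper (not part of TokenEstimator): skip paths that are not
# readable regular files instead of letting one bad path abort the sum.
def estimate_readable_files(paths)
  readable = Array(paths).select { |path| File.file?(path) && File.readable?(path) }
  TokenEstimator.estimate_files(readable)
end

total, tolerant_total = Dir.mktmpdir do |dir|
  a = File.join(dir, "a.rb")
  b = File.join(dir, "b.rb")
  gone = File.join(dir, "gone.rb") # never created
  File.write(a, "x" * 8) # 8 chars -> 2 tokens
  File.write(b, "y" * 9) # 9 chars -> 2.25, rounded up to 3 tokens

  [TokenEstimator.estimate_files([a, b]), estimate_readable_files([a, gone, b])]
end

p total          # 2 + 3
p tolerant_total # same sum, with the missing path skipped
```

Silently skipping files changes the meaning of the total, so a wrapper like this is only appropriate when an undercount is acceptable.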

.estimate_many(texts) ⇒ Integer

Estimate token count from multiple strings

Examples:

TokenEstimator.estimate_many(["hello", "world"])
#=> 4

Parameters:

  • texts (Array<String>)

    Array of text strings

Returns:

  • (Integer)

    Total estimated token count (0 for nil/empty)



# File 'lib/ace/review/atoms/token_estimator.rb', line 63

def self.estimate_many(texts)
  return 0 if texts.nil? || texts.empty?

  texts.sum { |text| estimate(text) }
end
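Each string is rounded up separately before summing, so .estimate_many can return more than .estimate would for the same text joined together. The sketch below illustrates the difference using a stand-in copy of the module; the sample strings are arbitrary.

```ruby
# Stand-in copy of the documented methods so the snippet is self-contained.
module TokenEstimator
  CHARS_PER_TOKEN = 4

  def self.estimate(text)
    return 0 if text.nil? || text.empty?

    (text.length.to_f / CHARS_PER_TOKEN).ceil
  end

  def self.estimate_many(texts)
    return 0 if texts.nil? || texts.empty?

    texts.sum { |text| estimate(text) }
  end
end

texts = ["hello", "world"]

# Per-string rounding: ceil(5 / 4) + ceil(5 / 4) = 2 + 2
per_string = TokenEstimator.estimate_many(texts)

# Single rounding over the concatenation: ceil(10 / 4) = 3
joined = TokenEstimator.estimate(texts.join)

p per_string # 4
p joined     # 3

# nil and empty entries contribute 0, because estimate handles both.
with_blanks = TokenEstimator.estimate_many(["abcd", nil, ""])
p with_blanks # 1
```

The gap is at most one token per string, which matters mainly when summing many short fragments.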