Module: Ace::Review::Atoms::TokenEstimator

Defined in: lib/ace/review/atoms/token_estimator.rb

Overview

Pure function for estimating token counts from text.

Uses a simple chars/4 heuristic, which provides reasonable accuracy (typically within 20% of actual token counts) for most text types. This is intentionally simple and fast: actual tokenization would require model-specific tokenizers, which add complexity and latency.

Examples:

Basic usage

TokenEstimator.estimate("hello world")
#=> 3

File estimation

TokenEstimator.estimate_file("/path/to/code.rb")
#=> 1500

Constant Summary

CHARS_PER_TOKEN = 4

Average characters per token for the heuristic. Most modern tokenizers average around 3-5 chars/token; 4 is a reasonable middle ground.
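
To see how much the choice of divisor matters, the sketch below runs the same rounding rule with 3, 4, and 5 characters per token. The helper `estimate_with` and the placeholder text are hypothetical and exist only for this illustration; only the divisor 4 comes from the documented heuristic.

```ruby
# Hypothetical helper for illustration: same rounding rule, configurable divisor.
def estimate_with(text, chars_per_token)
  return 0 if text.nil? || text.empty?

  (text.length.to_f / chars_per_token).ceil
end

sample = "a" * 1200 # 1,200 characters of placeholder text

low_density  = estimate_with(sample, 5) #=> 240
documented   = estimate_with(sample, 4) #=> 300
high_density = estimate_with(sample, 3) #=> 400
```

Across the 3-5 range, the result for this input runs from 240 to 400, that is, from 20% below to about 33% above the chars/4 figure.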

Class Method Summary

.estimate(text) ⇒ Integer

Estimate token count from a string using chars/4 heuristic.

.estimate_file(path) ⇒ Integer

Estimate token count from a file.

.estimate_files(paths) ⇒ Integer

Estimate token count from multiple files.

.estimate_many(texts) ⇒ Integer

Estimate token count from multiple strings.

Class Method Details

.estimate(text) ⇒ `Integer`

Estimate token count from a string using chars/4 heuristic

Examples:

TokenEstimator.estimate("Hello, world!")
#=> 4

Parameters:

text (String, nil) —

The text to estimate tokens for

Returns:

(Integer) —

Estimated token count (0 for nil/empty)

# File 'lib/ace/review/atoms/token_estimator.rb', line 34

def self.estimate(text)
  return 0 if text.nil? || text.empty?

  (text.length.to_f / CHARS_PER_TOKEN).ceil
end
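
The edge cases follow directly from the listing above: nil and empty input return 0, and any partial token rounds up. The sketch below restates the method in a standalone module so it runs without the gem loaded. The module name `TokenEstimatorSketch` is hypothetical, and the value 4 for `CHARS_PER_TOKEN` is assumed from the "chars/4" description.

```ruby
# Standalone restatement of the listing above; the module name is hypothetical.
module TokenEstimatorSketch
  CHARS_PER_TOKEN = 4 # assumed from the "chars/4" description

  def self.estimate(text)
    return 0 if text.nil? || text.empty?

    (text.length.to_f / CHARS_PER_TOKEN).ceil
  end
end

TokenEstimatorSketch.estimate(nil)      #=> 0 (nil is treated like an empty string)
TokenEstimatorSketch.estimate("")       #=> 0
TokenEstimatorSketch.estimate("abcd")   #=> 1 (exactly four characters)
TokenEstimatorSketch.estimate("abcde")  #=> 2 (five characters round up)
TokenEstimatorSketch.estimate("日本語")  #=> 1 (String#length counts characters, not bytes)
```

Because `String#length` counts characters rather than bytes, multibyte text yields a lower estimate than its byte size would suggest.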

.estimate_file(path) ⇒ `Integer`

Estimate token count from a file

Examples:

TokenEstimator.estimate_file("/path/to/code.rb")
#=> 1500

Parameters:

path (String) —

Path to the file

Returns:

(Integer) —

Estimated token count

Raises:

(Errno::ENOENT) —

if file does not exist

(Errno::EACCES) —

if file is not readable

# File 'lib/ace/review/atoms/token_estimator.rb', line 50

def self.estimate_file(path)
  content = File.read(path)
  estimate(content)
end
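
Since `estimate_file` lets the `File.read` errors propagate, a caller that wants to skip unreadable files has to rescue them. The sketch below shows one way to do that. `TokenEstimatorSketch` is a hypothetical standalone restatement of the listings above, and `try_estimate_file` is a hypothetical helper, not part of the documented API.

```ruby
require "tempfile"

# Standalone restatement of the listings above; the module name is hypothetical.
module TokenEstimatorSketch
  CHARS_PER_TOKEN = 4 # assumed from the "chars/4" description

  def self.estimate(text)
    return 0 if text.nil? || text.empty?

    (text.length.to_f / CHARS_PER_TOKEN).ceil
  end

  def self.estimate_file(path)
    estimate(File.read(path))
  end

  # Hypothetical helper: returns nil instead of raising when the file
  # is missing or unreadable.
  def self.try_estimate_file(path)
    estimate_file(path)
  rescue Errno::ENOENT, Errno::EACCES
    nil
  end
end

# A ten-character temporary file: 10 / 4 rounds up to 3.
existing = Tempfile.create("sketch") do |file|
  file.write("a" * 10)
  file.flush
  TokenEstimatorSketch.estimate_file(file.path)
end

missing = TokenEstimatorSketch.try_estimate_file("/nonexistent/sketch.txt")
# existing #=> 3
# missing  #=> nil
```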

.estimate_files(paths) ⇒ `Integer`

Estimate token count from multiple files

Examples:

TokenEstimator.estimate_files(["/path/to/a.rb", "/path/to/b.rb"])
#=> 3500

Parameters:

paths (Array<String>, nil) —

Array of file paths

Returns:

(Integer) —

Total estimated token count (0 for nil/empty)

Raises:

(Errno::ENOENT) —

if any file does not exist

(Errno::EACCES) —

if any file is not readable

# File 'lib/ace/review/atoms/token_estimator.rb', line 78

def self.estimate_files(paths)
  return 0 if paths.nil? || paths.empty?

  paths.sum { |path| estimate_file(path) }
end
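
A common way to build the `paths` argument is a glob. The sketch below sums estimates over files in a temporary directory and shows that an empty match list yields 0. `TokenEstimatorSketch` is a hypothetical standalone restatement of the listings above, and the file names and contents are made up for the illustration.

```ruby
require "tmpdir"

# Standalone restatement of the listings above; the module name is hypothetical.
module TokenEstimatorSketch
  CHARS_PER_TOKEN = 4 # assumed from the "chars/4" description

  def self.estimate(text)
    return 0 if text.nil? || text.empty?

    (text.length.to_f / CHARS_PER_TOKEN).ceil
  end

  def self.estimate_file(path)
    estimate(File.read(path))
  end

  def self.estimate_files(paths)
    return 0 if paths.nil? || paths.empty?

    paths.sum { |path| estimate_file(path) }
  end
end

ruby_total, python_total = Dir.mktmpdir do |dir|
  File.write(File.join(dir, "a.rb"), "x" * 8) # 8 characters -> 2 tokens
  File.write(File.join(dir, "b.rb"), "y" * 9) # 9 characters -> 3 tokens

  [
    TokenEstimatorSketch.estimate_files(Dir.glob(File.join(dir, "*.rb"))),
    TokenEstimatorSketch.estimate_files(Dir.glob(File.join(dir, "*.py")))
  ]
end
# ruby_total   #=> 5
# python_total #=> 0 (no matching files)
```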

.estimate_many(texts) ⇒ `Integer`

Estimate token count from multiple strings

Examples:

TokenEstimator.estimate_many(["hello", "world"])
#=> 4

Parameters:

texts (Array<String>, nil) —

Array of text strings

Returns:

(Integer) —

Total estimated token count (0 for nil/empty)

# File 'lib/ace/review/atoms/token_estimator.rb', line 63

def self.estimate_many(texts)
  return 0 if texts.nil? || texts.empty?

  texts.sum { |text| estimate(text) }
end
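
Because each string is rounded up before summing, `estimate_many` can return more than `estimate` would for the same text joined into one string. The sketch below makes the difference concrete. `TokenEstimatorSketch` is a hypothetical standalone restatement of the listings above.

```ruby
# Standalone restatement of the listings above; the module name is hypothetical.
module TokenEstimatorSketch
  CHARS_PER_TOKEN = 4 # assumed from the "chars/4" description

  def self.estimate(text)
    return 0 if text.nil? || text.empty?

    (text.length.to_f / CHARS_PER_TOKEN).ceil
  end

  def self.estimate_many(texts)
    return 0 if texts.nil? || texts.empty?

    texts.sum { |text| estimate(text) }
  end
end

parts = ["hello", "world"]

per_part = TokenEstimatorSketch.estimate_many(parts) #=> 4 (2 + 2, each part rounded up)
joined   = TokenEstimatorSketch.estimate(parts.join) #=> 3 (10 / 4, rounded up once)
```

The gap is at most one token per string, so it only matters when summing many short strings.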
