Class: LlmDocsBuilder::TokenEstimator

Inherits: Object

Defined in: lib/llm_docs_builder/token_estimator.rb

Overview

Estimates the token count of text content using a character-based approximation.

Provides token estimates without depending on an external tokenizer. It uses the common heuristic that roughly 4 characters of English text correspond to 1 token, which is a reasonable approximation for documentation and Markdown content.
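The heuristic amounts to dividing the character count by the ratio and rounding to the nearest integer. The standalone sketch below reproduces that arithmetic for the sample sentence used in the basic example; the helper name `approx_tokens` is hypothetical and is not part of the gem.

```ruby
# Hypothetical helper mirroring the documented formula:
#   tokens = (characters / chars_per_token).round
def approx_tokens(text, chars_per_token = 4.0)
  return 0 if text.nil? || text.empty?

  (text.length / chars_per_token.to_f).round
end

sample = "This is a sample text."
puts sample.length          # 22 characters
puts sample.length / 4.0    # 5.5
puts approx_tokens(sample)  # 6 (Float#round rounds halves away from zero)
```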

Examples:

Basic usage

estimator = LlmDocsBuilder::TokenEstimator.new
token_count = estimator.estimate("This is a sample text.")

With custom characters per token

estimator = LlmDocsBuilder::TokenEstimator.new(chars_per_token: 3.5)
token_count = estimator.estimate(content) # content is any String
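A lower ratio assumes each token covers fewer characters, so the same text produces a higher estimate. The sketch below shows the arithmetic for a 100-character string using a hypothetical helper rather than the gem itself.

```ruby
# Hypothetical helper: same formula as TokenEstimator#estimate,
# without the nil/empty guard
def approx_tokens(text, chars_per_token)
  (text.length / chars_per_token.to_f).round
end

text = "a" * 100
puts approx_tokens(text, 4.0) # 100 / 4.0 = 25.0    -> 25
puts approx_tokens(text, 3.5) # 100 / 3.5 ~ 28.57   -> 29
```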

Constant Summary

DEFAULT_CHARS_PER_TOKEN = 4.0

Default number of characters per token.

Instance Attribute Summary

#chars_per_token ⇒ Float (readonly)

Characters per token ratio.

Class Method Summary

.estimate(content, chars_per_token: DEFAULT_CHARS_PER_TOKEN) ⇒ Integer

Estimate token count (class method for convenience).

Instance Method Summary

#estimate(content) ⇒ Integer

Estimate token count for the given content.

#initialize(chars_per_token: DEFAULT_CHARS_PER_TOKEN) ⇒ TokenEstimator constructor

Initialize a new token estimator.

Constructor Details

#initialize(chars_per_token: DEFAULT_CHARS_PER_TOKEN) ⇒ `TokenEstimator`

Initialize a new token estimator

Parameters:

chars_per_token (Float) (defaults to: DEFAULT_CHARS_PER_TOKEN) —

number of characters per token (default: 4.0)

# File 'lib/llm_docs_builder/token_estimator.rb', line 29

def initialize(chars_per_token: DEFAULT_CHARS_PER_TOKEN)
  @chars_per_token = chars_per_token.to_f
end
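Because the constructor calls `to_f`, integer ratios and numeric strings are stored as floats. The sketch below copies the constructor and `#estimate` shown on this page into a hypothetical `EstimatorSketch` class to illustrate this. Note that the code shown does not reject a ratio of `0`: dividing by `0.0` yields `Infinity`, and calling `round` on it raises `FloatDomainError`.

```ruby
# Hypothetical class reproducing the constructor and #estimate shown on this page
class EstimatorSketch
  attr_reader :chars_per_token

  def initialize(chars_per_token: 4.0)
    @chars_per_token = chars_per_token.to_f # Integer/String inputs become Float
  end

  def estimate(content)
    return 0 if content.nil? || content.empty?

    (content.length / chars_per_token).round
  end
end

p EstimatorSketch.new(chars_per_token: 3).chars_per_token     # 3.0
p EstimatorSketch.new(chars_per_token: "3.5").chars_per_token # 3.5

begin
  EstimatorSketch.new(chars_per_token: 0).estimate("text") # 4 / 0.0 => Infinity
rescue FloatDomainError => e
  puts "ratio 0 fails: #{e.class}"
end
```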

Instance Attribute Details

#chars_per_token ⇒ `Float` (readonly)

Returns the characters-per-token ratio.

Returns:

(Float) —

characters per token ratio

# File 'lib/llm_docs_builder/token_estimator.rb', line 24

def chars_per_token
  @chars_per_token
end
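The reader just returns the stored instance variable, and no writer is defined, so the ratio is fixed once the estimator is constructed. A minimal sketch of that readonly pattern, using a hypothetical `ReadonlyRatio` class:

```ruby
# Hypothetical class: a hand-written reader with no writer, as documented above
class ReadonlyRatio
  def initialize(chars_per_token: 4.0)
    @chars_per_token = chars_per_token.to_f
  end

  # Behaves like `attr_reader :chars_per_token`
  def chars_per_token
    @chars_per_token
  end
end

r = ReadonlyRatio.new(chars_per_token: 3.5)
p r.chars_per_token                # 3.5
p r.respond_to?(:chars_per_token=) # false: no writer is defined
```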

Class Method Details

.estimate(content, chars_per_token: DEFAULT_CHARS_PER_TOKEN) ⇒ `Integer`

Estimate token count (class method for convenience)

Parameters:

content (String) —

text content to estimate tokens for
chars_per_token (Float) (defaults to: DEFAULT_CHARS_PER_TOKEN) —

number of characters per token (default: 4.0)

Returns:

(Integer) —

estimated number of tokens

# File 'lib/llm_docs_builder/token_estimator.rb', line 48

def self.estimate(content, chars_per_token: DEFAULT_CHARS_PER_TOKEN)
  new(chars_per_token: chars_per_token).estimate(content)
end
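The class method builds a one-off instance and delegates to `#estimate`, so both call styles return the same value. With the gem loaded, the equivalent calls would be `LlmDocsBuilder::TokenEstimator.estimate(text)` and `LlmDocsBuilder::TokenEstimator.new.estimate(text)`; the sketch below uses a hypothetical `DelegatingSketch` class that mirrors the documented methods so it runs on its own.

```ruby
# Hypothetical class mirroring the documented delegation pattern
class DelegatingSketch
  DEFAULT_CHARS_PER_TOKEN = 4.0

  attr_reader :chars_per_token

  def self.estimate(content, chars_per_token: DEFAULT_CHARS_PER_TOKEN)
    new(chars_per_token: chars_per_token).estimate(content) # one-off instance
  end

  def initialize(chars_per_token: DEFAULT_CHARS_PER_TOKEN)
    @chars_per_token = chars_per_token.to_f
  end

  def estimate(content)
    return 0 if content.nil? || content.empty?

    (content.length / chars_per_token).round
  end
end

text = "x" * 42
p DelegatingSketch.estimate(text)                       # 11 (42 / 4.0 = 10.5)
p DelegatingSketch.new.estimate(text)                   # 11
p DelegatingSketch.estimate(text, chars_per_token: 3.0) # 14
```

The class method suits one-off calls; holding an instance keeps the ratio in one place when estimating many documents.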

Instance Method Details

#estimate(content) ⇒ `Integer`

Estimate token count for given content

Parameters:

content (String) —

text content to estimate tokens for

Returns:

(Integer) —

estimated number of tokens

# File 'lib/llm_docs_builder/token_estimator.rb', line 37

def estimate(content)
  return 0 if content.nil? || content.empty?

  (content.length / chars_per_token).round
end
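A few edge cases follow directly from the method body: `nil` and empty strings return 0, whitespace counts toward the length, a short non-empty string can round down to 0, and `String#length` counts characters rather than bytes. The standalone sketch below copies the documented logic into a hypothetical `estimate_tokens` helper with the ratio fixed at 4.0.

```ruby
# Hypothetical standalone copy of the documented #estimate logic
def estimate_tokens(content, chars_per_token = 4.0)
  return 0 if content.nil? || content.empty?

  (content.length / chars_per_token).round
end

p estimate_tokens(nil)   # 0
p estimate_tokens("")    # 0
p estimate_tokens("   ") # 1 (3 spaces / 4.0 = 0.75; whitespace counts)
p estimate_tokens("a")   # 0 (1 / 4.0 = 0.25 rounds down)

word = "h\u00E9llo"      # "héllo" with a precomposed é
p word.length            # 5 characters
p word.bytesize          # 6 bytes in UTF-8
p estimate_tokens(word)  # 1 (based on characters, not bytes)
```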
