Class: LlmDocsBuilder::TokenEstimator

Inherits:
Object
Defined in:
lib/llm_docs_builder/token_estimator.rb

Overview

Estimates the token count of text content using a character-based approximation.

Provides token estimation without requiring an external tokenizer dependency. It uses the common heuristic that roughly 4 characters of English text correspond to 1 token, which works reasonably well for documentation and Markdown content.
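As implemented by #estimate (see Instance Method Details), the heuristic divides the character count by the ratio r and rounds the result. Ruby's Float#round rounds halves away from zero. Worked through for the 22-character string "This is a sample text." used in the examples:

```latex
\text{tokens} = \operatorname{round}\!\left(\frac{n_{\text{chars}}}{r}\right), \qquad r = 4.0 \text{ (default)}

\frac{22}{4.0} = 5.5 \;\rightarrow\; 6, \qquad \frac{22}{3.5} \approx 6.29 \;\rightarrow\; 6
```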

Examples:

Basic usage

estimator = LlmDocsBuilder::TokenEstimator.new
token_count = estimator.estimate("This is a sample text.")

With a custom characters-per-token ratio

estimator = LlmDocsBuilder::TokenEstimator.new(chars_per_token: 3.5)
token_count = estimator.estimate(content)

Constant Summary

DEFAULT_CHARS_PER_TOKEN = 4.0

Default number of characters per token.


Constructor Details

#initialize(chars_per_token: DEFAULT_CHARS_PER_TOKEN) ⇒ TokenEstimator

Initializes a new token estimator.

Parameters:

  • chars_per_token (Float) (defaults to: DEFAULT_CHARS_PER_TOKEN)

    number of characters per token (default: 4.0); the value is converted to a Float with to_f



# File 'lib/llm_docs_builder/token_estimator.rb', line 29

def initialize(chars_per_token: DEFAULT_CHARS_PER_TOKEN)
  @chars_per_token = chars_per_token.to_f
end
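Because the constructor calls to_f, integers and numeric strings are accepted, while non-numeric strings and nil silently become 0.0. This sketch uses only core Ruby (the gem is not loaded) to show that conversion; the sample inputs are illustrative.

```ruby
# The constructor stores chars_per_token.to_f. These hypothetical inputs
# show what that conversion yields for a few argument types.
samples = [4, 3.5, "3.5", "abc", nil]
ratios  = samples.map(&:to_f)

samples.zip(ratios).each do |value, ratio|
  puts "#{value.inspect} -> #{ratio}"
end
# 4 -> 4.0
# 3.5 -> 3.5
# "3.5" -> 3.5
# "abc" -> 0.0
# nil -> 0.0

# A ratio of 0.0 makes #estimate divide by zero: Integer / 0.0 is
# Infinity, and calling round on Infinity raises FloatDomainError.
begin
  (22 / 0.0).round
rescue FloatDomainError => e
  puts "FloatDomainError: #{e.message}"
end
```

A zero or non-numeric ratio is therefore not rejected by the constructor; it only fails on the first call to #estimate with non-empty content.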

Instance Attribute Details

#chars_per_token ⇒ Float (readonly)

Returns the characters-per-token ratio.

Returns:

  • (Float)

    characters per token ratio



# File 'lib/llm_docs_builder/token_estimator.rb', line 24

def chars_per_token
  @chars_per_token
end

Class Method Details

.estimate(content, chars_per_token: DEFAULT_CHARS_PER_TOKEN) ⇒ Integer

Estimates the token count without explicitly creating an instance (convenience class method).

Parameters:

  • content (String)

    text content to estimate tokens for

  • chars_per_token (Float) (defaults to: DEFAULT_CHARS_PER_TOKEN)

    number of characters per token (default: 4.0)

Returns:

  • (Integer)

    estimated number of tokens



# File 'lib/llm_docs_builder/token_estimator.rb', line 48

def self.estimate(content, chars_per_token: DEFAULT_CHARS_PER_TOKEN)
  new(chars_per_token: chars_per_token).estimate(content)
end
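The class method builds a new instance and calls #estimate on it, so both call styles should return the same result. The sketch below re-creates the class from the source listings on this page (it does not load the gem) and compares the two. It also confirms that chars_per_token has a reader but no writer.

```ruby
# Self-contained copy of the class as documented on this page, so the
# snippet runs without the llm_docs_builder gem installed. attr_reader
# provides the read-only #chars_per_token accessor.
module LlmDocsBuilder
  class TokenEstimator
    DEFAULT_CHARS_PER_TOKEN = 4.0

    attr_reader :chars_per_token

    def initialize(chars_per_token: DEFAULT_CHARS_PER_TOKEN)
      @chars_per_token = chars_per_token.to_f
    end

    # Convenience wrapper: builds a throwaway instance and delegates to it.
    def self.estimate(content, chars_per_token: DEFAULT_CHARS_PER_TOKEN)
      new(chars_per_token: chars_per_token).estimate(content)
    end

    def estimate(content)
      return 0 if content.nil? || content.empty?

      (content.length / chars_per_token).round
    end
  end
end

text = "This is a sample text." # 22 characters

# Both call styles compute round(22 / 3.5) = round(6.29) = 6.
via_class    = LlmDocsBuilder::TokenEstimator.estimate(text, chars_per_token: 3.5)
estimator    = LlmDocsBuilder::TokenEstimator.new(chars_per_token: 3.5)
via_instance = estimator.estimate(text)

puts via_class                                # 6
puts via_instance                             # 6
puts estimator.chars_per_token                # 3.5
puts estimator.respond_to?(:chars_per_token=) # false: no writer is defined
```

The class method suits one-off checks. Keeping an instance avoids repeating the ratio when many documents are measured with the same setting.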

Instance Method Details

#estimate(content) ⇒ Integer

Estimates the token count for the given content.

Parameters:

  • content (String)

    text content to estimate tokens for

Returns:

  • (Integer)

    estimated number of tokens; 0 when content is nil or empty



# File 'lib/llm_docs_builder/token_estimator.rb', line 37

def estimate(content)
  return 0 if content.nil? || content.empty?

  (content.length / chars_per_token).round
end
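The guard clause and the rounding in #estimate produce a few edge cases. This sketch copies the method body into a standalone helper so it runs with core Ruby alone. The name estimate_tokens is hypothetical and not part of the gem. String#length counts characters, not bytes, which matters for non-ASCII text.

```ruby
# Standalone copy of the body of TokenEstimator#estimate.
# estimate_tokens is a hypothetical helper name used only for this sketch.
def estimate_tokens(content, chars_per_token = 4.0)
  # Same guard as the documented method: nil or "" yields 0 immediately.
  return 0 if content.nil? || content.empty?

  # Integer / Float gives a Float; Float#round rounds halves away from zero.
  (content.length / chars_per_token).round
end

cjk = "日本語テキスト" # 7 characters, 21 bytes in UTF-8

results = {
  nil_input:  estimate_tokens(nil),   # guard clause: 0
  empty:      estimate_tokens(""),    # guard clause: 0
  one_char:   estimate_tokens("a"),   # 1 / 4.0 = 0.25 -> 0
  two_chars:  estimate_tokens("ab"),  # 2 / 4.0 = 0.5  -> 1
  whitespace: estimate_tokens("   "), # not empty, so 3 / 4.0 = 0.75 -> 1
  cjk:        estimate_tokens(cjk)    # length is 7 characters: 1.75 -> 2
}

results.each { |label, tokens| puts "#{label}: #{tokens}" }
puts "bytesize of cjk sample: #{cjk.bytesize}" # 21, not used by the estimate
```

Very short non-empty strings can therefore round to 0. Because the default ratio targets English text (per the overview), estimates for other scripts, such as the Japanese sample, should be treated as rougher approximations.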