Module: Zephira::Tokens

Defined in:
lib/zephira/tokens.rb

Overview

Approximates token counts for arbitrary text without pulling in a real tokenizer dependency. Counts word-like runs and standalone punctuation, which lands within ~20% of real BPE tokenizers (GPT/Claude) for English text — close enough for context-budget decisions, never trust for billing.

Constant Summary collapse

TOKEN_PATTERN =
/\w+|[^\s\w]/.freeze

Class Method Summary collapse

Class Method Details

.estimate(text) ⇒ Object



11
12
13
14
# File 'lib/zephira/tokens.rb', line 11

def self.estimate(text)
  return 0 if text.nil? || text.empty?
  text.to_s.scan(TOKEN_PATTERN).size
end