Module: Zephira::Tokens
- Defined in:
- lib/zephira/tokens.rb
Overview
Approximates token counts for arbitrary text without pulling in a real tokenizer dependency. Counts word-like runs and standalone punctuation, which lands within ~20% of real BPE tokenizers (GPT/Claude) for English text — close enough for context-budget decisions, never trust for billing.
Constant Summary collapse
- TOKEN_PATTERN =
/\w+|[^\s\w]/.freeze
Class Method Summary collapse
Class Method Details
.estimate(text) ⇒ Object
11 12 13 14 |
# File 'lib/zephira/tokens.rb', line 11 def self.estimate(text) return 0 if text.nil? || text.empty? text.to_s.scan(TOKEN_PATTERN).size end |