Class: RubyLLM::Tokenizer::Backend::Approximate

Inherits:
Tiktoken
  • Object
Defined in:
lib/ruby_llm/tokenizer/backend/approximate.rb

Overview

Approximate tokenizer for models with no published tokenizer (notably Anthropic Claude). It wraps a tiktoken encoding as a stand-in. Counts typically differ from the model’s true count by roughly 5-15%, so they should not be used to enforce hard limits.
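Because the count is only an estimate, a caller doing soft budgeting may want to pad it before comparing against a limit. The following is a minimal sketch, not part of this class: `fits_budget?` and its `margin:` parameter are hypothetical names, and the default 15% margin is simply the upper end of the range quoted above.

```ruby
# Hypothetical helper (not part of RubyLLM): pads an approximate token
# count by a safety margin before comparing it to a budget.
# The 0.15 default is an assumption taken from the ~5-15% range above.
def fits_budget?(approx_count, limit, margin: 0.15)
  padded = (approx_count * (1 + margin)).ceil
  padded <= limit
end

fits_budget?(1_000, 1_200) # comfortably under the limit even after padding
fits_budget?(1_000, 1_100) # too close to call once the margin is applied
```

Even with padding this remains a heuristic; it does not turn an approximate count into a guarantee.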

Instance Attribute Summary

Attributes inherited from Tiktoken

#encoding_name

Instance Method Summary

Methods inherited from Tiktoken

#decode

Methods inherited from Base

#analyze, #count, #decode, #truncate

Constructor Details

#initialize(encoding: "o200k_base") ⇒ Approximate

Returns a new instance of Approximate.



# File 'lib/ruby_llm/tokenizer/backend/approximate.rb', line 13

def initialize(encoding: "o200k_base")
  super
  @warned = false
  @warn_mutex = Mutex.new
end

Instance Method Details

#encode(text) ⇒ Object



# File 'lib/ruby_llm/tokenizer/backend/approximate.rb', line 19

def encode(text)
  warn_once
  super
end
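The private `warn_once` helper is not shown on this page. Given the `@warned` flag and `@warn_mutex` set in the constructor, it plausibly follows a thread-safe "emit once" pattern. The sketch below illustrates that pattern under that assumption: `WarnOnceDemo`, its message text, and the `messages` array (used here in place of a real `warn` call so the behavior can be inspected) are all hypothetical.

```ruby
# Hypothetical stand-in class; the real warn_once implementation is not
# documented here and may differ.
class WarnOnceDemo
  attr_reader :messages

  def initialize
    @warned = false
    @warn_mutex = Mutex.new
    @messages = [] # collected instead of calling Kernel#warn, for inspection
  end

  def encode(text)
    warn_once
    text.split # placeholder for real tokenization
  end

  private

  def warn_once
    return if @warned # fast path: skip the lock once the warning has fired

    @warn_mutex.synchronize do
      return if @warned # re-check inside the lock to avoid a duplicate

      @warned = true
      @messages << "token counts are approximate"
    end
  end
end
```

Re-checking the flag inside `synchronize` is what keeps two threads that both pass the fast path from each recording a warning.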

#identifier ⇒ Object



# File 'lib/ruby_llm/tokenizer/backend/approximate.rb', line 24

def identifier
  "approximate:#{encoding_name}"
end
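Since the identifier always starts with `approximate:`, calling code could use that prefix to tell estimated counts from exact ones. This is a sketch only: `FakeBackend` and `approximate_backend?` are hypothetical, and the `tiktoken:o200k_base` identifier used for the non-approximate case is an assumed format, not one documented on this page.

```ruby
# Stand-in for any backend object; only #identifier is assumed to exist.
FakeBackend = Struct.new(:identifier)

# Hypothetical helper: true when the backend reports an approximate identifier.
def approximate_backend?(backend)
  backend.identifier.start_with?("approximate:")
end

approx = FakeBackend.new("approximate:o200k_base")
exact  = FakeBackend.new("tiktoken:o200k_base") # assumed format for illustration

approximate_backend?(approx) # true for the approximate backend
approximate_backend?(exact)  # false for anything without the prefix
```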