Class: RubyLLM::Tokenizer::Backend::Approximate

Inherits: Tiktoken

RubyLLM::Tokenizer::Backend::Approximate < Tiktoken < Base < Object

Defined in: lib/ruby_llm/tokenizer/backend/approximate.rb

Overview

Approximate tokenizer for models that have no published tokenizer (notably Anthropic Claude). It wraps a tiktoken encoding as a stand-in, so token counts are typically within about 5-15% of the model’s true count and should not be used to enforce hard limits.
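Because the counts can be off by roughly 5-15%, a caller that needs to stay under a limit can pad the estimate before comparing it. The following is a minimal sketch that assumes the upper end of that range (15%) as the margin. `fits_with_margin?` is a hypothetical helper, not part of RubyLLM. Since the stated range is typical rather than guaranteed, padding lowers the risk of overrunning a limit but does not remove it.

```ruby
# Hypothetical helper (not RubyLLM API): pad an approximate token count
# by a percentage margin before comparing it with a limit.
def fits_with_margin?(approx_count, limit, margin_percent: 15)
  raise ArgumentError, "margin_percent must be non-negative" if margin_percent.negative?

  # Integer ceiling of approx_count * (100 + margin_percent) / 100,
  # computed without floats so the padded value never rounds down.
  padded = (approx_count * (100 + margin_percent) + 99) / 100
  padded <= limit
end

fits_with_margin?(800, 1_000) # => true  (padded to 920)
fits_with_margin?(900, 1_000) # => false (padded to 1035)
```

Integer arithmetic is used instead of `approx_count * 1.15` so that binary floating-point rounding cannot shift the padded value across the limit.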

Instance Attribute Summary

Attributes inherited from Tiktoken

#encoding_name

Instance Method Summary

#encode(text) ⇒ Object
#identifier ⇒ Object
#initialize(encoding: "o200k_base") ⇒ Approximate (constructor)

A new instance of Approximate.

Methods inherited from Tiktoken

#decode

Methods inherited from Base

#analyze, #count, #decode, #truncate

Constructor Details

#initialize(encoding: "o200k_base") ⇒ `Approximate`

Returns a new instance of Approximate.

# File 'lib/ruby_llm/tokenizer/backend/approximate.rb', line 13

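def initialize(encoding: "o200k_base")
  super # bare super forwards encoding: to the superclass constructor
  # Flag and lock backing warn_once, so its warning is issued at most once
  # per instance even when #encode is called from several threads.
  @warned = false
  @warn_mutex = Mutex.new
end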
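The `@warned` flag and `Mutex` set up in the constructor point to a one-time warning that is safe under concurrent callers. `warn_once` itself is not shown on this page, so the following is only a sketch of that common pattern under that assumption; the class name `OnceWarner` and its `messages` reader are hypothetical.

```ruby
# Hypothetical sketch, not RubyLLM's code: this page does not show
# warn_once, so this only illustrates the check-and-set pattern that a
# @warned flag plus @warn_mutex suggests.
class OnceWarner
  attr_reader :messages

  def initialize
    @warned = false
    @warn_mutex = Mutex.new
    @messages = [] # records what was emitted, for inspection
  end

  def warn_once(message)
    # Check and set the flag under the lock so two concurrent callers
    # cannot both see @warned == false and both emit the warning.
    @warn_mutex.synchronize do
      next if @warned

      @warned = true
      @messages << message
      Kernel.warn(message) # prints to $stderr
    end
  end
end

warner = OnceWarner.new
threads = Array.new(8) { Thread.new { warner.warn_once("token counts are approximate") } }
threads.each(&:join)
warner.messages # => ["token counts are approximate"]
```

Taking the lock on every call keeps the sketch simple and avoids the subtleties of double-checked locking; an uncontended `Mutex` is inexpensive.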

Instance Method Details

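#encode(text) ⇒ `Object`

Calls `warn_once`, then delegates to the superclass implementation of `encode` with the same `text`.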

# File 'lib/ruby_llm/tokenizer/backend/approximate.rb', line 19

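def encode(text)
  warn_once # one-time warning (see @warned / @warn_mutex in #initialize)
  super     # bare super forwards text to the superclass #encode
end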
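Both `#initialize` and `#encode` use a bare `super`, which in Ruby re-passes the current method's arguments (including keyword arguments) unchanged. The runnable sketch below reproduces that layering with a hypothetical `StandInTiktoken` parent that "encodes" text as raw bytes so no tiktoken gem is needed; `StandInTiktoken`, `ApproximateSketch`, and `warning_count` are illustrative names, not RubyLLM classes, and the byte output is not real tokenization.

```ruby
# Hypothetical stand-in for the Tiktoken backend: stores the encoding
# name and "encodes" text as raw bytes. It is NOT a real tokenizer.
class StandInTiktoken
  attr_reader :encoding_name

  def initialize(encoding:)
    @encoding_name = encoding
  end

  def encode(text)
    text.bytes
  end
end

# Mirrors Approximate's shape: constructor default, a warn-once hook,
# and an #encode override that then calls bare super.
class ApproximateSketch < StandInTiktoken
  attr_reader :warning_count

  def initialize(encoding: "o200k_base")
    super # same as super(encoding: encoding)
    @warned = false
    @warn_mutex = Mutex.new
    @warning_count = 0
  end

  def encode(text)
    warn_once
    super # same as super(text)
  end

  private

  # Hypothetical warn_once; RubyLLM's version is not shown on this page.
  def warn_once
    @warn_mutex.synchronize do
      next if @warned

      @warned = true
      @warning_count += 1
    end
  end
end

tok = ApproximateSketch.new
tok.encode("hi")    # => [104, 105]
tok.encode("there") # second call: no new warning
tok.warning_count   # => 1
```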

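#identifier ⇒ `Object`

Returns the string `"approximate:#{encoding_name}"`, i.e. the wrapped encoding name prefixed with `approximate:`.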

# File 'lib/ruby_llm/tokenizer/backend/approximate.rb', line 24

def identifier
  "approximate:#{encoding_name}"
end