Class: ClaudeMemory::Embeddings::ModelRegistry
- Inherits: Object
- Defined in: lib/claude_memory/embeddings/model_registry.rb
Overview
Registry of known embedding models with their properties. Enables model validation, dimension lookup, and discoverability.
Models are registered by canonical name (e.g., “BAAI/bge-small-en-v1.5”) with provider type, dimensions, and description.
Usage:
ModelRegistry.find("BAAI/bge-small-en-v1.5")
# => {provider: "fastembed", dimensions: 384, description: "...", ...}
ModelRegistry.models_for_provider("fastembed")
# => [...]
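The lookup pattern above can be tried out with a stand-in registry. This is only a sketch: `ModelInfo` is assumed here to be a keyword-initialized Struct with the six attributes seen in MODELS, and `resolve_model!` is a hypothetical helper that is not part of ClaudeMemory. It shows one way to turn a nil lookup into a descriptive error.

```ruby
# Stand-in for ModelRegistry::ModelInfo (assumption: the real class
# exposes the same six attributes; its actual definition is not shown here).
ModelInfo = Struct.new(:name, :provider, :dimensions, :description,
                       :size_mb, :max_tokens, keyword_init: true)

# Two entries copied from the MODELS constant, enough for a demo.
MODELS = [
  ModelInfo.new(name: "BAAI/bge-small-en-v1.5", provider: "fastembed",
                dimensions: 384, description: "Fast English embedding (default)",
                size_mb: 67, max_tokens: 512),
  ModelInfo.new(name: "tfidf", provider: "tfidf", dimensions: 384,
                description: "Built-in TF-IDF embedding (no dependencies)",
                size_mb: 0, max_tokens: nil)
].freeze

# Same indexing expression as the documented MODELS_BY_NAME constant.
MODELS_BY_NAME = MODELS.each_with_object({}) { |m, h| h[m.name] = m }.freeze

# Hypothetical helper: raise instead of returning nil for unknown names.
def resolve_model!(name)
  MODELS_BY_NAME[name] or
    raise ArgumentError, "unknown embedding model: #{name.inspect}"
end

info = resolve_model!("BAAI/bge-small-en-v1.5")
puts "#{info.name}: #{info.provider}, #{info.dimensions} dimensions"
```

Raising early like this keeps a misspelled model name from surfacing later as a confusing nil error.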
Defined Under Namespace
Classes: ModelInfo
Constant Summary
- MODELS =
Known models with validated dimensions. Fastembed models sourced from fastembed-rb SUPPORTED_MODELS. API models sourced from provider documentation.
[
  # --- fastembed: local ONNX models (no API key needed) ---
  ModelInfo.new(
    name: "BAAI/bge-small-en-v1.5", provider: "fastembed", dimensions: 384,
    description: "Fast English embedding (default)",
    size_mb: 67, max_tokens: 512
  ),
  ModelInfo.new(
    name: "BAAI/bge-base-en-v1.5", provider: "fastembed", dimensions: 768,
    description: "Balanced English embedding, higher accuracy",
    size_mb: 210, max_tokens: 512
  ),
  ModelInfo.new(
    name: "BAAI/bge-large-en-v1.5", provider: "fastembed", dimensions: 1024,
    description: "High accuracy English embedding",
    size_mb: 1200, max_tokens: 512
  ),
  ModelInfo.new(
    name: "sentence-transformers/all-MiniLM-L6-v2", provider: "fastembed", dimensions: 384,
    description: "Lightweight general-purpose sentence embedding",
    size_mb: 90, max_tokens: 512
  ),
  ModelInfo.new(
    name: "intfloat/multilingual-e5-small", provider: "fastembed", dimensions: 384,
    description: "Multilingual embedding, 100+ languages",
    size_mb: 450, max_tokens: 512
  ),
  ModelInfo.new(
    name: "intfloat/multilingual-e5-base", provider: "fastembed", dimensions: 768,
    description: "Larger multilingual embedding",
    size_mb: 1110, max_tokens: 512
  ),
  ModelInfo.new(
    name: "nomic-ai/nomic-embed-text-v1.5", provider: "fastembed", dimensions: 768,
    description: "Long context (8192 tokens) with Matryoshka support",
    size_mb: 520, max_tokens: 8192
  ),
  ModelInfo.new(
    name: "jinaai/jina-embeddings-v2-small-en", provider: "fastembed", dimensions: 512,
    description: "Small English embedding, 8192 token context",
    size_mb: 60, max_tokens: 8192
  ),
  ModelInfo.new(
    name: "jinaai/jina-embeddings-v2-base-en", provider: "fastembed", dimensions: 768,
    description: "Base English embedding, 8192 token context",
    size_mb: 520, max_tokens: 8192
  ),

  # --- api: OpenAI-compatible endpoints ---
  ModelInfo.new(
    name: "text-embedding-3-small", provider: "api", dimensions: 1536,
    description: "OpenAI small embedding (default API model)",
    size_mb: nil, max_tokens: 8191
  ),
  ModelInfo.new(
    name: "text-embedding-3-large", provider: "api", dimensions: 3072,
    description: "OpenAI large embedding, highest accuracy",
    size_mb: nil, max_tokens: 8191
  ),
  ModelInfo.new(
    name: "text-embedding-ada-002", provider: "api", dimensions: 1536,
    description: "OpenAI legacy embedding",
    size_mb: nil, max_tokens: 8191
  ),
  ModelInfo.new(
    name: "voyage-3", provider: "api", dimensions: 1024,
    description: "Voyage AI general-purpose embedding",
    size_mb: nil, max_tokens: 32000
  ),
  ModelInfo.new(
    name: "voyage-3-lite", provider: "api", dimensions: 512,
    description: "Voyage AI lightweight embedding",
    size_mb: nil, max_tokens: 32000
  ),
  ModelInfo.new(
    name: "voyage-code-3", provider: "api", dimensions: 1024,
    description: "Voyage AI code-optimized embedding",
    size_mb: nil, max_tokens: 32000
  ),

  # --- tfidf: built-in, no dependencies ---
  ModelInfo.new(
    name: "tfidf", provider: "tfidf", dimensions: 384,
    description: "Built-in TF-IDF embedding (no dependencies)",
    size_mb: 0, max_tokens: nil
  )
].freeze
- MODELS_BY_NAME =
MODELS.each_with_object({}) { |m, h| h[m.name] = m }.freeze
- DEFAULTS =
{ "fastembed" => "BAAI/bge-small-en-v1.5", "api" => "text-embedding-3-small", "tfidf" => "tfidf" }.freeze
Class Method Summary
- .default_for_provider(provider) ⇒ ModelInfo?
  Return the default ModelInfo for a provider.
- .dimensions_for(name) ⇒ Integer?
  Look up dimensions for a model name.
- .find(name) ⇒ ModelInfo?
  Find a model by name.
- .model_names ⇒ Array<String>
  All known model names.
- .models_for_provider(provider) ⇒ Array<ModelInfo>
  List all models for a given provider.
- .providers ⇒ Array<String>
  All provider names.
Class Method Details
.default_for_provider(provider) ⇒ ModelInfo?
Return the default ModelInfo for a provider.
# File 'lib/claude_memory/embeddings/model_registry.rb', line 204
def self.default_for_provider(provider)
  default_name = DEFAULTS[provider]
  find(default_name) if default_name
end
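One plausible use of the provider default is as a fallback when no model has been configured. The sketch below re-creates the lookup on a stand-in registry; the trimmed-down `ModelInfo` Struct and the `pick_model` helper are assumptions made for illustration, not part of the library.

```ruby
# Trimmed stand-in for ModelInfo (assumption: only the fields used here).
ModelInfo = Struct.new(:name, :provider, :dimensions, keyword_init: true)

MODELS = [
  ModelInfo.new(name: "BAAI/bge-small-en-v1.5", provider: "fastembed", dimensions: 384),
  ModelInfo.new(name: "BAAI/bge-base-en-v1.5", provider: "fastembed", dimensions: 768),
  ModelInfo.new(name: "text-embedding-3-small", provider: "api", dimensions: 1536)
].freeze

MODELS_BY_NAME = MODELS.each_with_object({}) { |m, h| h[m.name] = m }.freeze

# Subset of the documented DEFAULTS constant.
DEFAULTS = {
  "fastembed" => "BAAI/bge-small-en-v1.5",
  "api" => "text-embedding-3-small"
}.freeze

# Mirrors the documented method body: nil when the provider has no default.
def default_for_provider(provider)
  default_name = DEFAULTS[provider]
  MODELS_BY_NAME[default_name] if default_name
end

# Hypothetical helper: prefer an explicitly configured model,
# otherwise fall back to the provider's default.
def pick_model(provider, configured_name = nil)
  MODELS_BY_NAME[configured_name] || default_for_provider(provider)
end

puts pick_model("fastembed").name
puts pick_model("fastembed", "BAAI/bge-base-en-v1.5").name
puts pick_model("unknown-provider").inspect
```

Note that `pick_model` does not verify that the configured model belongs to the requested provider; a real caller may want that check.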
.dimensions_for(name) ⇒ Integer?
Look up dimensions for a model name. Returns nil if unknown.
# File 'lib/claude_memory/embeddings/model_registry.rb', line 197
def self.dimensions_for(name)
  find(name)&.dimensions
end
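The safe-navigation operator (`&.`) is what makes this method return nil rather than raise for an unknown name. A minimal sketch of how a caller might use that, for example to check a vector's length before storing it, is shown below. The stand-in Struct and the `vector_fits?` helper are hypothetical.

```ruby
# Trimmed stand-in for ModelInfo (assumption: only name and dimensions).
ModelInfo = Struct.new(:name, :dimensions, keyword_init: true)

MODELS_BY_NAME = {
  "BAAI/bge-small-en-v1.5" => ModelInfo.new(name: "BAAI/bge-small-en-v1.5", dimensions: 384),
  "voyage-3" => ModelInfo.new(name: "voyage-3", dimensions: 1024)
}.freeze

# Mirrors the documented method: nil for an unknown model name.
def dimensions_for(name)
  MODELS_BY_NAME[name]&.dimensions
end

# Hypothetical check: does the vector have the length the model produces?
# Unknown models are treated as a mismatch.
def vector_fits?(model_name, vector)
  expected = dimensions_for(model_name)
  return false if expected.nil?

  vector.length == expected
end

puts vector_fits?("voyage-3", Array.new(1024, 0.0))
puts vector_fits?("voyage-3", Array.new(384, 0.0))
puts vector_fits?("missing-model", [])
```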
.find(name) ⇒ ModelInfo?
Find a model by name.
# File 'lib/claude_memory/embeddings/model_registry.rb', line 171
def self.find(name)
  MODELS_BY_NAME[name]
end
.model_names ⇒ Array<String>
All known model names.
# File 'lib/claude_memory/embeddings/model_registry.rb', line 184
def self.model_names
  MODELS.map(&:name)
end
.models_for_provider(provider) ⇒ Array<ModelInfo>
List all models for a given provider.
# File 'lib/claude_memory/embeddings/model_registry.rb', line 178
def self.models_for_provider(provider)
  MODELS.select { |m| m.provider == provider }
end
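Because the result is a plain Array of ModelInfo objects, it can be filtered further with ordinary Enumerable methods. As a sketch, the hypothetical `smallest_model` helper below picks the smallest download for a provider that still accepts a given input length. The stand-in Struct is an assumption; the four entries are copied from MODELS.

```ruby
# Trimmed stand-in for ModelInfo (assumption: only the fields used here).
ModelInfo = Struct.new(:name, :provider, :size_mb, :max_tokens, keyword_init: true)

MODELS = [
  ModelInfo.new(name: "BAAI/bge-small-en-v1.5", provider: "fastembed",
                size_mb: 67, max_tokens: 512),
  ModelInfo.new(name: "nomic-ai/nomic-embed-text-v1.5", provider: "fastembed",
                size_mb: 520, max_tokens: 8192),
  ModelInfo.new(name: "jinaai/jina-embeddings-v2-small-en", provider: "fastembed",
                size_mb: 60, max_tokens: 8192),
  ModelInfo.new(name: "voyage-3", provider: "api",
                size_mb: nil, max_tokens: 32000)
].freeze

# Mirrors the documented method body.
def models_for_provider(provider)
  MODELS.select { |m| m.provider == provider }
end

# Hypothetical helper: smallest model (by size_mb) accepting min_tokens.
# Entries with a nil size_mb or max_tokens are skipped.
def smallest_model(provider, min_tokens:)
  models_for_provider(provider)
    .select { |m| m.max_tokens && m.size_mb && m.max_tokens >= min_tokens }
    .min_by(&:size_mb)
end

choice = smallest_model("fastembed", min_tokens: 8192)
puts choice.name
```

An unknown provider yields an empty Array from `models_for_provider`, so `smallest_model` returns nil in that case.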
.providers ⇒ Array<String>
All provider names.
# File 'lib/claude_memory/embeddings/model_registry.rb', line 190
def self.providers
  MODELS.map(&:provider).uniq
end
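`Array#uniq` keeps the first occurrence of each value, so providers come back in the order they first appear in MODELS. The sketch below demonstrates this on a stand-in list and uses it to print a grouped listing. The Struct and the sample subset are assumptions for illustration.

```ruby
# Trimmed stand-in for ModelInfo (assumption: only name and provider).
ModelInfo = Struct.new(:name, :provider, keyword_init: true)

MODELS = [
  ModelInfo.new(name: "BAAI/bge-small-en-v1.5", provider: "fastembed"),
  ModelInfo.new(name: "BAAI/bge-base-en-v1.5", provider: "fastembed"),
  ModelInfo.new(name: "text-embedding-3-small", provider: "api"),
  ModelInfo.new(name: "tfidf", provider: "tfidf")
].freeze

# Mirrors the documented method body.
def providers
  MODELS.map(&:provider).uniq
end

# Group once, then walk providers in first-appearance order.
by_provider = MODELS.group_by(&:provider)

providers.each do |provider|
  names = by_provider.fetch(provider).map(&:name)
  puts "#{provider}: #{names.join(', ')}"
end
```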