Class: ClaudeMemory::Embeddings::Generator
- Inherits: Object
  - Object
  - ClaudeMemory::Embeddings::Generator
- Defined in: lib/claude_memory/embeddings/generator.rb
Overview
Lightweight embedding generator using a TF-IDF approach. Generates normalized 384-dimensional vectors for semantic similarity.
This is a pragmatic implementation that works without heavy dependencies. Future: it can be upgraded to transformer-based models (sentence-transformers).
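Because the vectors are normalized to unit length, cosine similarity between two of them reduces to a plain dot product. The sketch below illustrates that with hand-made vectors; `cosine_similarity` and `unit` are hypothetical helpers written for this example, not part of the Generator API.

```ruby
# Cosine similarity between two equal-length vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Hypothetical helper: scale a vector to unit length (zero vectors pass through).
def unit(vector)
  norm = Math.sqrt(vector.sum { |x| x * x })
  norm.zero? ? vector : vector.map { |x| x / norm }
end

a = unit([3.0, 4.0, 0.0])
b = unit([4.0, 3.0, 0.0])

# For unit vectors the dot product alone already equals the cosine similarity.
dot = a.zip(b).sum { |x, y| x * y }
puts (dot - cosine_similarity(a, b)).abs < 1e-9
```

In practice this means stored embeddings can be compared with a single multiply-and-sum pass, without recomputing norms.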
Constant Summary
- EMBEDDING_DIM =
  384
- VOCABULARY =
  Common technical terms and programming concepts for vocabulary
    %w[
      database framework library module class function method
      api rest graphql http request response server client
      authentication authorization token session cookie jwt
      user admin role permission access control security
      error exception handling validation sanitization
      test spec unit integration end-to-end e2e
      frontend backend fullstack ui ux component
      react vue angular svelte javascript typescript
      ruby python java go rust php elixir
      sql nosql postgresql mysql mongodb redis sqlite
      docker kubernetes container orchestration deployment
      git branch commit merge pull push repository
      configuration environment variable setting preference
      logger logging debug trace info warn error
      cache caching storage persistence state
      async await promise callback thread process
      route routing middleware handler controller
      model view template render component
      form input button submit validation
      dependency injection service factory singleton
      migration schema table column index constraint
      query filter sort pagination limit offset
      create read update delete crud operation
      json xml yaml csv format serialization
      encrypt decrypt hash salt cipher algorithm
      webhook event listener subscriber publisher
      job queue worker background task schedule
      metric monitoring performance optimization
      refactor cleanup technical debt improvement
    ].freeze
Instance Method Summary
- #dimensions ⇒ Object
- #generate(text) ⇒ Array<Float>
  Generate embedding vector for text.
- #initialize ⇒ Generator (constructor)
  A new instance of Generator.
- #name ⇒ Object
Constructor Details
#initialize ⇒ Generator
Returns a new instance of Generator.
    # File 'lib/claude_memory/embeddings/generator.rb', line 52
    def initialize
      @vocabulary_index = VOCABULARY.each_with_index.to_h
      @idf_weights = compute_idf_weights
    end
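One detail of the index built in the constructor is worth seeing in isolation: VOCABULARY lists a few terms twice ("error", "validation", "component"), and `each_with_index.to_h` keeps only the last index for a repeated key. The sketch below uses a shortened stand-in list (`vocab` is illustrative, not the real constant) to show the effect.

```ruby
# Shortened stand-in for VOCABULARY that repeats "error", as the real list does.
vocab = %w[security error exception warn error cache].freeze

index = vocab.each_with_index.to_h

# The later occurrence wins: "error" maps to position 4, not position 1.
puts index["error"]
# The index has one entry fewer than the list it was built from.
puts index.size

# Listing repeated terms makes the overlap easy to spot.
duplicates = vocab.tally.select { |_term, count| count > 1 }.keys
puts duplicates.inspect
```

Since `#generate` allocates `VOCABULARY.size` slots but only writes to indices found in this hash, the slot of each earlier duplicate would stay at 0.0. That appears harmless for similarity scores, though deduplicating the list (for example with `uniq`) would free those dimensions.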
Instance Method Details
#dimensions ⇒ Object
    # File 'lib/claude_memory/embeddings/generator.rb', line 17
    def dimensions = EMBEDDING_DIM
#generate(text) ⇒ Array<Float>
Generate embedding vector for text.
    # File 'lib/claude_memory/embeddings/generator.rb', line 60
    def generate(text)
      return zero_vector if text.nil? || text.empty?

      # Tokenize and compute TF-IDF
      tokens = tokenize(text.downcase)
      return zero_vector if tokens.empty?

      # Build term frequency map
      tf_map = tokens.each_with_object(Hash.new(0)) { |token, h| h[token] += 1 }

      # Normalize term frequencies
      max_tf = tf_map.values.max.to_f
      tf_map.transform_values! { |count| count / max_tf }

      # Compute TF-IDF vector
      vector = Array.new(VOCABULARY.size, 0.0)
      tf_map.each do |term, tf|
        idx = @vocabulary_index[term]
        next unless idx

        idf = @idf_weights[term] || 1.0
        vector[idx] = tf * idf
      end

      # Add positional encoding to capture word order (simple hash-based)
      positional_features = compute_positional_features(tokens)

      # Combine vocabulary vector with positional features
      combined = vector + positional_features

      # Pad or truncate to EMBEDDING_DIM
      final_vector = if combined.size > EMBEDDING_DIM
        combined[0...EMBEDDING_DIM]
      else
        combined + Array.new(EMBEDDING_DIM - combined.size, 0.0)
      end

      # Normalize to unit length for cosine similarity
      normalize(final_vector)
    end
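The private helpers that `#generate` calls (`tokenize`, `compute_positional_features`, `normalize`, `zero_vector`) are not shown on this page. The following is a minimal, self-contained sketch of the same pipeline on a toy vocabulary, with several assumptions: the tokenizer splits on non-alphanumeric characters, every IDF weight is fixed at 1.0 (the fallback used in the source), positional features are omitted, and the names `toy_embedding`, `vocab`, and `dim` are hypothetical.

```ruby
# Assumption: the real #tokenize is not shown; this one keeps runs of letters and digits.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Scale to unit length; an all-zero vector is returned unchanged to avoid dividing by zero.
def normalize(vector)
  norm = Math.sqrt(vector.sum { |x| x * x })
  norm.zero? ? vector : vector.map { |x| x / norm }
end

# Toy version of the TF-IDF steps: term frequency, vocabulary lookup, padding, normalization.
def toy_embedding(text, vocab: %w[database cache query index], dim: 8)
  tokens = tokenize(text.to_s)
  return Array.new(dim, 0.0) if tokens.empty?

  index = vocab.each_with_index.to_h
  tf = tokens.tally
  max_tf = tf.values.max.to_f   # computed over all tokens, including out-of-vocabulary ones

  vector = Array.new(vocab.size, 0.0)
  tf.each do |term, count|
    idx = index[term]
    next unless idx

    vector[idx] = (count / max_tf) * 1.0   # IDF fixed at 1.0 in this sketch
  end

  # Pad with zeros up to the target dimension, then scale to unit length.
  padded = vector + Array.new(dim - vector.size, 0.0)
  normalize(padded)
end

v = toy_embedding("Cache the query; cache the index.")
puts v.size
puts v.map { |x| x.round(3) }.inspect
```

Here "cache" occurs twice and "query" and "index" once each, so the "cache" slot ends up with the largest weight, while "the" is ignored because it is not in the vocabulary.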
#name ⇒ Object
    # File 'lib/claude_memory/embeddings/generator.rb', line 15
    def name = "tfidf"