Class: ClaudeMemory::Embeddings::Generator

Inherits:
Object
Defined in:
lib/claude_memory/embeddings/generator.rb

Overview

Lightweight embedding generator based on a TF-IDF approach. Generates normalized 384-dimensional vectors for semantic similarity.

This is a pragmatic implementation that works without heavy dependencies. In the future, it could be upgraded to transformer-based models (sentence-transformers).
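Because the vectors are normalized to unit length, the cosine similarity of two embeddings reduces to their dot product. The following is a minimal sketch of that comparison; the helpers `dot` and `l2_normalize` are hypothetical names for illustration and are not part of this class.

```ruby
# Dot product of two equal-length vectors. For unit-length inputs this
# equals their cosine similarity.
def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

# Scale a vector to unit length; an all-zero vector is returned unchanged.
def l2_normalize(v)
  norm = Math.sqrt(v.sum { |x| x * x })
  return v if norm.zero?

  v.map { |x| x / norm }
end

a = l2_normalize([3.0, 4.0, 0.0])
b = l2_normalize([3.0, 4.0, 0.0])
c = l2_normalize([0.0, 0.0, 5.0])

same = dot(a, b)        # vectors pointing the same way score close to 1.0
orthogonal = dot(a, c)  # vectors with no shared components score 0.0
```

With real embeddings, the two arguments would be the arrays returned by #generate.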

Constant Summary

EMBEDDING_DIM =
384
VOCABULARY =

Common technical terms and programming concepts for vocabulary

%w[
  database framework library module class function method
  api rest graphql http request response server client
  authentication authorization token session cookie jwt
  user admin role permission access control security
  error exception handling validation sanitization
  test spec unit integration end-to-end e2e
  frontend backend fullstack ui ux component
  react vue angular svelte javascript typescript
  ruby python java go rust php elixir
  sql nosql postgresql mysql mongodb redis sqlite
  docker kubernetes container orchestration deployment
  git branch commit merge pull push repository
  configuration environment variable setting preference
  logger logging debug trace info warn error
  cache caching storage persistence state
  async await promise callback thread process
  route routing middleware handler controller
  model view template render component
  form input button submit validation
  dependency injection service factory singleton
  migration schema table column index constraint
  query filter sort pagination limit offset
  create read update delete crud operation
  json xml yaml csv format serialization
  encrypt decrypt hash salt cipher algorithm
  webhook event listener subscriber publisher
  job queue worker background task schedule
  metric monitoring performance optimization
  refactor cleanup technical debt improvement
].freeze

Instance Method Summary

  • #dimensions ⇒ Object
  • #generate(text) ⇒ Array<Float>
  • #initialize ⇒ Generator
  • #name ⇒ Object

Constructor Details

#initialize ⇒ Generator

Returns a new instance of Generator.



# File 'lib/claude_memory/embeddings/generator.rb', line 52

def initialize
  @vocabulary_index = VOCABULARY.each_with_index.to_h
  @idf_weights = compute_idf_weights
end
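One detail of the index built in the constructor: VOCABULARY lists a few terms more than once (for example "error", "component", and "validation"), and `each_with_index.to_h` keeps only the last position for a repeated key. The sketch below shows that behavior on a hypothetical three-word list; `words`, `index`, and `unique_index` are illustrative names, and calling `uniq` first is one possible way to avoid unused slots, not something the class does.

```ruby
# Hypothetical stand-in for VOCABULARY, with "error" repeated.
words = %w[error cache error]

# Same construction as in #initialize: later duplicates overwrite earlier ones.
index = words.each_with_index.to_h
# "error" now maps to position 2, so slot 0 is never written through the index.

# De-duplicating first gives every term exactly one slot.
unique_index = words.uniq.each_with_index.to_h
```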

Instance Method Details

#dimensions ⇒ Object



# File 'lib/claude_memory/embeddings/generator.rb', line 17

def dimensions = EMBEDDING_DIM

#generate(text) ⇒ Array<Float>

Generates an embedding vector for the given text. Returns a zero vector when the text is nil, empty, or produces no tokens.

Parameters:

  • text (String)

    input text to embed

Returns:

  • (Array<Float>)

    normalized 384-dimensional vector



# File 'lib/claude_memory/embeddings/generator.rb', line 60

def generate(text)
  return zero_vector if text.nil? || text.empty?

  # Tokenize and compute TF-IDF
  tokens = tokenize(text.downcase)
  return zero_vector if tokens.empty?

  # Build term frequency map
  tf_map = tokens.each_with_object(Hash.new(0)) { |token, h| h[token] += 1 }

  # Normalize term frequencies
  max_tf = tf_map.values.max.to_f
  tf_map.transform_values! { |count| count / max_tf }

  # Compute TF-IDF vector
  vector = Array.new(VOCABULARY.size, 0.0)
  tf_map.each do |term, tf|
    idx = @vocabulary_index[term]
    next unless idx

    idf = @idf_weights[term] || 1.0
    vector[idx] = tf * idf
  end

  # Add positional encoding to capture word order (simple hash-based)
  positional_features = compute_positional_features(tokens)

  # Combine vocabulary vector with positional features
  combined = vector + positional_features

  # Pad or truncate to EMBEDDING_DIM
  final_vector = if combined.size > EMBEDDING_DIM
    combined[0...EMBEDDING_DIM]
  else
    combined + Array.new(EMBEDDING_DIM - combined.size, 0.0)
  end

  # Normalize to unit length for cosine similarity
  normalize(final_vector)
end
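The core steps of the method above (term-frequency normalization, vocabulary lookup, padding, and unit-length normalization) can be sketched in a self-contained form. This is a simplified illustration under stated assumptions, not the class's implementation: `sketch_embed`, its three-word vocabulary, the 8-slot dimension, and the regex tokenizer are all hypothetical, IDF weights are treated as 1.0, and the positional features are omitted because their computation is not shown in this page.

```ruby
# Simplified sketch of the TF -> pad -> normalize pipeline.
# Assumptions: hypothetical vocabulary and tokenizer, IDF fixed at 1.0,
# no positional features.
def sketch_embed(text, vocab: %w[database cache query], dim: 8)
  tokens = text.to_s.downcase.scan(/[a-z0-9]+/)
  return Array.new(dim, 0.0) if tokens.empty?

  # Term frequencies scaled by the most frequent term, as in #generate.
  tf = tokens.tally
  max_tf = tf.values.max.to_f

  index = vocab.each_with_index.to_h
  vector = Array.new(vocab.size, 0.0)
  tf.each do |term, count|
    idx = index[term]
    vector[idx] = count / max_tf if idx
  end

  # Truncate or zero-pad to the target dimension.
  padded = vector.first(dim) + Array.new([dim - vector.size, 0].max, 0.0)

  # Scale to unit length; leave an all-zero vector as is.
  norm = Math.sqrt(padded.sum { |x| x * x })
  norm.zero? ? padded : padded.map { |x| x / norm }
end

vec = sketch_embed("Cache the database query; cache again")
```

In this sketch, "cache" occurs twice and therefore receives the largest component, while words outside the vocabulary ("the", "again") contribute nothing.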

#name ⇒ Object



# File 'lib/claude_memory/embeddings/generator.rb', line 15

def name = "tfidf"