Class: ClaudeMemory::Index::VectorIndex

Inherits:
Object
Defined in:
lib/claude_memory/index/vector_index.rb

Overview

Wrapper around native sqlite-vec KNN search. Follows the same lazy-init pattern as LexicalFTS: the extension is loaded and the virtual table is created on first use.

Constant Summary

DEFAULT_DIMENSIONS = 384

Constructor Details

#initialize(store) ⇒ VectorIndex

Returns a new instance of VectorIndex.



# File 'lib/claude_memory/index/vector_index.rb', line 13

def initialize(store)
  @store = store
  @db = store.db
  @available = nil
  @vec_table_ensured = false
  @dimensions = store.get_meta("embedding_dimensions")&.to_i || DEFAULT_DIMENSIONS
end
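
The fallback expression in the constructor can be tried in isolation. The sketch below uses a hypothetical `resolve_dimensions` helper (not part of the class) that takes whatever `get_meta` would return. It assumes the stored value is a String or nil.

```ruby
DEFAULT_DIMENSIONS = 384

# Hypothetical helper mirroring the expression used in #initialize.
def resolve_dimensions(meta_value)
  meta_value&.to_i || DEFAULT_DIMENSIONS
end

p resolve_dimensions(nil)    # 384 (no stored value, so the default applies)
p resolve_dimensions("768")  # 768
p resolve_dimensions("abc")  # 0 (non-numeric text parses to 0, which is truthy in Ruby)
```

Only a missing value triggers the default. A stored value that is not a number yields 0 rather than 384.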

Instance Attribute Details

#dimensions ⇒ Object (readonly)

Returns the value of attribute dimensions.



# File 'lib/claude_memory/index/vector_index.rb', line 11

def dimensions
  @dimensions
end

Instance Method Details

#available? ⇒ Boolean

Is the sqlite-vec extension loadable? Caches the result after the first probe.

Returns:

  • (Boolean)


# File 'lib/claude_memory/index/vector_index.rb', line 23

def available?
  return @available unless @available.nil?

  @available = load_extension!
end
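
The explicit nil check means a failed probe is cached too; `@available ||= load_extension!` would retry the probe on every call once the result was false. A minimal sketch of that tri-state memoization, using a hypothetical `ProbeCache` class whose block stands in for `load_extension!`:

```ruby
# Hypothetical stand-in for VectorIndex; the block replaces load_extension!.
class ProbeCache
  attr_reader :probe_count

  def initialize(&probe)
    @probe = probe
    @available = nil # nil means "not probed yet"
    @probe_count = 0
  end

  def available?
    # Return the cached answer, including a cached false.
    return @available unless @available.nil?

    @probe_count += 1
    @available = @probe.call
  end
end

cache = ProbeCache.new { false } # simulate an extension that fails to load
3.times { cache.available? }
puts cache.probe_count # 1: the probe ran only once
```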

#backfill_batch!(limit: 100) ⇒ Integer

Backfill active facts that have embedding_json but haven’t yet been indexed in vec0.

Parameters:

  • limit (Integer) (defaults to: 100)

    max facts to process per call

Returns:

  • (Integer)

    number of facts backfilled



# File 'lib/claude_memory/index/vector_index.rb', line 90

def backfill_batch!(limit: 100)
  return 0 unless available?

  ensure_vec_table!
  rows = @store.facts
    .where(vec_indexed_at: nil)
    .where(Sequel.~(embedding_json: nil))
    .where(status: "active")
    .select(:id, :embedding_json)
    .order(:id)
    .limit(limit)
    .all

  return 0 if rows.empty?

  now = Time.now.utc.iso8601
  indexed_ids = []

  rows.each do |row|
    vector = JSON.parse(row[:embedding_json])
    blob = vector.pack("f*")
    # No DELETE needed: vec_indexed_at is nil so these rows can't be in vec0
    execute_with_params(
      "INSERT INTO facts_vec(fact_id, embedding) VALUES (?, ?)",
      row[:id], blob
    )
    indexed_ids << row[:id]
  rescue JSON::ParserError
    next
  end

  # Batch-update timestamps
  @store.facts.where(id: indexed_ids).update(vec_indexed_at: now) if indexed_ids.any?

  indexed_ids.size
end
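
Because each call processes at most `limit` facts, a caller that wants to drain the backlog has to loop. The sketch below is one way to do that; `backfill_all`, `max_batches`, and the `FakeIndex` stub are hypothetical names introduced for illustration, and the stub replaces VectorIndex so the example runs without sqlite-vec.

```ruby
# Hypothetical helper: calls backfill_batch! until a call indexes nothing.
def backfill_all(index, batch_size: 100, max_batches: 1_000)
  total = 0
  max_batches.times do
    done = index.backfill_batch!(limit: batch_size)
    break if done.zero?

    total += done
  end
  total
end

# Stub with the same method signature as VectorIndex#backfill_batch!.
class FakeIndex
  def initialize(pending)
    @pending = pending
  end

  def backfill_batch!(limit: 100)
    done = [@pending, limit].min
    @pending -= done
    done
  end
end

puts backfill_all(FakeIndex.new(250), batch_size: 100) # 250
```

In the listing above, rows whose embedding_json fails to parse are skipped and keep a nil vec_indexed_at, so they are selected again on the next call. A return value of 0 therefore does not by itself prove the backlog is empty, which is why the sketch also caps the number of batches.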

#clear!Object

Delete all entries from the vec0 virtual table. Used when clearing stale embeddings after a dimension change.



# File 'lib/claude_memory/index/vector_index.rb', line 129

def clear!
  return false unless available?

  ensure_vec_table!
  @db.run("DELETE FROM facts_vec")
  true
end

#count ⇒ Object

Number of entries in the vec0 virtual table.



# File 'lib/claude_memory/index/vector_index.rb', line 138

def count
  return 0 unless available?

  ensure_vec_table!
  @db[:facts_vec].count
end

#coverage_stats ⇒ Hash

Coverage statistics for vec indexing.

Returns:

  • (Hash)

    with_embedding:, vec_indexed:, coverage_pct:



# File 'lib/claude_memory/index/vector_index.rb', line 147

def coverage_stats
  with_embedding = @store.facts.where(Sequel.~(embedding_json: nil)).where(status: "active").count
  vec_indexed = @store.facts.where(Sequel.~(vec_indexed_at: nil)).where(status: "active").count
  coverage_pct = (with_embedding > 0) ? (vec_indexed * 100.0 / with_embedding).round(1) : 0

  {with_embedding: with_embedding, vec_indexed: vec_indexed, coverage_pct: coverage_pct}
end
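
The percentage arithmetic can be checked without a database. The sketch below copies the expression into a hypothetical standalone `coverage_pct` method; the counts passed in are made-up sample values.

```ruby
# Same arithmetic as coverage_stats, isolated from the store.
def coverage_pct(vec_indexed, with_embedding)
  (with_embedding > 0) ? (vec_indexed * 100.0 / with_embedding).round(1) : 0
end

puts coverage_pct(2, 3) # 66.7
puts coverage_pct(0, 0) # 0 (guard against division by zero)
```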

#insert_embedding(fact_id, vector) ⇒ Object

Insert (or replace) a fact’s embedding into the vec0 virtual table. Also sets vec_indexed_at on the fact row.

Parameters:

  • fact_id (Integer)
  • vector (Array<Float>)

    embedding vector; 384 dimensions by default (see DEFAULT_DIMENSIONS and #dimensions)



# File 'lib/claude_memory/index/vector_index.rb', line 33

def insert_embedding(fact_id, vector)
  return false unless available?

  ensure_vec_table!
  blob = vector.pack("f*")
  # vec0 doesn't support INSERT OR REPLACE; delete first
  execute_with_params("DELETE FROM facts_vec WHERE fact_id = ?", fact_id)
  execute_with_params(
    "INSERT INTO facts_vec(fact_id, embedding) VALUES (?, ?)",
    fact_id, blob
  )
  @store.facts.where(id: fact_id).update(vec_indexed_at: Time.now.utc.iso8601)
  true
end
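
The `pack("f*")` call serializes the vector as native-endian single-precision floats, 4 bytes per component. The listing above does not check the vector length before inserting, so a caller may want to validate it first. A minimal sketch, where `pack_embedding` is a hypothetical helper and not part of VectorIndex:

```ruby
# Hypothetical helper: validates length, then packs the same way insert_embedding does.
def pack_embedding(vector, dimensions)
  unless vector.size == dimensions
    raise ArgumentError, "expected #{dimensions} floats, got #{vector.size}"
  end

  vector.pack("f*") # native-endian float32, 4 bytes per component
end

# Sample values chosen to be exactly representable as float32.
blob = pack_embedding([0.25, -1.5, 3.0], 3)
puts blob.bytesize  # 12
p blob.unpack("f*") # [0.25, -1.5, 3.0]
```

Values that are not exactly representable in single precision, such as 0.1, will not round-trip bit-for-bit through the blob.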

#remove_embedding(fact_id) ⇒ Object

Remove a fact’s embedding from the vec0 virtual table. Also clears vec_indexed_at on the fact row.

Parameters:

  • fact_id (Integer)


# File 'lib/claude_memory/index/vector_index.rb', line 51

def remove_embedding(fact_id)
  return false unless available?

  ensure_vec_table!
  @db[:facts_vec].where(fact_id: fact_id).delete
  @store.facts.where(id: fact_id).update(vec_indexed_at: nil)
  true
end

#search(query_vector, k: 10) ⇒ Array<Hash>

KNN search. Returns fact_ids and distances only; the caller hydrates the facts in a second query (two-step pattern, no JOINs with vec0).

Parameters:

  • query_vector (Array<Float>)
  • k (Integer) (defaults to: 10)

    number of nearest neighbors

Returns:

  • (Array<Hash>)

    one hash per neighbor with fact_id:, distance:, and similarity: keys


# File 'lib/claude_memory/index/vector_index.rb', line 65

def search(query_vector, k: 10)
  return [] unless available?

  ensure_vec_table!
  blob = query_vector.pack("f*")

  rows = @db.synchronize do |conn|
    conn.query(
      "SELECT fact_id, distance FROM facts_vec WHERE embedding MATCH ? AND k = ? ORDER BY distance",
      [blob, k]
    )
  end

  rows.map do |row|
    {
      fact_id: row[:fact_id],
      distance: row[:distance],
      similarity: (1.0 - row[:distance]).clamp(0.0, 1.0)
    }
  end
end
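
The second step of the two-step pattern is left to the caller. The sketch below shows one possible hydration step over hand-written data in the shape `search` returns; `hydrate` and `facts_by_id` are hypothetical names, and a real caller would build the lookup from its own fact store.

```ruby
# Hypothetical second step: attach fact rows to KNN hits, preserving distance order.
def hydrate(hits, facts_by_id)
  hits.filter_map do |hit|
    fact = facts_by_id[hit[:fact_id]]
    fact && hit.merge(fact: fact) # drop hits whose fact row is missing
  end
end

# Sample hits; similarity is 1.0 - distance clamped to the range 0.0..1.0.
hits = [
  {fact_id: 7, distance: 0.12, similarity: 0.88},
  {fact_id: 3, distance: 0.40, similarity: 0.60},
  {fact_id: 9, distance: 1.30, similarity: 0.0}
]
facts_by_id = {7 => {id: 7, text: "a"}, 3 => {id: 3, text: "b"}}

p hydrate(hits, facts_by_id).map { |h| h[:fact_id] } # [7, 3]
```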