Class: ClaudeMemory::Index::VectorIndex

Inherits:
Object
Defined in:
lib/claude_memory/index/vector_index.rb

Overview

Native sqlite-vec KNN search wrapper. Follows the same lazy-init pattern as LexicalFTS: the extension is loaded and the virtual table is created on first use.
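A minimal sketch of the lazy-init pattern in plain Ruby. `LazyVecTable`, `create_table`, and the call counter are hypothetical stand-ins: the real class runs the sqlite-vec DDL where this sketch only increments a counter. The point is that the constructor does no setup and the first operation that needs the table creates it exactly once.

```ruby
# Hypothetical illustration of lazy initialization: nothing is created in
# the constructor; the first call that needs the table creates it once.
class LazyVecTable
  attr_reader :create_calls

  def initialize
    @table_ensured = false
    @create_calls = 0
  end

  # Every public operation calls this first; only the first call does work.
  def ensure_table!
    return if @table_ensured

    create_table # stand-in for CREATE VIRTUAL TABLE ... USING vec0(...)
    @table_ensured = true
  end

  def count
    ensure_table!
    0
  end

  private

  def create_table
    @create_calls += 1
  end
end

index = LazyVecTable.new
index.create_calls # => 0, the constructor did no setup
3.times { index.count }
index.create_calls # => 1
```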

Constant Summary

DEFAULT_DIMENSIONS = 384

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(store) ⇒ VectorIndex

Returns a new instance of VectorIndex.



# File 'lib/claude_memory/index/vector_index.rb', line 13

def initialize(store)
  @store = store
  @db = store.db
  @available = nil
  @vec_table_ensured = false
  @dimensions = store.get_meta("embedding_dimensions")&.to_i || DEFAULT_DIMENSIONS
end
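The dimension lookup in the constructor relies on `&.` returning nil for a missing meta value, so `||` can supply the default. The sketch below reproduces that expression with a plain Hash standing in for `store.get_meta`; `resolve_dimensions` and the Hash-based meta are hypothetical.

```ruby
DEFAULT_DIMENSIONS = 384

# Mirrors the constructor's fallback: a stored meta string wins, a missing
# one (nil) falls back to the default. `meta` is a hypothetical Hash
# standing in for store.get_meta.
def resolve_dimensions(meta)
  meta["embedding_dimensions"]&.to_i || DEFAULT_DIMENSIONS
end

resolve_dimensions({})                                # => 384
resolve_dimensions({"embedding_dimensions" => "768"}) # => 768
```

Note that `||` only catches nil: a stored empty string would yield 0, because `"".to_i` is 0 and 0 is truthy in Ruby.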

Instance Attribute Details

#dimensions ⇒ Object (readonly)

Returns the value of attribute dimensions.



# File 'lib/claude_memory/index/vector_index.rb', line 11

def dimensions
  @dimensions
end

Instance Method Details

#available? ⇒ Boolean

Is the sqlite-vec extension loadable? Caches the result after the first probe.

Returns:

  • (Boolean)


# File 'lib/claude_memory/index/vector_index.rb', line 23

def available?
  return @available unless @available.nil?

  @available = load_extension!
end
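available? memoizes with an explicit nil check rather than `@available ||= load_extension!`. The sketch below contrasts the two with a probe that always fails; both classes and the `probe` lambda are hypothetical stand-ins for load_extension!. With `||=`, a cached `false` looks the same as "not yet probed", so the probe would rerun on every call.

```ruby
# Sketch of the nil-sentinel memoization used by available?.
class ExtensionProbe
  attr_reader :probes

  def initialize(probe)
    @probe = probe
    @available = nil
    @probes = 0
  end

  def available?
    return @available unless @available.nil? # false is a cached answer too

    @probes += 1
    @available = @probe.call
  end
end

# The ||= variant for comparison: false is treated like nil, so it re-probes.
class NaiveProbe
  attr_reader :probes

  def initialize(probe)
    @probe = probe
    @available = nil
    @probes = 0
  end

  def available?
    @available ||= begin
      @probes += 1
      @probe.call
    end
  end
end

missing = ExtensionProbe.new(-> { false })
2.times { missing.available? }
missing.probes # => 1

naive = NaiveProbe.new(-> { false })
2.times { naive.available? }
naive.probes # => 2
```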

#backfill_batch!(limit: 100) ⇒ Integer

Backfill active facts that have embedding_json but haven’t been indexed in vec0 (vec_indexed_at is nil).

Parameters:

  • limit (Integer) (defaults to: 100)

    maximum number of facts to process per call

Returns:

  • (Integer)

    number of facts backfilled



# File 'lib/claude_memory/index/vector_index.rb', line 90

def backfill_batch!(limit: 100)
  return 0 unless available?

  ensure_vec_table!
  rows = @store.facts
    .where(vec_indexed_at: nil)
    .where(Sequel.~(embedding_json: nil))
    .where(status: "active")
    .select(:id, :embedding_json)
    .order(:id)
    .limit(limit)
    .all

  return 0 if rows.empty?

  now = Time.now.utc.iso8601
  indexed_ids = []

  rows.each do |row|
    vector = JSON.parse(row[:embedding_json])
    blob = vector.pack("f*")
    # No DELETE needed: vec_indexed_at is nil so these rows can't be in vec0
    execute_with_params(
      "INSERT INTO facts_vec(fact_id, embedding) VALUES (?, ?)",
      row[:id], blob
    )
    indexed_ids << row[:id]
  rescue JSON::ParserError
    next
  end

  # Batch-update timestamps
  @store.facts.where(id: indexed_ids).update(vec_indexed_at: now) if indexed_ids.any?

  indexed_ids.size
end
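The loop in backfill_batch! parses each row's JSON, skips rows that fail to parse, and collects the successful ids for a single timestamp update at the end. The sketch below replays that control flow on in-memory data; the `rows` literals and the `facts_vec` and `vec_indexed_at` Hashes are hypothetical stand-ins for the Sequel datasets.

```ruby
require "json"
require "time"

# Hypothetical in-memory stand-ins for the facts rows and the facts_vec table.
rows = [
  {id: 1, embedding_json: "[0.1, 0.2]"},
  {id: 2, embedding_json: "not json"}, # malformed: skipped, stays unindexed
  {id: 3, embedding_json: "[0.3, 0.4]"}
]
facts_vec = {}

indexed_ids = []
rows.each do |row|
  vector = JSON.parse(row[:embedding_json])
  facts_vec[row[:id]] = vector.pack("f*")
  indexed_ids << row[:id]
rescue JSON::ParserError
  next
end

# One batched timestamp update for every id that made it into facts_vec.
now = Time.now.utc.iso8601
vec_indexed_at = indexed_ids.to_h { |id| [id, now] }
indexed_ids # => [1, 3]
```

Rows skipped this way keep vec_indexed_at nil, so the real query selects them again on later calls.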

#clear! ⇒ Object

Delete all entries from the vec0 virtual table. Used when clearing stale embeddings after a dimension change.



# File 'lib/claude_memory/index/vector_index.rb', line 129

def clear!
  return false unless available?

  ensure_vec_table!
  @db.run("DELETE FROM facts_vec")
  true
end

#count ⇒ Object

Number of entries in the vec0 virtual table.



# File 'lib/claude_memory/index/vector_index.rb', line 165

def count
  return 0 unless available?

  ensure_vec_table!
  @db[:facts_vec].count
end

#coverage_stats ⇒ Hash

Coverage statistics for vec indexing of active facts.

Returns:

  • (Hash)

    {with_embedding:, vec_indexed:, coverage_pct:}



# File 'lib/claude_memory/index/vector_index.rb', line 174

def coverage_stats
  with_embedding = @store.facts.where(Sequel.~(embedding_json: nil)).where(status: "active").count
  vec_indexed = @store.facts.where(Sequel.~(vec_indexed_at: nil)).where(status: "active").count
  coverage_pct = (with_embedding > 0) ? (vec_indexed * 100.0 / with_embedding).round(1) : 0

  {with_embedding: with_embedding, vec_indexed: vec_indexed, coverage_pct: coverage_pct}
end
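The coverage percentage is a guarded ratio rounded to one decimal place. The sketch below applies the same expression to plain integers instead of dataset counts; `coverage` is a hypothetical helper.

```ruby
# Same arithmetic as coverage_stats, on plain integers instead of queries.
def coverage(with_embedding, vec_indexed)
  pct = (with_embedding > 0) ? (vec_indexed * 100.0 / with_embedding).round(1) : 0
  {with_embedding: with_embedding, vec_indexed: vec_indexed, coverage_pct: pct}
end

coverage(0, 0)[:coverage_pct] # => 0 (Integer; avoids dividing by zero)
coverage(3, 1)[:coverage_pct] # => 33.3
coverage(4, 3)[:coverage_pct] # => 75.0
```

Callers formatting the value should expect an Integer `0` in the empty case and a Float otherwise.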

#insert_embedding(fact_id, vector) ⇒ Object

Insert (or replace) a fact’s embedding into the vec0 virtual table. Also sets vec_indexed_at on the fact row.

Parameters:

  • fact_id (Integer)
  • vector (Array<Float>)

    embedding of length dimensions (384 unless the embedding_dimensions meta overrides it)



# File 'lib/claude_memory/index/vector_index.rb', line 33

def insert_embedding(fact_id, vector)
  return false unless available?

  ensure_vec_table!
  blob = vector.pack("f*")
  # vec0 doesn't support INSERT OR REPLACE; delete first
  execute_with_params("DELETE FROM facts_vec WHERE fact_id = ?", fact_id)
  execute_with_params(
    "INSERT INTO facts_vec(fact_id, embedding) VALUES (?, ?)",
    fact_id, blob
  )
  @store.facts.where(id: fact_id).update(vec_indexed_at: Time.now.utc.iso8601)
  true
end
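insert_embedding stores each vector as `vector.pack("f*")`. Using only core Ruby, the sketch below shows what that blob looks like at the default width: 4 bytes per dimension as 32-bit floats, so values round-trip approximately rather than exactly. The sample vector is made up for illustration.

```ruby
# Array#pack("f*") writes each element as a 32-bit float in native byte
# order, producing the raw bytes bound to the embedding column.
vector = Array.new(384) { |i| i / 384.0 } # illustrative values only
blob = vector.pack("f*")
blob.bytesize # => 1536 (384 dims * 4 bytes)

# Unpacking gives float32-rounded values: close to the originals, not identical.
restored = blob.unpack("f*")
max_err = vector.zip(restored).map { |a, b| (a - b).abs }.max
```

`f` uses the host's native byte order; the `e` directive forces little-endian if a fixed order is ever needed.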

#recreate!(dimensions) ⇒ Object

Drop and rebuild facts_vec at `dimensions`. A vec0 column width is immutable once the table is created, so adopting a model of a different dimension (or any model on a DB whose table was created at the 384 default) requires a full rebuild; clearing rows isn’t enough (issue #7, Finding 1). Requires the sqlite-vec extension to be loaded so the vec0 destructor runs on DROP.

Parameters:

  • dimensions (Integer)

    new embedding width



# File 'lib/claude_memory/index/vector_index.rb', line 144

def recreate!(dimensions)
  return false unless available?

  @dimensions = dimensions
  @db.run("DROP TABLE IF EXISTS facts_vec")
  @vec_table_ensured = false
  ensure_vec_table!
  true
end
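Because the vec0 width is fixed at creation, the relevant question when switching models is whether the existing table's width matches the model's. The sketch below encodes that decision; `vec_table_action` and its return symbols are hypothetical, with `table_dims` playing the role of a table_dimensions result (nil when facts_vec does not exist yet).

```ruby
# Hypothetical decision helper: a vec0 column width cannot be altered, so a
# width mismatch calls for DROP + CREATE (recreate!), not DELETE (clear!).
def vec_table_action(table_dims, model_dims)
  return :create_lazily if table_dims.nil? # no table yet; first use creates it
  return :keep if table_dims == model_dims

  :recreate
end

vec_table_action(nil, 768) # => :create_lazily
vec_table_action(384, 384) # => :keep
vec_table_action(384, 768) # => :recreate
```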

#remove_embedding(fact_id) ⇒ Object

Remove a fact’s embedding from the vec0 virtual table. Also clears vec_indexed_at on the fact row.

Parameters:

  • fact_id (Integer)


# File 'lib/claude_memory/index/vector_index.rb', line 51

def remove_embedding(fact_id)
  return false unless available?

  ensure_vec_table!
  @db[:facts_vec].where(fact_id: fact_id).delete
  @store.facts.where(id: fact_id).update(vec_indexed_at: nil)
  true
end

#search(query_vector, k: 10) ⇒ Array<Hash>

KNN search: returns fact_ids and distances; the caller hydrates the facts. Uses a two-step query pattern (no JOINs with vec0).

Parameters:

  • query_vector (Array<Float>)
  • k (Integer) (defaults to: 10)

    number of nearest neighbors

Returns:

  • (Array<Hash>)
    distance:, similarity:, …


# File 'lib/claude_memory/index/vector_index.rb', line 65

def search(query_vector, k: 10)
  return [] unless available?

  ensure_vec_table!
  blob = query_vector.pack("f*")

  rows = @db.synchronize do |conn|
    conn.query(
      "SELECT fact_id, distance FROM facts_vec WHERE embedding MATCH ? AND k = ? ORDER BY distance",
      [blob, k]
    )
  end

  rows.map do |row|
    {
      fact_id: row[:fact_id],
      distance: row[:distance],
      similarity: (1.0 - row[:distance]).clamp(0.0, 1.0)
    }
  end
end
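After the KNN query, search maps each row to a hash and derives `similarity` as `1.0 - distance`, clamped to the range 0.0 to 1.0. The sketch below applies that mapping to hypothetical literal rows instead of a vec0 result set.

```ruby
# Same mapping search applies to each (fact_id, distance) row; the rows
# here are hypothetical literals rather than a vec0 query result.
rows = [
  {fact_id: 7, distance: 0.25},
  {fact_id: 3, distance: 1.4}
]

results = rows.map do |row|
  {
    fact_id: row[:fact_id],
    distance: row[:distance],
    similarity: (1.0 - row[:distance]).clamp(0.0, 1.0)
  }
end

results.map { |r| r[:similarity] } # => [0.75, 0.0]
```

Every result at distance 1.0 or beyond shares a similarity of 0.0, so the `distance` field is what preserves their relative order.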

#table_dimensions ⇒ Integer?

The width facts_vec was actually created with, parsed from its DDL, or nil when the table doesn’t exist yet. Detects a stale-width table even when the embedding_dimensions meta was never written (old tfidf DBs), which is exactly the case that silently left a 384 table in place.

Returns:

  • (Integer, nil)


# File 'lib/claude_memory/index/vector_index.rb', line 159

def table_dimensions
  ddl = @db[:sqlite_master].where(type: "table", name: "facts_vec").get(:sql)
  ddl && ddl[/embedding\s+float\[(\d+)\]/, 1]&.to_i
end
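table_dimensions pulls the width out of the stored DDL with a single capture-group regex. The sketch below runs the same expression on sample strings; `parse_width` is hypothetical, and the DDL text is an assumed example, since the exact statement ensure_vec_table! issues is not shown here.

```ruby
# The same regex table_dimensions applies to the stored DDL. The DDL string
# below is an assumed example, not the statement ensure_vec_table! runs.
def parse_width(ddl)
  ddl && ddl[/embedding\s+float\[(\d+)\]/, 1]&.to_i
end

ddl = "CREATE VIRTUAL TABLE facts_vec USING vec0(" \
      "fact_id INTEGER PRIMARY KEY, embedding float[768])"
parse_width(ddl) # => 768
parse_width(nil) # => nil (no facts_vec row in sqlite_master)
```

The pattern is case-sensitive, so DDL written as `FLOAT[384]` would also produce nil.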