Class: ClaudeMemory::Index::VectorIndex
- Inherits: Object
- Defined in: lib/claude_memory/index/vector_index.rb
Overview
Native sqlite-vec KNN search wrapper. Follows the same lazy-init pattern as LexicalFTS: the extension and virtual table are created on first use.
Constant Summary
- DEFAULT_DIMENSIONS = 384
Instance Attribute Summary
- #dimensions ⇒ Object (readonly)
  Returns the value of attribute dimensions.
Instance Method Summary
- #available? ⇒ Boolean
  Is the sqlite-vec extension loadable? Caches the result after the first probe.
- #backfill_batch!(limit: 100) ⇒ Integer
  Backfill facts that have embedding_json but haven’t been indexed in vec0.
- #clear! ⇒ Object
  Delete all entries from the vec0 virtual table.
- #count ⇒ Object
  Number of entries in the vec0 virtual table.
- #coverage_stats ⇒ Hash
  Coverage statistics for vec indexing.
- #initialize(store) ⇒ VectorIndex (constructor)
  A new instance of VectorIndex.
- #insert_embedding(fact_id, vector) ⇒ Object
  Insert (or replace) a fact’s embedding into the vec0 virtual table.
- #recreate!(dimensions) ⇒ Object
  Drop and rebuild facts_vec at `dimensions`.
- #remove_embedding(fact_id) ⇒ Object
  Remove a fact’s embedding from the vec0 virtual table.
- #search(query_vector, k: 10) ⇒ Array<Hash>
  KNN search: returns fact_ids + distances; the caller hydrates facts. Two-step query pattern (no JOINs with vec0).
- #table_dimensions ⇒ Integer?
  The width facts_vec was actually created with, parsed from its DDL, or nil when the table doesn’t exist yet.
Constructor Details
#initialize(store) ⇒ VectorIndex
Returns a new instance of VectorIndex.
# File 'lib/claude_memory/index/vector_index.rb', line 13
def initialize(store)
  @store = store
  @db = store.db
  @available = nil
  @vec_table_ensured = false
  @dimensions = store.("embedding_dimensions")&.to_i || DEFAULT_DIMENSIONS
end
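The fallback expression in the constructor can be tried in isolation. The sketch below is only an illustration: `resolve_dimensions` and its `raw` argument are hypothetical names standing in for whatever the store returns for "embedding_dimensions".

```ruby
# Hypothetical helper mirroring `value&.to_i || DEFAULT_DIMENSIONS`.
# `raw` stands in for the stored meta value (nil when never written).
def resolve_dimensions(raw, default: 384)
  raw&.to_i || default
end

puts resolve_dimensions(nil)    # meta missing: falls back to the default
puts resolve_dimensions("768")  # meta stored as a string: converted to Integer
puts resolve_dimensions("abc")  # non-numeric string: to_i yields 0, which is truthy
```

Note that only a nil value triggers the fallback; a non-numeric string converts to 0 and is kept as-is.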
Instance Attribute Details
#dimensions ⇒ Object (readonly)
Returns the value of attribute dimensions.
# File 'lib/claude_memory/index/vector_index.rb', line 11
def dimensions
  @dimensions
end
Instance Method Details
#available? ⇒ Boolean
Is the sqlite-vec extension loadable? Caches the result after the first probe.
# File 'lib/claude_memory/index/vector_index.rb', line 23
def available?
  return @available unless @available.nil?

  @available = load_extension!
end
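The nil check matters because the cached value can legitimately be false; a plain `||=` would re-run the probe on every call when the extension is missing. A minimal sketch of the same pattern, using a hypothetical `Probe` class whose `probe` method stands in for `load_extension!`:

```ruby
# Hypothetical stand-in for VectorIndex; `probe` replaces load_extension!.
class Probe
  attr_reader :probe_calls

  def initialize(result)
    @result = result
    @available = nil   # nil means "not probed yet"
    @probe_calls = 0
  end

  def available?
    # Return the cached answer, including a cached false.
    return @available unless @available.nil?

    @available = probe
  end

  private

  def probe
    @probe_calls += 1
    @result
  end
end

missing = Probe.new(false)
3.times { missing.available? }
puts missing.probe_calls
```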
#backfill_batch!(limit: 100) ⇒ Integer
Backfill facts that have embedding_json but haven’t been indexed in vec0.
# File 'lib/claude_memory/index/vector_index.rb', line 90
def backfill_batch!(limit: 100)
  return 0 unless available?

  ensure_vec_table!

  rows = @store.facts
    .where(vec_indexed_at: nil)
    .where(Sequel.~(embedding_json: nil))
    .where(status: "active")
    .select(:id, :embedding_json)
    .order(:id)
    .limit(limit)
    .all

  return 0 if rows.empty?

  now = Time.now.utc.iso8601
  indexed_ids = []

  rows.each do |row|
    vector = JSON.parse(row[:embedding_json])
    blob = vector.pack("f*")

    # No DELETE needed: vec_indexed_at is nil so these rows can't be in vec0
    execute_with_params(
      "INSERT INTO facts_vec(fact_id, embedding) VALUES (?, ?)",
      row[:id], blob
    )
    indexed_ids << row[:id]
  rescue JSON::ParserError
    next
  end

  # Batch-update timestamps
  @store.facts.where(id: indexed_ids).update(vec_indexed_at: now) if indexed_ids.any?

  indexed_ids.size
end
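The per-row conversion in the loop above (JSON text to a packed blob, skipping malformed rows) can be sketched on its own. `embedding_blob` is a hypothetical helper name, and the sample vector is made up:

```ruby
require "json"

# Convert a JSON-encoded embedding into a packed single-precision blob,
# or return nil when the JSON is malformed (hypothetical helper).
def embedding_blob(embedding_json)
  JSON.parse(embedding_json).pack("f*")
rescue JSON::ParserError
  nil
end

blob = embedding_blob("[0.5, -1.0, 2.0]")
puts blob.bytesize        # "f" packs native single-precision floats, 4 bytes each
p blob.unpack("f*")       # these sample values are exactly representable
p embedding_blob("not json")
```

Because each component takes four bytes, a blob for a vector of width `dimensions` is `4 * dimensions` bytes long.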
#clear! ⇒ Object
Delete all entries from the vec0 virtual table. Used when clearing stale embeddings after a dimension change.
# File 'lib/claude_memory/index/vector_index.rb', line 129
def clear!
  return false unless available?

  ensure_vec_table!
  @db.run("DELETE FROM facts_vec")
  true
end
#count ⇒ Object
Number of entries in the vec0 virtual table.
# File 'lib/claude_memory/index/vector_index.rb', line 165
def count
  return 0 unless available?

  ensure_vec_table!
  @db[:facts_vec].count
end
#coverage_stats ⇒ Hash
Coverage statistics for vec indexing.
# File 'lib/claude_memory/index/vector_index.rb', line 174
def coverage_stats
  with_embedding = @store.facts.where(Sequel.~(embedding_json: nil)).where(status: "active").count
  vec_indexed = @store.facts.where(Sequel.~(vec_indexed_at: nil)).where(status: "active").count
  coverage_pct = (with_embedding > 0) ? (vec_indexed * 100.0 / with_embedding).round(1) : 0

  {with_embedding: with_embedding, vec_indexed: vec_indexed, coverage_pct: coverage_pct}
end
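The percentage arithmetic can be checked without a database. The sketch below isolates it in a hypothetical `coverage_pct` function; the counts passed in are made-up inputs:

```ruby
# Share of embedded facts that are also indexed in vec0, rounded to one
# decimal place. Returns 0 when nothing has an embedding, which avoids a
# division by zero.
def coverage_pct(with_embedding, vec_indexed)
  with_embedding > 0 ? (vec_indexed * 100.0 / with_embedding).round(1) : 0
end

puts coverage_pct(8, 6)
puts coverage_pct(3, 1)
puts coverage_pct(0, 0)
```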
#insert_embedding(fact_id, vector) ⇒ Object
Insert (or replace) a fact’s embedding into the vec0 virtual table. Also sets vec_indexed_at on the fact row.
# File 'lib/claude_memory/index/vector_index.rb', line 33
def insert_embedding(fact_id, vector)
  return false unless available?

  ensure_vec_table!
  blob = vector.pack("f*")

  # vec0 doesn't support INSERT OR REPLACE; delete first
  execute_with_params("DELETE FROM facts_vec WHERE fact_id = ?", fact_id)
  execute_with_params(
    "INSERT INTO facts_vec(fact_id, embedding) VALUES (?, ?)",
    fact_id, blob
  )
  @store.facts.where(id: fact_id).update(vec_indexed_at: Time.now.utc.iso8601)
  true
end
#recreate!(dimensions) ⇒ Object
Drop and rebuild facts_vec at `dimensions`. A vec0 column width is immutable once the table is created, so adopting a model of a different dimension (or any model on a DB whose table was created at the 384 default) requires a full rebuild; clearing rows isn’t enough (issue #7, Finding 1). Requires the sqlite-vec extension to be loaded so that the vec0 destructor runs on DROP.
# File 'lib/claude_memory/index/vector_index.rb', line 144
def recreate!(dimensions)
  return false unless available?

  @dimensions = dimensions
  @db.run("DROP TABLE IF EXISTS facts_vec")
  @vec_table_ensured = false
  ensure_vec_table!
  true
end
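One way a caller might decide when to rebuild is to compare the table's actual width with the width the current model needs. This is a sketch under assumptions: `FakeIndex` is a hypothetical stand-in exposing only `table_dimensions` and `recreate!`, and the rebuild policy in `rebuild_if_stale` is illustrative, not taken from the source.

```ruby
# Hypothetical stand-in for VectorIndex with just the two methods used here.
FakeIndex = Struct.new(:table_dimensions) do
  def recreate!(dimensions)
    self.table_dimensions = dimensions
    true
  end
end

# Assumed policy: rebuild only when a table already exists at another width.
def rebuild_if_stale(index, wanted)
  current = index.table_dimensions
  return false if current.nil? || current == wanted

  index.recreate!(wanted)
end

index = FakeIndex.new(384)
puts rebuild_if_stale(index, 768)  # width differs, so the table is rebuilt
puts index.table_dimensions
puts rebuild_if_stale(index, 768)  # already at the wanted width
```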
#remove_embedding(fact_id) ⇒ Object
Remove a fact’s embedding from the vec0 virtual table. Also clears vec_indexed_at on the fact row.
# File 'lib/claude_memory/index/vector_index.rb', line 51
def remove_embedding(fact_id)
  return false unless available?

  ensure_vec_table!
  @db[:facts_vec].where(fact_id: fact_id).delete
  @store.facts.where(id: fact_id).update(vec_indexed_at: nil)
  true
end
#search(query_vector, k: 10) ⇒ Array<Hash>
KNN search: returns fact_ids + distances; the caller hydrates facts. Two-step query pattern (no JOINs with vec0).
# File 'lib/claude_memory/index/vector_index.rb', line 65
def search(query_vector, k: 10)
  return [] unless available?

  ensure_vec_table!

  blob = query_vector.pack("f*")

  rows = @db.synchronize do |conn|
    conn.query(
      "SELECT fact_id, distance FROM facts_vec WHERE embedding MATCH ? AND k = ? ORDER BY distance",
      [blob, k]
    )
  end

  rows.map do |row|
    {
      fact_id: row[:fact_id],
      distance: row[:distance],
      similarity: (1.0 - row[:distance]).clamp(0.0, 1.0)
    }
  end
end
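The second step of the two-step pattern, hydrating facts from the returned ids, is left to the caller. A rough sketch of what that might look like with plain hashes follows; the `hydrate` helper, the `facts_by_id` lookup, and all sample data are hypothetical.

```ruby
# Same distance-to-similarity mapping as in #search: clamp 1 - distance to [0, 1].
def similarity(distance)
  (1.0 - distance).clamp(0.0, 1.0)
end

# Attach fact rows to KNN hits, preserving hit order and dropping ids that
# no longer resolve to a fact (hypothetical helper).
def hydrate(hits, facts_by_id)
  hits.filter_map do |hit|
    fact = facts_by_id[hit[:fact_id]]
    fact&.merge(distance: hit[:distance], similarity: similarity(hit[:distance]))
  end
end

# Made-up hits shaped like the hashes #search returns.
hits = [{fact_id: 7, distance: 0.25}, {fact_id: 3, distance: 1.4}, {fact_id: 99, distance: 1.9}]
# Made-up fact rows; a real caller would load these from the store by id.
facts_by_id = {3 => {id: 3, text: "fact three"}, 7 => {id: 7, text: "fact seven"}}

hydrate(hits, facts_by_id).each { |fact| p fact }
```

Any distance of 1.0 or more maps to a similarity of 0.0 because of the clamp.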
#table_dimensions ⇒ Integer?
The width facts_vec was actually created with, parsed from its DDL — or nil when the table doesn’t exist yet. Detects a stale-width table even when the embedding_dimensions meta was never written (old tfidf DBs), which is exactly the case that silently left a 384 table in place.
# File 'lib/claude_memory/index/vector_index.rb', line 159
def table_dimensions
  ddl = @db[:sqlite_master].where(type: "table", name: "facts_vec").get(:sql)
  ddl && ddl[/embedding\s+float\[(\d+)\]/, 1]&.to_i
end
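The DDL parsing can be exercised without SQLite. In the sketch below, `width_from_ddl` is a hypothetical wrapper around the same regular expression, and the sample DDL string is an assumed shape; the real statement is whatever `ensure_vec_table!` issues.

```ruby
# Extract the declared vector width from a CREATE statement, or nil when
# there is no DDL or no matching embedding column.
def width_from_ddl(ddl)
  ddl && ddl[/embedding\s+float\[(\d+)\]/, 1]&.to_i
end

# Assumed DDL shape, for illustration only.
sample = "CREATE VIRTUAL TABLE facts_vec USING vec0(fact_id INTEGER PRIMARY KEY, embedding float[384])"

p width_from_ddl(sample)
p width_from_ddl(nil)                      # table does not exist yet
p width_from_ddl("CREATE TABLE other(x)")  # no embedding column declared
```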