Class: ClaudeMemory::Index::VectorIndex
- Inherits: Object
  Object > ClaudeMemory::Index::VectorIndex
- Defined in: lib/claude_memory/index/vector_index.rb
Overview
Native sqlite-vec KNN search wrapper. Follows the same lazy-init pattern as LexicalFTS: the extension and virtual table are created on first use.
Constant Summary
- DEFAULT_DIMENSIONS = 384
Instance Attribute Summary
- #dimensions ⇒ Object (readonly)
  Returns the value of attribute dimensions.
Instance Method Summary
- #available? ⇒ Boolean
  Is the sqlite-vec extension loadable? Caches the result after the first probe.
- #backfill_batch!(limit: 100) ⇒ Integer
  Backfill facts that have embedding_json but haven’t been indexed in vec0.
- #clear! ⇒ Object
  Delete all entries from the vec0 virtual table.
- #count ⇒ Object
  Number of entries in the vec0 virtual table.
- #coverage_stats ⇒ Hash
  Coverage statistics for vec indexing.
- #initialize(store) ⇒ VectorIndex (constructor)
  A new instance of VectorIndex.
- #insert_embedding(fact_id, vector) ⇒ Object
  Insert (or replace) a fact’s embedding into the vec0 virtual table.
- #remove_embedding(fact_id) ⇒ Object
  Remove a fact’s embedding from the vec0 virtual table.
- #search(query_vector, k: 10) ⇒ Array<Hash>
  KNN search: returns fact_ids and distances; the caller hydrates facts. Two-step query pattern (no JOINs with vec0).
Constructor Details
#initialize(store) ⇒ VectorIndex
Returns a new instance of VectorIndex.
    # File 'lib/claude_memory/index/vector_index.rb', line 13

    def initialize(store)
      @store = store
      @db = store.db
      @available = nil
      @vec_table_ensured = false
      @dimensions = store.("embedding_dimensions")&.to_i || DEFAULT_DIMENSIONS
    end
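The `&.to_i || DEFAULT_DIMENSIONS` fallback in the last line of the constructor can be explored in isolation. This is a minimal sketch, assuming the stored value arrives as a String or nil; `resolve_dimensions` and its `default:` parameter are hypothetical names and not part of the class.

```ruby
# Hypothetical helper that mirrors the constructor's fallback expression.
def resolve_dimensions(raw, default: 384)
  # nil&.to_i is nil, so the default applies only when nothing is stored.
  raw&.to_i || default
end

puts resolve_dimensions(nil)    # nothing stored: falls back to the default
puts resolve_dimensions("768")  # stored value wins
puts resolve_dimensions("")     # String#to_i gives 0, and 0 is truthy in Ruby
```

Under these assumptions an empty or non-numeric stored string yields 0 rather than the default, so a caller may want to validate the stored value separately.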
Instance Attribute Details
#dimensions ⇒ Object (readonly)
Returns the value of attribute dimensions.
    # File 'lib/claude_memory/index/vector_index.rb', line 11

    def dimensions
      @dimensions
    end
Instance Method Details
#available? ⇒ Boolean
Is the sqlite-vec extension loadable? Caches the result after the first probe.
    # File 'lib/claude_memory/index/vector_index.rb', line 23

    def available?
      return @available unless @available.nil?

      @available = load_extension!
    end
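The nil check matters because a failed probe must be cached too: `@available ||= ...` would re-run the probe every time the result is false. The sketch below illustrates the pattern with a hypothetical `CachedProbe` class whose probe is a block, so it runs without sqlite-vec installed.

```ruby
# Hypothetical stand-in for the caching in VectorIndex#available?.
class CachedProbe
  attr_reader :probe_calls

  def initialize(&probe)
    @probe = probe
    @available = nil # nil = not probed yet; true/false = cached answer
    @probe_calls = 0
  end

  def available?
    return @available unless @available.nil?

    @probe_calls += 1
    @available = @probe.call
  end
end

missing = CachedProbe.new { false } # simulates an extension that fails to load
3.times { missing.available? }
puts missing.probe_calls # → 1
```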
#backfill_batch!(limit: 100) ⇒ Integer
Backfill facts that have embedding_json but haven’t been indexed in vec0.
    # File 'lib/claude_memory/index/vector_index.rb', line 90

    def backfill_batch!(limit: 100)
      return 0 unless available?
      ensure_vec_table!

      rows = @store.facts
        .where(vec_indexed_at: nil)
        .where(Sequel.~(embedding_json: nil))
        .where(status: "active")
        .select(:id, :embedding_json)
        .order(:id)
        .limit(limit)
        .all

      return 0 if rows.empty?

      now = Time.now.utc.iso8601
      indexed_ids = []

      rows.each do |row|
        vector = JSON.parse(row[:embedding_json])
        blob = vector.pack("f*")
        # No DELETE needed: vec_indexed_at is nil so these rows can't be in vec0
        execute_with_params(
          "INSERT INTO facts_vec(fact_id, embedding) VALUES (?, ?)",
          row[:id], blob
        )
        indexed_ids << row[:id]
      rescue JSON::ParserError
        next
      end

      # Batch-update timestamps
      @store.facts.where(id: indexed_ids).update(vec_indexed_at: now) if indexed_ids.any?
      indexed_ids.size
    end
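Because each call indexes at most `limit` rows and returns the number indexed, a caller can drain the backlog by looping until a batch returns 0. The following is a sketch only: `FakeIndex` and `backfill_all` are hypothetical, and `FakeIndex` merely imitates the `backfill_batch!(limit:)` signature.

```ruby
# Hypothetical stand-in that imitates the backfill_batch!(limit:) signature.
class FakeIndex
  def initialize(pending)
    @pending = pending
  end

  def backfill_batch!(limit: 100)
    done = [@pending, limit].min
    @pending -= done
    done
  end
end

# Hypothetical driver: call backfill_batch! until a batch indexes nothing.
# max_batches bounds the loop in case the index never reports 0.
def backfill_all(index, batch_size: 100, max_batches: 1_000)
  total = 0
  max_batches.times do
    indexed = index.backfill_batch!(limit: batch_size)
    break if indexed.zero?

    total += indexed
  end
  total
end

puts backfill_all(FakeIndex.new(250), batch_size: 100) # → 250
```

One caveat follows from the listing above: rows whose embedding_json fails to parse are skipped without setting vec_indexed_at, so they are selected again on the next call. A return value of 0 can therefore mean either that nothing is left or that every row in the batch failed to parse.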
#clear! ⇒ Object
Delete all entries from the vec0 virtual table. Used when clearing stale embeddings after a dimension change.
    # File 'lib/claude_memory/index/vector_index.rb', line 129

    def clear!
      return false unless available?
      ensure_vec_table!

      @db.run("DELETE FROM facts_vec")
      true
    end
#count ⇒ Object
Number of entries in the vec0 virtual table.
    # File 'lib/claude_memory/index/vector_index.rb', line 138

    def count
      return 0 unless available?
      ensure_vec_table!

      @db[:facts_vec].count
    end
#coverage_stats ⇒ Hash
Coverage statistics for vec indexing.
    # File 'lib/claude_memory/index/vector_index.rb', line 147

    def coverage_stats
      with_embedding = @store.facts.where(Sequel.~(embedding_json: nil)).where(status: "active").count
      vec_indexed = @store.facts.where(Sequel.~(vec_indexed_at: nil)).where(status: "active").count
      coverage_pct = (with_embedding > 0) ? (vec_indexed * 100.0 / with_embedding).round(1) : 0

      {with_embedding: with_embedding, vec_indexed: vec_indexed, coverage_pct: coverage_pct}
    end
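The percentage arithmetic can be checked without a database. This sketch assumes the two counts are already known; `coverage_pct` as a standalone method is hypothetical.

```ruby
# Hypothetical standalone version of the coverage percentage calculation.
def coverage_pct(with_embedding, vec_indexed)
  # Guard against division by zero when no active fact has an embedding.
  with_embedding > 0 ? (vec_indexed * 100.0 / with_embedding).round(1) : 0
end

puts coverage_pct(3, 1) # → 33.3
puts coverage_pct(8, 8) # → 100.0
puts coverage_pct(0, 0) # → 0
```

Note that the empty case yields the Integer 0 while every other case yields a Float, which may matter to callers that format or serialize the value.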
#insert_embedding(fact_id, vector) ⇒ Object
Insert (or replace) a fact’s embedding into the vec0 virtual table. Also sets vec_indexed_at on the fact row.
    # File 'lib/claude_memory/index/vector_index.rb', line 33

    def insert_embedding(fact_id, vector)
      return false unless available?
      ensure_vec_table!

      blob = vector.pack("f*")
      # vec0 doesn't support INSERT OR REPLACE; delete first
      execute_with_params("DELETE FROM facts_vec WHERE fact_id = ?", fact_id)
      execute_with_params(
        "INSERT INTO facts_vec(fact_id, embedding) VALUES (?, ?)",
        fact_id, blob
      )
      @store.facts.where(id: fact_id).update(vec_indexed_at: Time.now.utc.iso8601)
      true
    end
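The `vector.pack("f*")` call serializes the embedding as native-endian 32-bit floats, so the blob is 4 bytes per dimension. The sketch below shows the round trip and the precision lost when a Ruby Float (64-bit) is narrowed; `to_vec_blob` is a hypothetical helper, not a method of the class.

```ruby
# Hypothetical helper mirroring the vector.pack("f*") call in the source.
def to_vec_blob(vector)
  vector.pack("f*") # native-endian 32-bit floats, 4 bytes per element
end

vector = [0.25, -1.5, 3.0] # values exactly representable as float32
blob = to_vec_blob(vector)

puts blob.bytesize                     # → 12 (4 bytes * 3 elements)
puts blob.encoding == Encoding::BINARY # → true
puts blob.unpack("f*") == vector       # → true

# A value such as 0.1 is not exactly representable and changes slightly.
narrowed = to_vec_blob([0.1]).unpack1("f")
puts narrowed == 0.1                   # → false
puts((narrowed - 0.1).abs < 1e-7)      # → true
```

With the default of 384 dimensions, each stored embedding would occupy 1,536 bytes under this encoding.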
#remove_embedding(fact_id) ⇒ Object
Remove a fact’s embedding from the vec0 virtual table. Also clears vec_indexed_at on the fact row.
    # File 'lib/claude_memory/index/vector_index.rb', line 51

    def remove_embedding(fact_id)
      return false unless available?
      ensure_vec_table!

      @db[:facts_vec].where(fact_id: fact_id).delete
      @store.facts.where(id: fact_id).update(vec_indexed_at: nil)
      true
    end
#search(query_vector, k: 10) ⇒ Array<Hash>
KNN search: returns fact_ids and distances; the caller hydrates facts. Two-step query pattern (no JOINs with vec0).
    # File 'lib/claude_memory/index/vector_index.rb', line 65

    def search(query_vector, k: 10)
      return [] unless available?
      ensure_vec_table!

      blob = query_vector.pack("f*")
      rows = @db.synchronize do |conn|
        conn.query(
          "SELECT fact_id, distance FROM facts_vec WHERE embedding MATCH ? AND k = ? ORDER BY distance",
          [blob, k]
        )
      end

      rows.map do |row|
        {
          fact_id: row[:fact_id],
          distance: row[:distance],
          similarity: (1.0 - row[:distance]).clamp(0.0, 1.0)
        }
      end
    end
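The second step of the two-step pattern, hydrating facts from the returned ids, is left to the caller. Here is a minimal sketch of that step together with the similarity conversion; `similarity_from`, `hydrate`, and the in-memory `facts_by_id` hash are hypothetical, and a real caller would presumably load the facts with a separate query on the returned ids.

```ruby
# Hypothetical helper mirroring the similarity expression in #search.
def similarity_from(distance)
  (1.0 - distance).clamp(0.0, 1.0)
end

# Hypothetical hydration step: attach fact records to hits, keeping the
# distance ordering and dropping ids that no longer resolve to a fact.
def hydrate(hits, facts_by_id)
  hits.filter_map do |hit|
    fact = facts_by_id[hit[:fact_id]]
    fact && hit.merge(fact: fact)
  end
end

hits = [
  {fact_id: 7, distance: 0.25, similarity: similarity_from(0.25)},
  {fact_id: 9, distance: 1.3, similarity: similarity_from(1.3)}
]
facts_by_id = {7 => {id: 7, text: "example fact"}} # id 9 is absent

hydrate(hits, facts_by_id).each do |hit|
  puts "#{hit[:fact_id]} #{hit[:similarity]} #{hit[:fact][:text]}"
end
# → 7 0.75 example fact
```

Any distance above 1.0 clamps to a similarity of 0.0, so the similarity field cannot distinguish between such results. The table definition is not shown in this page, so which distance metric facts_vec uses is not stated here.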