Class: Woods::Storage::VectorStore::InMemory
- Inherits: Object
- Includes: Interface
- Defined in: lib/woods/storage/vector_store.rb
Overview
In-memory vector store using hash storage and cosine similarity.
Suitable for development and testing. Not intended for production use with large datasets.
Instance Attribute Summary
- #dim ⇒ Integer? (readonly)
  Dimension of stored vectors, nil if empty.
Instance Method Summary
- #bulk_load(entries) ⇒ Object
  Single-pass hydrate: more efficient than N store calls when the Snapshotter feeds a large dump at boot time.
- #clear! ⇒ Object
  Drop every stored entry, restoring the store to its post-new state.
- #count ⇒ Object
- #delete(id) ⇒ Object
- #delete_by_filter(filters) ⇒ Object
- #each_entry(&block) ⇒ Object
- #initialize ⇒ InMemory (constructor)
  Flat-buffer backing.
- #search(query_vector, limit: 10, filters: {}) ⇒ Object
- #store(id, vector, metadata = {}) ⇒ Object
Methods included from Interface
Constructor Details
#initialize ⇒ InMemory
Flat-buffer backing. One Array<Float> of length count * dim holds every vector contiguously; two parallel Arrays hold the ids and metadata at matching positions. Deleted entries are tombstoned (their index is added to @tombstones) rather than removed, so stored vector positions stay stable under concurrent iteration and dumps. Tombstones are compacted at the next full-embed run.
The flat buffer exists both for cache friendliness during the cosine kernel (all vectors live in one contiguous allocation) and to make dump/load via `pack("e*")` a single call rather than a per-vector concatenation.
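To make the layout concrete, here is a minimal sketch of a flat buffer with index-aligned ids. The local names (`vectors_flat`, `ids`, `dim`) are illustrative stand-ins and not the class's internals; only the `pack("e*")` call comes from the description above.

```ruby
# Hypothetical miniature of the flat layout, not the library's own code.
dim = 3
ids = []
vectors_flat = []

[["a", [1.0, 0.0, 0.5]], ["b", [0.25, 2.0, -1.0]]].each do |id, vec|
  ids << id
  vectors_flat.concat(vec) # appended contiguously; entry i starts at i * dim
end

# Read the vector at index 1 back with a (start, length) slice.
second = vectors_flat[1 * dim, dim]

# Dump the whole buffer with one call. "e" is a little-endian
# single-precision float, so each value occupies 4 bytes.
blob = vectors_flat.pack("e*")

# Load it back with one call.
restored = blob.unpack("e*")
```

Because the `e` directive is single precision, values that are not exactly representable in 32 bits come back rounded. The values in this sketch were chosen to survive the round trip unchanged.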
# File 'lib/woods/storage/vector_store.rb', line 134
def initialize
  @dim = nil
  @ids = []           # Array<String> (frozen)
  @vectors_flat = []  # flat Array<Float>, length @ids.size * @dim
  @metadata = []      # Array<Hash>, index-aligned with @ids
  @id_to_index = {}   # id => Integer for O(1) delete/overwrite
  @tombstones = Set.new
end
Instance Attribute Details
#dim ⇒ Integer? (readonly)
Returns dimension of stored vectors, nil if empty.
# File 'lib/woods/storage/vector_store.rb', line 144
def dim
  @dim
end
Instance Method Details
#bulk_load(entries) ⇒ Object
Single-pass hydrate — more efficient than N store calls when the Snapshotter feeds a large dump at boot time.
# File 'lib/woods/storage/vector_store.rb', line 166
def bulk_load(entries)
  entries.each { |entry| store(entry[:id], entry[:vector], entry[:metadata] || {}) }
end
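The listing implies a particular entry shape: hashes with symbol keys `:id` and `:vector` and an optional `:metadata`. The sketch below reproduces only that normalization step on sample data. The entries are invented for illustration, and no store is involved.

```ruby
# Sample entries in the shape the listing reads; the data itself is made up.
entries = [
  { id: "a", vector: [1.0, 0.0], metadata: { type: "model" } },
  { id: "b", vector: [0.0, 1.0] } # no :metadata key
]

# The same argument triple that bulk_load passes to #store for each entry.
calls = entries.map { |entry| [entry[:id], entry[:vector], entry[:metadata] || {}] }

# String-keyed hashes (for example from JSON.parse without
# symbolize_names: true) do not match the symbol lookups.
string_keyed = { "id" => "c", "vector" => [1.0, 1.0] }
missing_id = string_keyed[:id]
```

As the listing reads, the method calls #store once per entry, so a dimension mismatch in any entry raises ArgumentError partway through and leaves the earlier entries stored.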
#clear! ⇒ Object
Drop every stored entry, restoring the store to its post-new state.
Used by the MCP reload tool to pick up a fresh embed run without restarting the process. A subsequent #bulk_load then repopulates from disk. Safe on an already-empty store.
# File 'lib/woods/storage/vector_store.rb', line 175
def clear!
  @dim = nil
  @ids = []
  @vectors_flat = []
  @metadata = []
  @id_to_index = {}
  @tombstones = Set.new
end
#count ⇒ Object
# File 'lib/woods/storage/vector_store.rb', line 228
def count
  @ids.size - @tombstones.size
end
#delete(id) ⇒ Object
# File 'lib/woods/storage/vector_store.rb', line 211
def delete(id)
  idx = @id_to_index.delete(id)
  @tombstones << idx if idx
end
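A small sketch of how tombstoning interacts with the count, using plain local variables in place of the instance state. The lambda mirrors the body of #delete; everything else is illustrative.

```ruby
require "set"

ids = %w[a b c]
id_to_index = { "a" => 0, "b" => 1, "c" => 2 }
tombstones = Set.new

# Mirrors the body of #delete, written as a lambda for this sketch.
delete = lambda do |id|
  idx = id_to_index.delete(id)
  tombstones << idx if idx
end

delete.call("b")
delete.call("b")       # id is already unmapped, so nothing is added
delete.call("missing") # unknown id: no-op

# Same arithmetic as #count: slots minus tombstones.
live_count = ids.size - tombstones.size
```

The ids array keeps all three slots, which is what keeps positions in the flat buffer stable. Only the id map and the tombstone set change.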
#delete_by_filter(filters) ⇒ Object
# File 'lib/woods/storage/vector_store.rb', line 217
def delete_by_filter(filters)
  @ids.each_with_index do |id, idx|
    next if @tombstones.include?(idx)
    next unless filters.all? { |key, value| @metadata[idx][key] == value }

    @tombstones << idx
    @id_to_index.delete(id)
  end
end
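The matching rule is exact equality on every filter pair. The sketch below extracts that predicate into a lambda to show a few edge cases. The metadata values are invented for illustration.

```ruby
# Same predicate as in #delete_by_filter, extracted for illustration.
matches = ->(metadata, filters) { filters.all? { |key, value| metadata[key] == value } }

meta = { type: "model", namespace: "billing" }

single_key = matches.call(meta, { type: "model" })
all_pairs  = matches.call(meta, { type: "model", namespace: "auth" }) # one pair differs
string_key = matches.call(meta, { "type" => "model" })                # String key vs Symbol key
empty      = matches.call(meta, {})                                   # all? on an empty Hash is true
```

Two consequences follow from the listing as written: filter keys must use the same key type as the stored metadata, and an empty filter hash matches every live entry, so `delete_by_filter({})` would tombstone all of them.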
#each_entry(&block) ⇒ Object
# File 'lib/woods/storage/vector_store.rb', line 185
def each_entry(&block)
  return enum_for(:each_entry) unless block

  @ids.each_with_index do |id, idx|
    next if @tombstones.include?(idx)

    base = idx * @dim
    yield(id, @vectors_flat[base, @dim], @metadata[idx])
  end
end
#search(query_vector, limit: 10, filters: {}) ⇒ Object
# File 'lib/woods/storage/vector_store.rb', line 197
def search(query_vector, limit: 10, filters: {})
  return [] if @dim.nil?

  unless query_vector.length == @dim
    raise ArgumentError,
          "Vector dimension mismatch (#{query_vector.length} vs #{@dim})"
  end

  scored = gather_candidates(query_vector, filters)
  scored.sort_by! { |r| -r.score }
  scored.first(limit)
end
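The private helper `gather_candidates` is not shown on this page. The following is a sketch of what cosine scoring over a flat buffer could look like, not the library's implementation. The `scored_result` struct, the `cosine` function, and the choice to score a zero-norm vector as 0.0 are all assumptions made for the example.

```ruby
# Hypothetical result shape; the listing only tells us results respond to #score.
scored_result = Struct.new(:id, :score, :metadata)

# Cosine similarity between query and the vector starting at flat[base].
def cosine(flat, base, dim, query)
  dot = 0.0
  norm_v = 0.0
  norm_q = 0.0
  dim.times do |i|
    v = flat[base + i]
    q = query[i]
    dot += v * q
    norm_v += v * v
    norm_q += q * q
  end
  denom = Math.sqrt(norm_v) * Math.sqrt(norm_q)
  denom.zero? ? 0.0 : dot / denom # assumption: zero-norm vectors score 0.0
end

dim = 2
ids = %w[east north diagonal]
flat = [1.0, 0.0, 0.0, 1.0, 1.0, 1.0]
metadata = [{}, {}, {}]
query = [1.0, 0.0]

scored = ids.each_index.map do |idx|
  scored_result.new(ids[idx], cosine(flat, idx * dim, dim, query), metadata[idx])
end

# Same ranking steps as #search: sort by descending score, then truncate.
scored.sort_by! { |r| -r.score }
top = scored.first(2)
```

Here `east` is parallel to the query, `diagonal` sits at 45 degrees, and `north` is orthogonal, so the top two are `east` followed by `diagonal`.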
#store(id, vector, metadata = {}) ⇒ Object
# File 'lib/woods/storage/vector_store.rb', line 147
def store(id, vector, metadata = {})
  @dim ||= vector.length
  unless vector.length == @dim
    raise ArgumentError, "Vector dimension mismatch (#{vector.length} vs #{@dim})"
  end

  frozen_id = id.frozen? ? id : id.dup.freeze
  existing = @id_to_index[frozen_id]
  if existing
    overwrite(existing, vector, metadata)
  else
    append(frozen_id, vector, metadata)
  end
end
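The id handling in #store copies only when it has to. The sketch below isolates that one expression to show the effect on a mutable and an already-frozen id. The id strings are made up.

```ruby
id = +"user:42" # unfrozen String owned by the caller

# Same expression as in #store: reuse frozen ids, copy and freeze the rest.
frozen_id = id.frozen? ? id : id.dup.freeze

id << ":mutated" # the caller later mutates its own String

copied = !id.equal?(frozen_id) # the stored id is a separate object

# An id that is already frozen is stored as-is, with no copy.
already = "doc:1".freeze
same = already.frozen? ? already : already.dup.freeze
reused = same.equal?(already)
```

The stored copy keeps the value `"user:42"` after the caller's mutation, so an id held by the store cannot change underneath its index entry.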