Class: Woods::Storage::VectorStore::InMemory

Inherits:
Object
Includes:
Interface
Defined in:
lib/woods/storage/vector_store.rb

Overview

In-memory vector store using hash storage and cosine similarity.

Suitable for development and testing. Not intended for production use with large datasets.

Examples:

store = InMemory.new
store.store("doc1", [1.0, 0.0], { type: "model" })
store.store("doc2", [0.0, 1.0], { type: "service" })
store.search([1.0, 0.0], limit: 1)
# => [#<SearchResult id="doc1", score=1.0, metadata={type: "model"}>]

Methods included from Interface

#store_batch

Constructor Details

#initialize ⇒ InMemory

Flat-buffer backing. A single Array<Float> of length @ids.size * dim (tombstoned slots included) holds every vector contiguously; two parallel Arrays hold the ids and metadata at matching positions. Deleted entries are tombstoned (their index is added to @tombstones) rather than removed, so stored vector positions stay stable under concurrent iteration and dumps. Tombstones are compacted at the next full embed run.
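The tombstone bookkeeping described above can be sketched in isolation. This is a minimal illustration, not the Woods implementation: the class name TombstoneSketch and its helper methods index_of and live_ids are hypothetical.

```ruby
require "set"

# Hypothetical stand-in for illustration; not the Woods implementation.
class TombstoneSketch
  def initialize(ids)
    @ids = ids
    @id_to_index = ids.each_with_index.to_h # id => slot index
    @tombstones = Set.new
  end

  # Mark the slot dead instead of shifting later entries down.
  def delete(id)
    idx = @id_to_index.delete(id)
    @tombstones << idx if idx
  end

  def index_of(id)
    @id_to_index[id]
  end

  # Ids whose slots have not been tombstoned, in slot order.
  def live_ids
    @ids.each_index.reject { |i| @tombstones.include?(i) }.map { |i| @ids[i] }
  end

  def count
    @ids.size - @tombstones.size
  end
end

sketch = TombstoneSketch.new(%w[a b c])
sketch.delete("b")
sketch.index_of("c") # "c" keeps its original slot after "b" is tombstoned
```

Because no slot is ever shifted, an index computed before a delete still points at the same entry afterwards, which is the stability property the paragraph above relies on.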

The flat buffer exists both for cache friendliness during the cosine kernel (all vectors live in one contiguous allocation) and to make dump/load via `pack("e*")` a single call rather than a per-vector concatenation.
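As a rough illustration of the single-call dump and load, the sketch below round-trips a flat buffer through Array#pack and String#unpack. The helper names dump_flat, load_flat, and vector_at are hypothetical, and the on-disk layout the real Snapshotter uses is not shown in this page.

```ruby
# Hypothetical helpers; the names and the dump layout are assumptions.
def dump_flat(vectors_flat)
  vectors_flat.pack("e*") # "e" is a 32-bit little-endian float
end

def load_flat(bytes)
  bytes.unpack("e*")
end

# Slice vector number idx out of the flat buffer.
def vector_at(vectors_flat, dim, idx)
  vectors_flat[idx * dim, dim]
end

flat = [1.0, 0.0, 0.0, 1.0] # two 2-dimensional vectors
restored = load_flat(dump_flat(flat))
vector_at(restored, 2, 1) # second vector
```

Note that the "e" directive is single precision, so components are rounded to 32 bits when dumped; values that are not exactly representable as 32-bit floats will not compare equal after a round trip.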



# File 'lib/woods/storage/vector_store.rb', line 134

def initialize
  @dim = nil
  @ids = [] # Array<String> (frozen)
  @vectors_flat = [] # flat Array<Float>, length @ids.size * @dim
  @metadata = []     # Array<Hash>, index-aligned with @ids
  @id_to_index = {}  # id => Integer for O(1) delete/overwrite
  @tombstones = Set.new
end

Instance Attribute Details

#dim ⇒ Integer? (readonly)

Returns dimension of stored vectors, nil if empty.

Returns:

  • (Integer, nil)

    dimension of stored vectors, nil if empty



# File 'lib/woods/storage/vector_store.rb', line 144

def dim
  @dim
end

Instance Method Details

#bulk_load(entries) ⇒ Object

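Single-pass hydrate for when the Snapshotter feeds a large dump at boot time. Each entry is a Hash with :id and :vector keys and an optional :metadata key. Documented as more efficient than N store calls, although the listing below delegates to #store once per entry.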



# File 'lib/woods/storage/vector_store.rb', line 166

def bulk_load(entries)
  entries.each { |entry| store(entry[:id], entry[:vector], entry[:metadata] || {}) }
end

#clear! ⇒ Object

Drop every stored entry, restoring the store to its post-new state.

Used by the MCP reload tool to pick up a fresh embed run without restarting the process. A subsequent #bulk_load then repopulates from disk. Safe on an already-empty store.



# File 'lib/woods/storage/vector_store.rb', line 175

def clear!
  @dim = nil
  @ids = []
  @vectors_flat = []
  @metadata = []
  @id_to_index = {}
  @tombstones = Set.new
end

#count ⇒ Object



# File 'lib/woods/storage/vector_store.rb', line 228

def count
  @ids.size - @tombstones.size
end

#delete(id) ⇒ Object



# File 'lib/woods/storage/vector_store.rb', line 211

def delete(id)
  idx = @id_to_index.delete(id)
  @tombstones << idx if idx
end

#delete_by_filter(filters) ⇒ Object



# File 'lib/woods/storage/vector_store.rb', line 217

def delete_by_filter(filters)
  @ids.each_with_index do |id, idx|
    next if @tombstones.include?(idx)
    next unless filters.all? { |key, value| @metadata[idx][key] == value }

    @tombstones << idx
    @id_to_index.delete(id)
  end
end
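The matching rule used by delete_by_filter can be examined on its own. The helper name matches_filters? below is hypothetical; it only mirrors the predicate in the listing above.

```ruby
# Hypothetical helper mirroring the predicate in delete_by_filter.
def matches_filters?(metadata, filters)
  filters.all? { |key, value| metadata[key] == value }
end

meta = { type: "model", namespace: "billing" }

matches_filters?(meta, { type: "model" })     # every pair matches
matches_filters?(meta, { "type" => "model" }) # String key does not match Symbol key
matches_filters?(meta, {})                    # an empty filter matches everything
```

Two consequences follow from this predicate: filter keys must use the same key type as the stored metadata, and passing an empty Hash tombstones every live entry.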

#each_entry(&block) ⇒ Object



# File 'lib/woods/storage/vector_store.rb', line 185

def each_entry(&block)
  return enum_for(:each_entry) unless block

  @ids.each_with_index do |id, idx|
    next if @tombstones.include?(idx)

    base = idx * @dim
    yield(id, @vectors_flat[base, @dim], @metadata[idx])
  end
end
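The enum_for guard means each_entry can be used with or without a block. A minimal sketch of that pattern, using a hypothetical FlatEntries class that omits tombstones and metadata for brevity:

```ruby
# Hypothetical stand-in for illustration; not the Woods implementation.
class FlatEntries
  def initialize(ids, vectors_flat, dim)
    @ids = ids
    @vectors_flat = vectors_flat
    @dim = dim
  end

  def each_entry
    # Without a block, hand back a lazy Enumerator over the same method.
    return enum_for(:each_entry) unless block_given?

    @ids.each_with_index do |id, idx|
      yield(id, @vectors_flat[idx * @dim, @dim])
    end
  end
end

entries = FlatEntries.new(%w[a b], [1.0, 0.0, 0.0, 1.0], 2)
entries.each_entry { |id, vector| puts "#{id}: #{vector.inspect}" }
pairs = entries.each_entry.to_a # Enumerator form, collected into an Array
```

Each yielded vector is a fresh Array sliced from the flat buffer, so mutating it does not touch the stored data.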

#search(query_vector, limit: 10, filters: {}) ⇒ Object



# File 'lib/woods/storage/vector_store.rb', line 197

def search(query_vector, limit: 10, filters: {})
  return [] if @dim.nil?

  unless query_vector.length == @dim
    raise ArgumentError,
          "Vector dimension mismatch (#{query_vector.length} vs #{@dim})"
  end

  scored = gather_candidates(query_vector, filters)
  scored.sort_by! { |r| -r.score }
  scored.first(limit)
end
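The gather_candidates helper is not shown in this listing. As a rough sketch of what cosine scoring over a flat buffer could look like, the following uses hypothetical helpers cosine_at and top_k; returning 0.0 for zero-length vectors is an assumed convention, and tombstones and filters are left out.

```ruby
# Hypothetical helper; gather_candidates itself is not shown in this page.
# Cosine similarity between query and the vector stored at slot idx.
def cosine_at(vectors_flat, dim, idx, query)
  base = idx * dim
  dot = norm_v = norm_q = 0.0
  dim.times do |i|
    v = vectors_flat[base + i]
    q = query[i]
    dot += v * q
    norm_v += v * v
    norm_q += q * q
  end
  return 0.0 if norm_v.zero? || norm_q.zero? # assumed convention for zero vectors

  dot / Math.sqrt(norm_v * norm_q)
end

# Score every slot, then keep the highest-scoring ids.
def top_k(vectors_flat, dim, ids, query, limit)
  scored = ids.each_index.map { |idx| [ids[idx], cosine_at(vectors_flat, dim, idx, query)] }
  scored.sort_by { |(_id, score)| -score }.first(limit)
end

flat = [1.0, 0.0, 0.0, 1.0] # doc1 = [1.0, 0.0], doc2 = [0.0, 1.0]
top_k(flat, 2, %w[doc1 doc2], [1.0, 0.0], 1)
```

Like the listing above, this sorts every candidate before truncating, so the cost is O(n log n) in the number of stored vectors regardless of limit.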

#store(id, vector, metadata = {}) ⇒ Object



# File 'lib/woods/storage/vector_store.rb', line 147

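def store(id, vector, metadata = {})
  # The first stored vector fixes the dimension for the whole store.
  @dim ||= vector.length
  unless vector.length == @dim
    raise ArgumentError,
          "Vector dimension mismatch (#{vector.length} vs #{@dim})"
  end

  # Keep a frozen copy so later mutation of the caller's string cannot alter the stored id.
  frozen_id = id.frozen? ? id : id.dup.freeze
  existing = @id_to_index[frozen_id]
  if existing
    overwrite(existing, vector, metadata)
  else
    append(frozen_id, vector, metadata)
  end
end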
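The private append and overwrite helpers are not shown in this page. A minimal sketch of how the two paths might manipulate the flat buffer, assuming a hypothetical FlatStoreSketch class that ignores metadata and tombstones:

```ruby
# Hypothetical stand-in; the real append/overwrite helpers are not shown.
class FlatStoreSketch
  attr_reader :ids, :vectors_flat

  def initialize
    @dim = nil
    @ids = []
    @vectors_flat = []
    @id_to_index = {}
  end

  def store(id, vector)
    @dim ||= vector.length
    unless vector.length == @dim
      raise ArgumentError, "Vector dimension mismatch (#{vector.length} vs #{@dim})"
    end

    existing = @id_to_index[id]
    if existing
      # Overwrite in place: splice the new floats over the old slot.
      @vectors_flat[existing * @dim, @dim] = vector
    else
      # Append: the new slot index is the current number of ids.
      @id_to_index[id] = @ids.size
      @ids << id
      @vectors_flat.concat(vector)
    end
  end
end

store = FlatStoreSketch.new
store.store("a", [1.0, 0.0])
store.store("b", [0.0, 1.0])
store.store("a", [0.5, 0.5]) # reuses slot 0 rather than appending
```

Overwriting in place keeps every other entry at its original slot, which matches the position-stability guarantee described in the constructor notes.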