Class: Pikuri::VectorDb::Backend::InMemory
- Inherits: Object
  - Object
    - Pikuri::VectorDb::Backend::InMemory
- Defined in: lib/pikuri/vector_db/backend/in_memory.rb
Overview
Pure-Ruby vector store and the educational default backend. IDEAS.md §“Vector DB / RAG” frames it as the “small enough to audit” first stop: the demo and guide walk through it before pointing users to Chroma for persistence.
What it does
Holds an in-memory Hash from chunk id to [Chunk, vector]. #query computes cosine similarity against every stored vector, sorts the scores in descending order, and returns the top-k as Backend::Result instances. That is O(n) similarity computations per query (plus an O(n log n) sort of the scores), where n is the number of stored chunks. Fine for thousands of chunks (a personal notes folder, a single product’s docs); slow for millions (a full corporate knowledge base, which is the Chroma use case).
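As a rough illustration of the scan described above, the sketch below ranks a toy store by cosine similarity. It is a standalone approximation, not the backend's code: `STORE`, `cosine`, and `top_k` are hypothetical names, and plain Strings stand in for Chunk objects.

```ruby
# Toy store: chunk id => [chunk, vector]. Plain Strings stand in for Chunk
# objects here (an assumption made only to keep the sketch self-contained).
STORE = {
  'a' => ['notes on ruby',   [1.0, 0.0]],
  'b' => ['notes on python', [0.0, 1.0]],
  'c' => ['ruby and python', [1.0, 1.0]]
}.freeze

# Cosine similarity of two equal-length vectors (hypothetical helper).
def cosine(a, b)
  dot   = a.zip(b).sum { |x, y| x * y }
  mag_a = Math.sqrt(a.sum { |x| x * x })
  mag_b = Math.sqrt(b.sum { |y| y * y })
  return 0.0 if mag_a.zero? || mag_b.zero?

  dot / (mag_a * mag_b)
end

# Score every entry (the exhaustive scan), sort descending, keep the best k.
def top_k(store, query, k)
  store
    .map { |id, (_chunk, vector)| [id, cosine(query, vector)] }
    .sort_by { |_id, score| -score }
    .first(k)
end

ranked = top_k(STORE, [1.0, 0.2], 2)
ranked.each { |id, score| puts format('%s %.3f', id, score) }
```

With this query, entry 'a' ranks first and 'c' second, because the query points mostly along the first axis.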
What it deliberately doesn’t do
- **No persistence.** RAM-only by design; the user who wants persistence picks Chroma. Reloads from sources on every boot, which makes the in-memory backend the natural teaching shape: the same code path the demo binary walks on startup is the one the user inspects when they’re learning what “indexing” actually means.
- **No approximate search.** Exhaustive scan. Approximate nearest neighbor (HNSW, IVF) adds complexity that doesn’t teach anything additional once the cosine math is clear.
- **No thread safety.** Indexer runs single-threaded during a boot or reindex; Search calls #query from the agent’s main thread. No concurrent access today.
Cosine, not dot product
Some embedders return pre-normalized vectors (text-embedding-3, most sentence-transformers); others don’t. Cosine normalizes at compute time, so the backend works regardless of whether the embedder did. The readable two-pass form in the source file (compute dot + magnitudes separately) is chosen deliberately over the single-loop micro-optimization, because this is the file the newcomer reads to understand what’s happening.
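The private cosine helper is not reproduced on this page. The sketch below is one plausible shape for a two-pass version, written as an assumption rather than a copy of the source; `dot` and `cosine` are illustrative names. It also shows why cosine tolerates unnormalized embeddings: scaling a vector changes its dot product but not its cosine score.

```ruby
# Pass 1: dot product of two equal-length vectors.
def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

# Pass 2: magnitudes, then normalize. Returning 0.0 for a zero-length
# vector is this sketch's choice; the real backend's handling is not shown.
def cosine(a, b)
  mag_a = Math.sqrt(dot(a, a))
  mag_b = Math.sqrt(dot(b, b))
  return 0.0 if mag_a.zero? || mag_b.zero?

  dot(a, b) / (mag_a * mag_b)
end

query  = [3.0, 4.0]
unit   = [0.6, 0.8] # same direction as query, length 1
scaled = [6.0, 8.0] # same direction as query, length 10

# The dot product grows with vector length...
puts dot(query, unit)
puts dot(query, scaled)

# ...while cosine depends only on direction.
puts cosine(query, unit)
puts cosine(query, scaled)
```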
Instance Method Summary
- #count ⇒ Integer
  Current chunk count.
- #delete_all ⇒ void
  Drop every stored chunk.
- #initialize ⇒ InMemory (constructor)
- #query(vector:, top_k:) ⇒ Array<Backend::Result>
  Cosine-similarity nearest neighbor search.
- #upsert(chunks:, vectors:) ⇒ void
  Insert-or-replace by chunk.id.
Constructor Details
#initialize ⇒ InMemory
    # File 'lib/pikuri/vector_db/backend/in_memory.rb', line 48
    def initialize
      # id (String) → [Chunk, vector (Array<Float>)]
      @entries = {}
      # Dimension of every stored vector. +nil+ before the first
      # +#upsert+; locked to the dim of the first vector seen and
      # enforced for every subsequent +#upsert+ + +#query+; see
      # the Backend protocol's "Vector-dim contract" yardoc.
      @dim = nil
    end
Instance Method Details
#count ⇒ Integer
Returns current chunk count.
    # File 'lib/pikuri/vector_db/backend/in_memory.rb', line 123
    def count
      @entries.size
    end
#delete_all ⇒ void
This method returns an undefined value.
Drop every stored chunk. Used by the v1 nuke-and-reload reindex flow; the embedder dim lock is also released so a reindex with a different embedder model starts clean.
    # File 'lib/pikuri/vector_db/backend/in_memory.rb', line 116
    def delete_all
      @entries.clear
      @dim = nil
      nil
    end
#query(vector:, top_k:) ⇒ Array<Backend::Result>
Cosine-similarity nearest neighbor search. Returns the top-k Results in descending score order; empty array when the store has no entries.
    # File 'lib/pikuri/vector_db/backend/in_memory.rb', line 97
    def query(vector:, top_k:)
      raise ArgumentError, "top_k must be positive (got #{top_k})" if top_k <= 0
      return [] if @entries.empty?

      if vector.size != @dim
        raise ArgumentError, "query vector dim #{vector.size}, stored dim #{@dim}"
      end

      scored = @entries.values.map do |chunk, stored|
        Result.new(chunk: chunk, score: cosine(vector, stored))
      end
      scored.sort_by { |r| -r.score }.first(top_k)
    end
#upsert(chunks:, vectors:) ⇒ void
This method returns an undefined value.
Insert-or-replace by chunk.id. Parallel arrays of equal length; raises on empty input or length mismatch. Vector dimension is locked at first upsert; raises on any subsequent vector of a different dim.
    # File 'lib/pikuri/vector_db/backend/in_memory.rb', line 68
    def upsert(chunks:, vectors:)
      raise ArgumentError, 'upsert called with empty chunks/vectors' if chunks.empty?

      if chunks.size != vectors.size
        raise ArgumentError, "size mismatch: #{chunks.size} chunks vs #{vectors.size} vectors"
      end

      expected = @dim || vectors.first.size
      vectors.each_with_index do |v, i|
        next if v.size == expected
        raise ArgumentError, "vector #{i} has dim #{v.size}, expected #{expected}"
      end

      @dim ||= expected
      chunks.zip(vectors).each { |chunk, vector| @entries[chunk.id] = [chunk, vector] }
      nil
    end
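To make the dimension-lock contract concrete, here is a trimmed stand-in that mirrors only the validation and replace-by-id behavior documented for #upsert and #delete_all. `Chunk` and `TinyStore` are hypothetical names for this sketch; the real Chunk class is not shown on this page and may carry other fields, and the error messages are shortened.

```ruby
# Hypothetical stand-in for the real Chunk class; only #id matters here.
Chunk = Struct.new(:id, :text)

# Trimmed stand-in for the backend: same dim-lock rules, no query logic.
class TinyStore
  def initialize
    @entries = {}
    @dim = nil
  end

  def count
    @entries.size
  end

  def upsert(chunks:, vectors:)
    raise ArgumentError, 'empty input' if chunks.empty?
    raise ArgumentError, 'size mismatch' if chunks.size != vectors.size

    expected = @dim || vectors.first.size
    bad = vectors.index { |v| v.size != expected }
    raise ArgumentError, "vector #{bad} has dim #{vectors[bad].size}, expected #{expected}" if bad

    @dim ||= expected
    chunks.zip(vectors).each { |chunk, vector| @entries[chunk.id] = [chunk, vector] }
    nil
  end

  def delete_all
    @entries.clear
    @dim = nil
    nil
  end
end

store = TinyStore.new
store.upsert(chunks: [Chunk.new('a', 'first')], vectors: [[0.1, 0.2, 0.3]])

# The same id replaces rather than appends, so the count stays at 1.
store.upsert(chunks: [Chunk.new('a', 'first, revised')], vectors: [[0.3, 0.2, 0.1]])
puts store.count

# A 2-dim vector is rejected because the store is locked to 3 dims.
begin
  store.upsert(chunks: [Chunk.new('b', 'second')], vectors: [[0.1, 0.2]])
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end

# delete_all releases the lock, so a different embedder dim is accepted.
store.delete_all
store.upsert(chunks: [Chunk.new('b', 'second')], vectors: [[0.1, 0.2]])
puts store.count
```

Because every vector in a batch is checked before anything is written, a rejected batch leaves the store unchanged.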