Class: Pikuri::VectorDb::Backend::InMemory

Inherits:
Object
Defined in:
lib/pikuri/vector_db/backend/in_memory.rb

Overview

Pure-Ruby vector store and the educational default backend. IDEAS.md §“Vector DB / RAG” frames it as the “small enough to audit” first stop: the demo and guide walk through it before promoting users to Chroma for persistence.

What it does

Holds an in-memory Hash from chunk id to [Chunk, vector]. #query computes cosine similarity against every stored vector, sorts descending, and returns the top-k as Backend::Result instances. The scan is linear in n, the number of stored chunks, and the sort adds an O(n log n) term. Fine for thousands of chunks (a personal notes folder, a single product’s docs); slow for millions, such as a full corporate knowledge base, which is the Chroma use case.

What it deliberately doesn’t do

  • **No persistence.** RAM-only by design; the user who wants persistence picks Chroma. The store reloads from sources on every boot, which makes the in-memory backend the natural teaching shape: the same code path the demo binary walks on startup is the one the user inspects when they’re learning what “indexing” actually means.

  • **No approximate search.** Exhaustive scan. Approximate nearest neighbor (HNSW, IVF) adds complexity that doesn’t teach anything additional once the cosine math is clear.

  • **No thread safety.** Indexer runs single-threaded during a boot or reindex; Search calls #query from the agent’s main thread. No concurrent access today.

Cosine, not dot product

Some embedders return pre-normalized vectors (text-embedding-3, most sentence-transformers); others don’t. Cosine normalizes at compute time, so the backend works regardless of whether the embedder did. The source file deliberately uses the readable two-pass form (dot product and magnitudes computed separately) rather than the single-loop micro-optimization, because this is the file the newcomer reads to understand what’s happening.
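The backend's own cosine helper is private and not listed on this page, so the following is only a minimal sketch of what a two-pass cosine could look like. The method name `cosine` matches the call in #query, but the body, the dimension check, and the zero-vector handling are assumptions made for illustration.

```ruby
# Hypothetical stand-in for the backend's private cosine helper.
# Two-pass form: the dot product and the magnitudes are computed separately.
def cosine(a, b)
  raise ArgumentError, "dim mismatch: #{a.size} vs #{b.size}" if a.size != b.size

  dot   = a.zip(b).sum { |x, y| x * y }
  mag_a = Math.sqrt(a.sum { |x| x * x })
  mag_b = Math.sqrt(b.sum { |y| y * y })
  # Assumption: a zero vector has no direction, so score it 0.0
  # instead of dividing by zero.
  return 0.0 if mag_a.zero? || mag_b.zero?

  dot / (mag_a * mag_b)
end

query  = [1.0, 0.0]
unit   = [0.6, 0.8]   # magnitude 1.0, as a normalizing embedder would return
scaled = [6.0, 8.0]   # same direction, magnitude 10.0

cos_unit   = cosine(query, unit)
cos_scaled = cosine(query, scaled)

# A raw dot product, by contrast, grows with the magnitude of the vector.
dot_unit   = query.zip(unit).sum { |x, y| x * y }
dot_scaled = query.zip(scaled).sum { |x, y| x * y }
```

Here `cos_unit` and `cos_scaled` agree up to floating-point rounding, while `dot_scaled` is ten times `dot_unit`. That difference is why cosine is the safer default when the embedder's normalization behavior is unknown.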


Constructor Details

#initialize ⇒ InMemory



# File 'lib/pikuri/vector_db/backend/in_memory.rb', line 48

def initialize
  # id (String) → [Chunk, vector (Array<Float>)]
  @entries = {}
  # Dimension of every stored vector. +nil+ before the first
  # +#upsert+; locked to the dim of the first vector seen and
  # enforced for every subsequent +#upsert+ + +#query+ — see
  # the Backend protocol's "Vector-dim contract" yardoc.
  @dim = nil
end

Instance Method Details

#count ⇒ Integer

Returns current chunk count.

Returns:

  • (Integer)

    current chunk count.



# File 'lib/pikuri/vector_db/backend/in_memory.rb', line 123

def count
  @entries.size
end

#delete_all ⇒ void

This method returns an undefined value.

Drop every stored chunk. Used by the v1 nuke-and-reload reindex flow; the embedder dim lock is also released so a reindex with a different embedder model starts clean.



# File 'lib/pikuri/vector_db/backend/in_memory.rb', line 116

def delete_all
  @entries.clear
  @dim = nil
  nil
end

#query(vector:, top_k:) ⇒ Array<Backend::Result>

Cosine-similarity nearest neighbor search. Returns the top-k Results in descending score order; empty array when the store has no entries.

Parameters:

  • vector (Array<Float>)

    query vector; must match the stored vector dim.

  • top_k (Integer)

    number of results to return; must be positive.

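Returns:

  • (Array<Backend::Result>)

    top-k results in descending score order; empty when the store has no entries.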

Raises:

  • (ArgumentError)

    on top_k <= 0 or query-vector dim mismatch.



# File 'lib/pikuri/vector_db/backend/in_memory.rb', line 97

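def query(vector:, top_k:)
  raise ArgumentError, "top_k must be positive (got #{top_k})" if top_k <= 0
  # Empty store: @dim is still nil, so return before the dim check.
  return [] if @entries.empty?

  if vector.size != @dim
    raise ArgumentError, "query vector dim #{vector.size}, stored dim #{@dim}"
  end

  # Exhaustive scan: score every stored vector against the query.
  scored = @entries.values.map do |chunk, stored|
    Result.new(chunk: chunk, score: cosine(vector, stored))
  end
  # Highest score first; #first(top_k) returns fewer when the store is smaller.
  scored.sort_by { |r| -r.score }.first(top_k)
end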
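The ranking step at the end of #query can be tried in isolation. The sketch below uses a stand-in `Result` struct and made-up scores; the real Backend::Result may carry more than these two fields. It also shows `Enumerable#max_by(n)` as one possible alternative to sorting the whole list. That alternative is not what the source does.

```ruby
# Stand-in for Backend::Result, for illustration only.
Result = Struct.new(:chunk, :score, keyword_init: true)

# Hypothetical scores, as if cosine had already run against three chunks.
scored = [
  Result.new(chunk: 'notes/a.md', score: 0.12),
  Result.new(chunk: 'notes/b.md', score: 0.87),
  Result.new(chunk: 'notes/c.md', score: 0.45)
]

# Same expression as in #query: sort descending by score, keep the first k.
top_two = scored.sort_by { |r| -r.score }.first(2)

# Asking for more results than exist is not an error; #first returns what is there.
everything = scored.sort_by { |r| -r.score }.first(10)

# Alternative (not used by the source): max_by(n) also returns the n
# highest-scoring elements in descending order.
alt_top_two = scored.max_by(2) { |r| r.score }
```

With distinct scores, both forms return the same results in the same order. The sort-based form reads closer to the prose description ("sorts descending, returns the top-k"), which fits the teaching goal of this backend.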

#upsert(chunks:, vectors:) ⇒ void

This method returns an undefined value.

Insert-or-replace by chunk.id. Parallel arrays of equal length; raises on empty input or length mismatch. Vector dimension is locked at first upsert; raises on any subsequent vector of a different dim.

Parameters:

  • chunks (Array<Chunk>)
  • vectors (Array<Array<Float>>)

Raises:

  • (ArgumentError)

    on empty input, length mismatch, or vector-dim mismatch.



# File 'lib/pikuri/vector_db/backend/in_memory.rb', line 68

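def upsert(chunks:, vectors:)
  raise ArgumentError, 'upsert called with empty chunks/vectors' if chunks.empty?
  if chunks.size != vectors.size
    raise ArgumentError, "size mismatch: #{chunks.size} chunks vs #{vectors.size} vectors"
  end

  # The first upsert locks the dim to the first vector's size; later
  # calls reuse the locked value.
  expected = @dim || vectors.first.size
  # Validate every vector before touching @entries, so a bad batch
  # leaves the store unchanged.
  vectors.each_with_index do |v, i|
    next if v.size == expected

    raise ArgumentError, "vector #{i} has dim #{v.size}, expected #{expected}"
  end
  @dim ||= expected

  # Hash assignment keyed by chunk.id gives insert-or-replace semantics.
  chunks.zip(vectors).each { |chunk, vector| @entries[chunk.id] = [chunk, vector] }
  nil
end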
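To see the public methods work together, here is a self-contained sketch. `InMemorySketch` mirrors the method bodies listed on this page, but it is a stand-in, not the real class. The `Chunk` and `Result` structs and the private `cosine` helper are assumptions, since their definitions are not shown here.

```ruby
# Stand-ins for illustration only; the real Chunk and Backend::Result
# are defined elsewhere in Pikuri and may carry more fields.
Chunk  = Struct.new(:id, :text, keyword_init: true)
Result = Struct.new(:chunk, :score, keyword_init: true)

class InMemorySketch
  def initialize
    @entries = {}   # id => [Chunk, vector]
    @dim = nil      # locked at the first upsert
  end

  def count
    @entries.size
  end

  def delete_all
    @entries.clear
    @dim = nil
    nil
  end

  def upsert(chunks:, vectors:)
    raise ArgumentError, 'upsert called with empty chunks/vectors' if chunks.empty?
    raise ArgumentError, 'size mismatch' if chunks.size != vectors.size

    expected = @dim || vectors.first.size
    vectors.each_with_index do |v, i|
      raise ArgumentError, "vector #{i} has dim #{v.size}, expected #{expected}" if v.size != expected
    end
    @dim ||= expected
    chunks.zip(vectors).each { |chunk, vector| @entries[chunk.id] = [chunk, vector] }
    nil
  end

  def query(vector:, top_k:)
    raise ArgumentError, "top_k must be positive (got #{top_k})" if top_k <= 0
    return [] if @entries.empty?
    raise ArgumentError, "query vector dim #{vector.size}, stored dim #{@dim}" if vector.size != @dim

    scored = @entries.values.map { |chunk, stored| Result.new(chunk: chunk, score: cosine(vector, stored)) }
    scored.sort_by { |r| -r.score }.first(top_k)
  end

  private

  # Assumed shape of the private helper; its real body is not shown on this page.
  def cosine(a, b)
    dot  = a.zip(b).sum { |x, y| x * y }
    mags = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |y| y * y })
    mags.zero? ? 0.0 : dot / mags
  end
end

store = InMemorySketch.new
store.upsert(
  chunks:  [Chunk.new(id: 'a', text: 'alpha'), Chunk.new(id: 'b', text: 'beta')],
  vectors: [[1.0, 0.0], [0.0, 1.0]]
)

# Same id again: the entry is replaced rather than duplicated.
store.upsert(chunks: [Chunk.new(id: 'a', text: 'alpha v2')], vectors: [[0.9, 0.1]])
count_after_replace = store.count

top = store.query(vector: [1.0, 0.0], top_k: 1)

# A 3-dim vector violates the dim that was locked at the first upsert.
dim_error =
  begin
    store.upsert(chunks: [Chunk.new(id: 'c', text: 'gamma')], vectors: [[1.0, 2.0, 3.0]])
    nil
  rescue ArgumentError => e
    e.message
  end

# delete_all releases the lock, so a different dim is accepted afterwards.
store.delete_all
store.upsert(chunks: [Chunk.new(id: 'c', text: 'gamma')], vectors: [[1.0, 2.0, 3.0]])
```

In this run the replaced chunk keeps the store at two entries, the nearest neighbor of `[1.0, 0.0]` is the updated chunk `a`, and the rejected batch leaves the store untouched until #delete_all resets the dimension.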