pikuri-vectordb

Local-corpus vector search + agentic RAG for the pikuri AI-assistant toolkit: semantic recall over a pile of files you point it at — your notes, your docs, your contracts — where the agent decides when to retrieve, same Thought → Tool-call → Observation loop as every other tool.

Wire it onto a pikuri-core agent the same way as pikuri-tasks / pikuri-memory, with c.add_extension inside the Agent.new block:

require 'pikuri-vectordb'

Pikuri::Agent.new(transport: ..., system_prompt: ...) do |c|
  c.add_extension Pikuri::VectorDb::Extension.new(
    backend: Pikuri::VectorDb::Backend::InMemory.new,
    source:  '~/notes'
  )
end

What you get

Three tools, registered by the extension:

  1. vectordb_search — embeds the query, pulls the top-k nearest chunks from the backend, optionally reranks them with a cross-encoder, and hands the agent a numbered list of source (score=…) snippets as its next observation.
  2. vectordb_read — parent-document retrieval: when a search surfaces a clean hit, the agent reads that whole document by its source path instead of re-querying for more fragments of it.
  3. vectordb_reindex — rebuilds the index from the source, on request.
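The flow behind vectordb_search (embed, fetch top-k, optionally rerank, format) can be sketched as below. This is only an illustration: the SearchSketch module, the Hit struct, the callable embedder / backend / reranker stand-ins, and the exact output format are all assumptions, not pikuri's actual API.

```ruby
# Hypothetical sketch of a search tool's flow; not pikuri's implementation.
module SearchSketch
  Hit = Struct.new(:source, :text, :score)

  module_function

  # embedder: callable, query string -> vector
  # backend:  callable, (vector, top_k) -> Array<Hit>, best first
  # reranker: optional callable, (query, hits) -> reordered hits
  def search(query, embedder:, backend:, reranker: nil, top_k: 3)
    vector = embedder.call(query)
    hits = backend.call(vector, top_k)
    # With no reranker, the vector-only ordering is returned unchanged.
    hits = reranker.call(query, hits) if reranker
    hits.each_with_index.map do |hit, i|
      format('%d. %s (score=%.2f) %s', i + 1, hit.source, hit.score, hit.text)
    end.join("\n")
  end
end
```

Passing reranker: nil here simply skips one step, which mirrors the vector-only fallback: the same hits come back, only their ordering is less refined.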

The extension registers the tools and nothing else — populating the index is the host's call, never something done behind your back. Three equally valid shapes:

  • Index at boot: extension.indexer.index_if_empty!.
  • Keep it live: run a Pikuri::VectorDb::Watcher around extension.indexer — a filesystem-event daemon (the listen gem) that sweeps once on boot and reindexes files as they change.
  • Leave it empty and let the user drive: the agent calls vectordb_reindex when asked.

Backends

Three implementations of one duck-typed interface (#upsert / #query / #delete_all / #count) — swapping is a one-line change:

  • Backend::InMemory — the educational default. Pure-Ruby cosine over Array<Float>, ~40 lines, reads in one sitting. RAM-only: everything reloads from sources on every boot.
  • Backend::Qdrant — thin Faraday HTTP client against a self-hosted Qdrant. The recommended persistent backend; DESIGN.md has the engine survey behind the pick.
  • Backend::Chroma — the supported ChromaDB alternative, identical wiring.
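To make the duck-typed interface concrete, here is a minimal sketch of a backend that answers the four methods with pure-Ruby cosine similarity. It is not the library's Backend::InMemory: the class name TinyInMemoryBackend, the Hit struct, and the argument shapes (id, vector, payload, top_k:) are hypothetical; only the method names #upsert / #query / #delete_all / #count come from the text above.

```ruby
# Hypothetical stand-in showing the four-method backend shape.
class TinyInMemoryBackend
  Hit = Struct.new(:id, :score, :payload)

  def initialize
    @rows = {} # id => [vector, payload]
  end

  # Insert a vector under +id+, overwriting any previous entry.
  def upsert(id, vector, payload = {})
    @rows[id] = [vector, payload]
    self
  end

  # Return up to +top_k+ hits, most similar to +vector+ first.
  def query(vector, top_k: 5)
    @rows
      .map { |id, (vec, payload)| Hit.new(id, cosine(vector, vec), payload) }
      .max_by(top_k, &:score)
  end

  def delete_all
    @rows.clear
    self
  end

  def count
    @rows.size
  end

  private

  # Cosine similarity: dot product divided by the product of the norms.
  def cosine(a, b)
    dot = a.zip(b).sum { |x, y| x * y }
    norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
    norm.zero? ? 0.0 : dot / norm
  end
end
```

Any object answering the same four messages could be dropped in where this one is used, which is what makes a backend swap a one-line change.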

Each persistent engine pairs with a Server::* supervisor that runs it as a self-managed docker container: pinned image, a container name pikuri owns (pikuri-internal-qdrant / pikuri-internal-chroma), data bind-mounted under ~/.cache/pikuri/ so the corpus survives container recreation.

# Supervised container (needs docker on PATH):
backend = Pikuri::VectorDb::Server::Qdrant.ensure_running.client(
  collection: 'my-docs'
)

# Or point at a Qdrant you already run:
backend = Pikuri::VectorDb::Backend::Qdrant.new(
  host: 'localhost', port: 6333, collection: 'my-docs'
)

Collection naming is engine-specific so it lives on the backend constructor, not on the Extension — Backend::InMemory has no collection concept.

The indexing pipeline

What vectordb_reindex (and the Watcher) actually runs, piece by piece — each swappable via the Extension's keyword arguments:

  • Chunker (Chunker::FixedWindow) — overlapping windows, default 512 tokens with 50 of overlap, so an answer straddling a boundary survives in at least one chunk.
  • Tokenizer (Tokenizer::CharHeuristic default / Tokenizer::LlamaServer) — counts tokens for the chunker; the heuristic is the offline ~4-chars-per-token rule, the LlamaServer variant asks the embedder's /tokenize endpoint for an exact count.
  • Embedder — thin wrapper over RubyLLM.embed; tests inject a fake #embed without monkey-patching ruby_llm.
  • Reranker (Reranker::LlamaServer, optional) — cross-encoder over POST /v1/rerank. Pass reranker: nil to skip it; retrieval falls back to vector-only top-k — less precision, same correctness.
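The overlapping-window idea and the 4-chars-per-token heuristic can be sketched together. This is an illustration under stated assumptions, not the code in Chunker::FixedWindow or Tokenizer::CharHeuristic: the WindowSketch module and its method names are hypothetical, and slicing by character count stands in for whatever token accounting the real chunker does.

```ruby
# Hypothetical sketch of fixed-window chunking with overlap.
module WindowSketch
  CHARS_PER_TOKEN = 4 # offline heuristic: roughly four characters per token

  module_function

  # Estimate the token count of +text+, rounding up.
  def estimate_tokens(text)
    (text.length / CHARS_PER_TOKEN.to_f).ceil
  end

  # Slice +text+ into windows of about +size+ tokens; each window starts
  # (size - overlap) tokens after the previous one, so neighbours share
  # +overlap+ tokens of text.
  def fixed_windows(text, size: 512, overlap: 50)
    raise ArgumentError, 'overlap must be smaller than size' unless overlap < size

    window = size * CHARS_PER_TOKEN
    step = (size - overlap) * CHARS_PER_TOKEN
    chunks = []
    offset = 0
    while offset < text.length
      chunks << text[offset, window]
      break if offset + window >= text.length # last window reached the end

      offset += step
    end
    chunks
  end
end
```

With the defaults, a phrase cut in half at the end of one window still appears whole in the next, provided it is shorter than the overlap.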

Text extraction reuses Pikuri::FileType.read_as_text from pikuri-core — plain text / Markdown / PDF. HTML extraction is a deferred follow-up.

Demo: pikuri-corpus

From a source checkout (not installed by gem install):

./pikuri-vectordb/bin/pikuri-corpus --qdrant --watch

A single recall agent over docs/guide/ (the pikuri guide itself) with no egress — its tools are the three above plus calculator; no web search, no fetch, no bash. The corpus stands in for private data, and an agent that can read it must not also be able to send it out. --qdrant / --chroma persist the index across runs, --watch keeps it live, --no-reranker drops the reranker requirement. The guide's chapter 3 is the full walkthrough.

The LIBRARIAN persona

For hosts that want recall behind a privilege-separated sub-agent — the right shape once the parent agent has egress (see SECURITY.md at the repo root) — the bundled Pikuri::VectorDb::LIBRARIAN persona is opt-in via pikuri-subagents:

require 'pikuri-subagents'

# Inside the Pikuri::Agent.new block, where c is the block argument:
c.add_extension(
  Pikuri::SubAgent::Extension.new(
    personas: [Pikuri::VectorDb::LIBRARIAN]
  )
)

Three model endpoints

A full setup wants three LLM endpoints: chat (via ruby_llm), an embedder (via RubyLLM.embed), and an optional reranker (HTTP /v1/rerank). Recommended setup: one llama-server running in router mode — started with no --model flag, it serves every GGUF in ~/.cache/llama.cpp/ from a single port and loads whichever model each request asks for. Requires a recent enough llama.cpp build to include the model-management feature; Ubuntu 26.04+ packages one. The guide's chapter 1 walks through the setup; chapter 3 adds the embedder and reranker on top.

If you'd rather pin the reranker in its own process — to avoid paying the router's unload/reload cost on rerank requests — Reranker::LlamaServer takes its own endpoint: argument and can point at a separate llama-server. Otherwise pikuri stays agnostic: it just needs URLs.

Larger multi-model runtimes (Ollama, LM Studio, ...) expose OpenAI-compatible endpoints and would also work, but pikuri's "small enough to audit" ethos keeps the recommended path on llama.cpp alone.

Install

# Gemfile
gem 'pikuri-vectordb'

Depends on pikuri-core, pikuri-subagents (the Persona value type LIBRARIAN is an instance of), and listen (filesystem events for the Watcher; loaded only when a Watcher starts).

Further reading