pikuri-vectordb
Local-corpus vector search + agentic RAG for the pikuri AI-assistant toolkit: semantic recall over a pile of files you point it at — your notes, your docs, your contracts — where the agent decides when to retrieve, same Thought → Tool-call → Observation loop as every other tool.
Wire it onto a pikuri-core agent the same way as pikuri-tasks /
pikuri-memory — c.add_extension inside the Agent.new block:
require 'pikuri-vectordb'

Pikuri::Agent.new(transport: ..., system_prompt: ...) do |c|
  c.add_extension Pikuri::VectorDb::Extension.new(
    backend: Pikuri::VectorDb::Backend::InMemory.new,
    source: '~/notes'
  )
end
What you get
Three tools, registered by the extension:
- vectordb_search: embeds the query, pulls the top-k nearest chunks from the backend, optionally reranks them with a cross-encoder, and hands the agent a numbered list of source (score=…) snippets as its next observation.
- vectordb_read: parent-document retrieval. When a search surfaces a clean hit, the agent reads that whole document by its source path instead of re-querying for more fragments of it.
- vectordb_reindex: rebuilds the index from the source, on request.
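To make the shape of a search observation concrete, here is a minimal sketch of turning scored hits into a numbered list. The Hit struct, the format_observation helper, the three-decimal score, and the "No matches." fallback are all assumptions for illustration; the gem's real output may differ in layout.

```ruby
# Hypothetical formatter, not the gem's code: render hits as a numbered
# list of "source (score=...)" lines, each followed by its snippet text.
Hit = Struct.new(:source, :score, :text)

def format_observation(hits)
  # Assumed fallback wording for an empty result set.
  return 'No matches.' if hits.empty?

  hits.each_with_index.map do |hit, i|
    format("%d. %s (score=%.3f)\n   %s", i + 1, hit.source, hit.score, hit.text.strip)
  end.join("\n")
end

hits = [
  Hit.new('notes/lease.md', 0.91234, "Rent is due on the first of the month.\n"),
  Hit.new('notes/todo.md', 0.455, 'Renew the lease in March.')
]
puts format_observation(hits)
```

Keeping the source path on every line is what lets the agent follow up with vectordb_read on a promising hit.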
The extension registers the tools and nothing else — populating the index is the host's call, never something done behind your back. Three equally valid shapes:
- Index at boot: extension.indexer.index_if_empty!.
- Keep it live: run a Pikuri::VectorDb::Watcher around extension.indexer, a filesystem-event daemon (the listen gem) that sweeps once on boot and reindexes files as they change.
- Leave it empty and let the user drive: the agent calls vectordb_reindex when asked.
Backends
Three implementations of one duck-typed interface
(#upsert / #query / #delete_all / #count) — swapping is a
one-line change:
- Backend::InMemory: the educational default. Pure-Ruby cosine over Array<Float>, ~40 lines, reads in one sitting. RAM-only: everything reloads from sources on every boot.
- Backend::Qdrant: thin Faraday HTTP client against a self-hosted Qdrant. The recommended persistent backend; DESIGN.md has the engine survey behind the pick.
- Backend::Chroma: the supported ChromaDB alternative, identical wiring.
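As a rough illustration of what that duck-typed interface can look like, here is a brute-force cosine backend in plain Ruby. It is a sketch, not the gem's source: the class name SketchInMemoryBackend, the keyword arguments, and the [score, entry] return shape are assumptions. Only the four method names come from the list above.

```ruby
# Hypothetical sketch, not the gem's source: a brute-force cosine backend
# exposing the four duck-typed methods. Signatures are assumed.
class SketchInMemoryBackend
  Entry = Struct.new(:id, :vector, :payload)

  def initialize
    @entries = {}
  end

  # Insert or overwrite one entry, keyed by id.
  def upsert(id:, vector:, payload: {})
    @entries[id] = Entry.new(id, vector, payload)
    self
  end

  # Return the top_k entries as [score, entry] pairs, best match first.
  def query(vector:, top_k: 5)
    @entries.values
            .map { |entry| [cosine(vector, entry.vector), entry] }
            .sort_by { |score, _entry| -score }
            .first(top_k)
  end

  def delete_all
    @entries.clear
    self
  end

  def count
    @entries.size
  end

  private

  # Cosine similarity of two equal-length arrays; 0.0 for a zero vector.
  def cosine(a, b)
    dot = a.zip(b).sum { |x, y| x * y }
    norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |y| y * y })
    norm.zero? ? 0.0 : dot / norm
  end
end

backend = SketchInMemoryBackend.new
backend.upsert(id: 'a', vector: [1.0, 0.0], payload: { source: 'a.md' })
backend.upsert(id: 'b', vector: [0.0, 1.0], payload: { source: 'b.md' })
hits = backend.query(vector: [0.9, 0.1], top_k: 1)
puts hits.first.last.payload[:source]
```

A linear scan like this costs one cosine per stored chunk per query, which is fine for a small personal corpus and is the reason the persistent engines exist for larger ones.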
Each persistent engine pairs with a Server::* supervisor that
runs it as a self-managed docker container: pinned image, a
container name pikuri owns (pikuri-internal-qdrant /
pikuri-internal-chroma), data bind-mounted under
~/.cache/pikuri/ so the corpus survives container recreation.
# Supervised container (needs docker on PATH):
backend = Pikuri::VectorDb::Server::Qdrant.ensure_running.client(
  collection: 'my-docs'
)

# Or point at a Qdrant you already run:
backend = Pikuri::VectorDb::Backend::Qdrant.new(
  host: 'localhost', port: 6333, collection: 'my-docs'
)
Collection naming is engine-specific so it lives on the backend
constructor, not on the Extension — Backend::InMemory has no
collection concept.
The indexing pipeline
What vectordb_reindex (and the Watcher) actually runs, piece by
piece — each swappable via the Extension's keyword arguments:
- Chunker (Chunker::FixedWindow): overlapping windows, default 512 tokens with 50 of overlap, so an answer straddling a boundary survives in at least one chunk.
- Tokenizer (Tokenizer::CharHeuristic default / Tokenizer::LlamaServer): counts tokens for the chunker; the heuristic is the offline ~4-chars-per-token rule, the LlamaServer variant asks the embedder's /tokenize endpoint for an exact count.
- Embedder: thin wrapper over RubyLLM.embed; tests inject a fake #embed without monkey-patching ruby_llm.
- Reranker (Reranker::LlamaServer, optional): cross-encoder over POST /v1/rerank. Pass reranker: nil to skip it; retrieval falls back to vector-only top-k, with less precision but the same correctness.
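The chunker and the heuristic tokenizer can be sketched together in a few lines. This is an illustration under stated assumptions, not the gem's implementation: estimate_tokens and fixed_window_chunks are hypothetical helpers, and the windows here are cut on raw character offsets (4 characters per estimated token) rather than on real token boundaries.

```ruby
# Hypothetical sketch of the two ideas above, not the gem's classes.

# Offline estimate: roughly four characters per token, rounded up.
def estimate_tokens(text)
  (text.length / 4.0).ceil
end

# Split text into overlapping windows measured in estimated tokens.
# Each window starts (window - overlap) tokens after the previous one.
def fixed_window_chunks(text, window: 512, overlap: 50, chars_per_token: 4)
  raise ArgumentError, 'overlap must be smaller than window' unless overlap < window

  size = window * chars_per_token
  step = (window - overlap) * chars_per_token
  chunks = []
  start = 0
  while start < text.length
    chunks << text[start, size]
    # Stop once this window has reached the end of the text.
    break if start + size >= text.length

    start += step
  end
  chunks
end

text = 'x' * 5000
chunks = fixed_window_chunks(text)
puts chunks.length
puts chunks.map { |chunk| estimate_tokens(chunk) }.inspect
```

With these defaults, consecutive windows share about 50 estimated tokens, so a passage shorter than the overlap that straddles a boundary still lands whole in the following window.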
Text extraction reuses Pikuri::FileType.read_as_text from
pikuri-core — plain text / Markdown / PDF. HTML extraction is a
deferred follow-up.
Demo: pikuri-corpus
From a source checkout (not installed by gem install):
./pikuri-vectordb/bin/pikuri-corpus --qdrant --watch
A single recall agent over docs/guide/ (the pikuri guide itself)
with no egress — its tools are the three above plus
calculator; no web search, no fetch, no bash. The corpus stands
in for private data, and an agent that can read it must not also be
able to send it out. --qdrant / --chroma persist the index
across runs, --watch keeps it live, --no-reranker drops the
reranker requirement. The guide's
chapter 3 is the full walkthrough.
The LIBRARIAN persona
For hosts that want recall behind a privilege-separated sub-agent —
the right shape once the parent agent has egress (see
SECURITY.md at the repo root) — the bundled
Pikuri::VectorDb::LIBRARIAN persona is opt-in via
pikuri-subagents:
require 'pikuri-subagents'

c.add_extension(
  Pikuri::SubAgent::Extension.new(
    personas: [Pikuri::VectorDb::LIBRARIAN]
  )
)
Three model endpoints
A full setup wants three LLM endpoints: chat (via ruby_llm), an
embedder (via RubyLLM.embed), and an optional reranker (HTTP
/v1/rerank). Recommended setup: one llama-server running in
router mode — started with no --model flag, it serves every
GGUF in ~/.cache/llama.cpp/ from a single port and loads
whichever model each request asks for. Requires a recent enough
llama.cpp build to include the
model-management feature;
Ubuntu 26.04+ packages one. The guide's
chapter 1 walks through the setup;
chapter 3 adds the embedder and
reranker on top.
If you'd rather pin the reranker in its own process — to avoid
paying the router's unload/reload cost on rerank requests —
Reranker::LlamaServer takes its own endpoint: argument and
can point at a separate llama-server. Otherwise pikuri stays
agnostic: it just needs URLs.
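To show what "it just needs URLs" amounts to for the reranker, here is a sketch of the HTTP exchange against a dedicated rerank server. The helpers build_rerank_request and apply_rerank are hypothetical, the port 8081 is made up, and the payload keys (query, documents, results, index, relevance_score) reflect my understanding of llama-server's rerank API rather than anything this README specifies, so check them against your llama.cpp build.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper, not part of pikuri: build the POST for a rerank call.
# The payload keys are assumptions about llama-server's rerank API.
def build_rerank_request(endpoint, query, documents)
  uri = URI.join(endpoint, '/v1/rerank')
  request = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
  request.body = JSON.generate(query: query, documents: documents)
  request
end

# Reorder documents by the scores in a rerank response body, best first.
def apply_rerank(documents, response_body)
  results = JSON.parse(response_body).fetch('results')
  results.sort_by { |r| -r.fetch('relevance_score') }
         .map { |r| documents.fetch(r.fetch('index')) }
end

docs = ['The calculator adds numbers.', 'Qdrant persists the index.']
request = build_rerank_request('http://localhost:8081', 'which backend persists?', docs)
puts request.path

# Made-up response body, standing in for what the server would return.
sample = '{"results":[{"index":0,"relevance_score":0.1},{"index":1,"relevance_score":0.9}]}'
puts apply_rerank(docs, sample).first
```

The request is only built here, not sent. Against a live server it could be dispatched with Net::HTTP.start on the same host and port, and the response body passed to apply_rerank.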
Larger multi-model runtimes (Ollama, LM Studio, ...) expose
OpenAI-compatible endpoints and would also work, but pikuri's
"small enough to audit" ethos keeps the recommended path on
llama.cpp alone.
Install
# Gemfile
gem 'pikuri-vectordb'
Depends on pikuri-core, pikuri-subagents (the Persona value
type LIBRARIAN is an instance of), and listen (filesystem
events for the Watcher; loaded only when a Watcher starts).
Further reading
- Guide chapter: Agentic search and the vector DB. Covers concepts, model setup, the no-egress argument, and the day-to-day --qdrant --watch shape.
- Design notes: DESIGN.md, the Chroma-vs-Qdrant engine survey.
- API reference: browse the YARD docs at https://rubydoc.info/gems/pikuri-vectordb (once published), or run bundle exec yard in this directory for a local copy.