Class: Pikuri::VectorDb::ChromaServer
- Inherits:
- Object
- Defined in:
- lib/pikuri/vector_db/chroma_server.rb
Overview
Supervisor for a self-managed Chroma docker container. Pairs with Backend::Chroma: this class owns the process (find / start / nuke-and-recreate the container, mount the volume, heartbeat-poll until ready); Backend::Chroma owns the HTTP client that talks to it. #client returns a Backend::Chroma pre-pointed at the running container.
Why split server from client
Container lifecycle and HTTP wire protocol have nothing in common — they’re separate jobs reading separate man pages. Splitting them keeps Backend::Chroma a thin Faraday client (the audit-friendly shape) and concentrates the docker-shaped complexity in one place a reader can skip when they don’t care.
Hosts that already manage Chroma elsewhere (a production deployment, a docker-compose stack, a Kubernetes service) wire Backend::Chroma.new(host:, port:, collection:) directly and never touch this class.
Namespace squat: pikuri-internal-*
The container is named CONTAINER_NAME (currently “pikuri-internal-chroma”) and carries the pikuri.internal=true docker label. Any container under that name is treated as fully owned by pikuri — if the existing container’s image tag doesn’t match IMAGE, the container is removed and recreated on the pinned image without ceremony. The data volume is bind-mounted from the user’s cache directory and is not nuked by this — the user’s chroma data is theirs, even when the container that runs against it gets replaced.
Same convention scales to future internal containers (rerankers, alternative vector stores) — anything starting with “pikuri-internal-” is fair game for pikuri to manage.
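The ownership rule above can be sketched as a small classifier over the JSON that docker inspect prints. This is an illustrative sketch, not the class's actual private code: classify_container and PINNED_IMAGE are hypothetical names, and the only assumptions about docker are the Config.Image and State.Running fields of the inspect output.

```ruby
require 'json'

# Hypothetical stand-in for the IMAGE constant documented on this page.
PINNED_IMAGE = 'chromadb/chroma:1.5.9'

# Classify a container from the JSON that `docker inspect <name>` prints.
# Pass nil (or "[]") when no container with that name exists.
def classify_container(inspect_json, pinned_image: PINNED_IMAGE)
  return :missing if inspect_json.nil?

  info = JSON.parse(inspect_json).first
  return :missing if info.nil?

  # Any image other than the pin means the container gets replaced.
  return :wrong_image unless info.dig('Config', 'Image') == pinned_image

  info.dig('State', 'Running') ? :running : :stopped
end
```

The four symbols match the branches of the case statement in #ensure_running!; only :wrong_image leads to removal, and the bind-mounted data directory is never part of that decision.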
Subprocess seam
Docker invocations (docker inspect, docker run, docker start, docker rm -f) are short-lived shell-outs: capture output, check exit, act. They route through Subprocess.spawn like the rest of pikuri-* lib/; this class is not an exception to the subprocess seam, unlike pikuri-mcp’s ClientWrapper (which owns a long-lived stdio pipe the mcp gem mediates).
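The capture / check / act pattern can be sketched as follows. Subprocess.spawn's signature is not shown on this page, so the sketch uses Ruby's standard Open3 as a stand-in; run_or_raise is a hypothetical helper name.

```ruby
require 'open3'

# Run a short-lived command and return its combined stdout/stderr.
# Raises RuntimeError carrying the captured output on a non-zero exit.
# Open3 is a stand-in here; the real class goes through Subprocess.spawn.
def run_or_raise(*argv)
  output, status = Open3.capture2e(*argv)
  return output if status.success?

  raise "#{argv.join(' ')} exited #{status.exitstatus}: #{output}"
rescue Errno::ENOENT
  # The executable itself is missing (for example, docker is not installed).
  raise "#{argv.first} not found on PATH"
end
```

A call such as run_or_raise('docker', 'inspect', 'pikuri-internal-chroma') would then either return the inspect output or raise with docker's own error text.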
Bind 127.0.0.1, not 0.0.0.0
The container port is published with -p 127.0.0.1:8000:8000, not the more common -p 8000:8000. Without an explicit host address, docker binds the host port to every interface (0.0.0.0), which would expose the user’s indexed corpus to anyone on the same LAN. The privacy posture from chapter 1 extends here.
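A minimal sketch of how the docker run arguments might be assembled with the loopback-only publish. docker_run_argv is a hypothetical helper, not a method of this class, and the fixed container-side port 8000 is an assumption taken from the -p example above; the flags themselves (--detach, --name, --label, --publish, --volume) are standard docker run options.

```ruby
# Hypothetical helper: build the argv for `docker run`.
def docker_run_argv(name:, label:, image:, host_port:, data_dir:, persist_dir:)
  [
    'docker', 'run', '--detach',
    '--name', name,
    '--label', label,
    # Loopback-only publish: host 127.0.0.1:<host_port> -> container port 8000
    # (container port assumed from the -p example in the text).
    '--publish', "127.0.0.1:#{host_port}:8000",
    # Bind-mount the host data directory onto chroma's persist directory.
    '--volume', "#{data_dir}:#{persist_dir}",
    image
  ]
end

argv = docker_run_argv(
  name: 'pikuri-internal-chroma',
  label: 'pikuri.internal=true',
  image: 'chromadb/chroma:1.5.9',
  host_port: 8000,
  data_dir: '/tmp/pikuri-chroma-example',
  persist_dir: '/chroma/chroma'
)
```

Passing the command as an argument array rather than a single string also keeps paths with spaces in data_dir from being split by a shell.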
Errors are loud
Docker missing, docker run exit non-zero, docker rm -f exit non-zero, healthcheck timeout — all raise RuntimeError with the offending output. Caller is internal pikuri code (a host’s Agent.new block running at boot); this is bug territory, not “tell the model and let it retry.”
Constant Summary
- LOGGER =
Pikuri.logger_for('VectorDb::ChromaServer')
- IMAGE =
Returns pinned chroma docker image. Bumping this constant is how the codebase upgrades the chroma version. An existing container running an older image under our CONTAINER_NAME is removed and recreated when #ensure_running! runs against a bumped pin.
'chromadb/chroma:1.5.9'
- CONTAINER_NAME =
Returns the container name pikuri claims for its chroma supervisor. Prefix “pikuri-internal-” is the namespace pikuri squats; see class header.
'pikuri-internal-chroma'
- LABEL =
Returns docker label set on every container this class creates. Used by future docker ps --filter "label=#{LABEL}" enumeration; not load-bearing for the #ensure_running! algorithm itself.
'pikuri.internal=true'
- CONTAINER_PERSIST_DIR =
Returns path inside the container where chroma persists its data (chroma’s PERSIST_DIRECTORY default when WORKDIR is /chroma). The host’s #default_data_dir bind-mounts here.
'/chroma/chroma'
- DEFAULT_HEALTHCHECK_TIMEOUT =
Returns default seconds to wait for the container’s HTTP heartbeat to start returning 200 after docker run / docker start.
30
Instance Attribute Summary
-
#data_dir ⇒ Pathname
readonly
Host-side data directory.
-
#port ⇒ Integer
readonly
Host-side port.
Class Method Summary
-
.ensure_running(data_dir: nil, port: 8000, healthcheck_timeout: DEFAULT_HEALTHCHECK_TIMEOUT) ⇒ ChromaServer
Construct a server and immediately ensure it’s running.
Instance Method Summary
-
#client(collection:) ⇒ Backend::Chroma
Build a Backend::Chroma pointing at the supervised container.
-
#default_data_dir ⇒ String
Default host-side data directory: $XDG_CACHE_HOME/pikuri/chroma if set, else ~/.cache/pikuri/chroma.
-
#endpoint ⇒ String
“http://localhost:<port>”.
-
#ensure_running! ⇒ void
Idempotent: find / start / recreate the container, then heartbeat-poll until ready.
- #initialize(data_dir: nil, port: 8000, healthcheck_timeout: DEFAULT_HEALTHCHECK_TIMEOUT, connection: nil) ⇒ ChromaServer constructor
Constructor Details
#initialize(data_dir: nil, port: 8000, healthcheck_timeout: DEFAULT_HEALTHCHECK_TIMEOUT, connection: nil) ⇒ ChromaServer
    # File 'lib/pikuri/vector_db/chroma_server.rb', line 133
    def initialize(data_dir: nil, port: 8000, healthcheck_timeout: DEFAULT_HEALTHCHECK_TIMEOUT,
                   connection: nil)
      @data_dir = Pathname.new(data_dir || default_data_dir).expand_path
      @port = port
      @healthcheck_timeout = healthcheck_timeout
      @connection = connection
    end
Instance Attribute Details
#data_dir ⇒ Pathname (readonly)
Returns host-side data directory.
    # File 'lib/pikuri/vector_db/chroma_server.rb', line 143
    def data_dir
      @data_dir
    end
#port ⇒ Integer (readonly)
Returns host-side port.
    # File 'lib/pikuri/vector_db/chroma_server.rb', line 146
    def port
      @port
    end
Class Method Details
.ensure_running(data_dir: nil, port: 8000, healthcheck_timeout: DEFAULT_HEALTHCHECK_TIMEOUT) ⇒ ChromaServer
Construct a server and immediately ensure it’s running. Convenience factory — equivalent to new(…).tap(&:ensure_running!).
    # File 'lib/pikuri/vector_db/chroma_server.rb', line 117
    def self.ensure_running(data_dir: nil, port: 8000, healthcheck_timeout: DEFAULT_HEALTHCHECK_TIMEOUT)
      new(
        data_dir: data_dir,
        port: port,
        healthcheck_timeout: healthcheck_timeout
      ).tap(&:ensure_running!)
    end
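The factory relies on tap returning its receiver, so callers get the server object back even though #ensure_running! is a void method. A minimal sketch with a stand-in class (FakeServer is hypothetical and starts no container) shows the shape:

```ruby
# Stand-in class, not the real ChromaServer: it only records the call.
class FakeServer
  attr_reader :started

  def initialize
    @started = false
  end

  # Mirrors a void method: callers should not rely on its return value.
  def ensure_running!
    @started = true
    nil
  end

  # Same shape as the factory above: tap yields the new instance to
  # ensure_running! and then returns the instance itself.
  def self.ensure_running
    new.tap(&:ensure_running!)
  end
end

server = FakeServer.ensure_running
```

Here server is the FakeServer instance with started set to true, not the nil returned by ensure_running!.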
Instance Method Details
#client(collection:) ⇒ Backend::Chroma
Build a Backend::Chroma pointing at the supervised container. Just a constructor convenience — the supervisor carries the host/port, the caller carries the collection name.
    # File 'lib/pikuri/vector_db/chroma_server.rb', line 161
    def client(collection:)
      Backend::Chroma.new(host: 'localhost', port: @port, collection: collection)
    end
#default_data_dir ⇒ String
Default host-side data directory: $XDG_CACHE_HOME/pikuri/chroma if set, else ~/.cache/pikuri/chroma. Public so tests and chapter examples can reference the same path the supervisor resolves at runtime.
    # File 'lib/pikuri/vector_db/chroma_server.rb', line 200
    def default_data_dir
      cache_home = ENV['XDG_CACHE_HOME']
      cache_home = File.expand_path('~/.cache') if cache_home.nil? || cache_home.empty?
      File.join(cache_home, 'pikuri', 'chroma')
    end
#endpoint ⇒ String
Returns “http://localhost:<port>”. Useful for wiring custom Backend::Chroma constructions.
    # File 'lib/pikuri/vector_db/chroma_server.rb', line 150
    def endpoint
      "http://localhost:#{@port}"
    end
#ensure_running! ⇒ void
This method returns an undefined value.
Idempotent: find / start / recreate the container, then heartbeat-poll until ready. Safe to call repeatedly; a second call against an already-running healthy container is a couple of docker inspect + heartbeat round trips.
    # File 'lib/pikuri/vector_db/chroma_server.rb', line 173
    def ensure_running!
      FileUtils.mkdir_p(@data_dir)

      case container_state
      when :missing
        run_container!
      when :wrong_image
        LOGGER.info("removing #{CONTAINER_NAME} (image mismatch with pin #{IMAGE})")
        remove_container!
        run_container!
      when :stopped
        LOGGER.info("starting existing #{CONTAINER_NAME}")
        start_container!
      when :running
        LOGGER.info("#{CONTAINER_NAME} already running")
      end

      wait_for_healthy!
    end
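The body of wait_for_healthy! is not shown on this page. As a rough sketch of what a heartbeat poll with a loud timeout could look like, the hypothetical helper below takes the probe as a block so that it stays independent of any particular HTTP client or heartbeat path; wait_until_healthy and its interval parameter are assumptions made for illustration.

```ruby
# Poll the given block until it returns truthy or `timeout` seconds elapse.
# Errors raised by the block (for example, connection refused while the
# server is still booting) count as "not ready yet" and are retried.
def wait_until_healthy(timeout:, interval: 0.5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  last_error = nil

  loop do
    begin
      return true if yield
    rescue StandardError => e
      last_error = e
    end

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      detail = last_error ? " (last error: #{last_error.message})" : ''
      raise "container not healthy after #{timeout}s#{detail}"
    end

    sleep interval
  end
end
```

In real use the block would issue the heartbeat request and return true on a 200 response. The monotonic clock keeps the deadline stable if the wall clock is adjusted during the wait.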