Class: Pikuri::VectorDb::ChromaServer

Inherits:
Object
  • Object
Defined in:
lib/pikuri/vector_db/chroma_server.rb

Overview

Supervisor for a self-managed Chroma docker container. Pairs with Backend::Chroma: this class owns the process (find / start / nuke-and-recreate the container, mount the volume, heartbeat-poll until ready); Backend::Chroma owns the HTTP client that talks to it. #client returns a Backend::Chroma pre-pointed at the running container.

Why split server from client

Container lifecycle and HTTP wire protocol have nothing in common — they’re separate jobs reading separate man pages. Splitting them keeps Backend::Chroma a thin Faraday client (the audit-friendly shape) and concentrates the docker-shaped complexity in one place a reader can skip when they don’t care.

Hosts that already manage Chroma elsewhere (a production deployment, a docker-compose stack, a Kubernetes service) construct Backend::Chroma.new(host:, port:, collection:) directly and never touch this class.

Namespace squat: pikuri-internal-*

The container is named CONTAINER_NAME (currently “pikuri-internal-chroma”) and carries the pikuri.internal=true docker label. Any container under that name is treated as fully owned by pikuri — if the existing container’s image tag doesn’t match IMAGE, the container is removed and recreated on the pinned image without ceremony. The data volume is bind-mounted from the user’s cache directory and is not nuked by this — the user’s chroma data is theirs, even when the container that runs against it gets replaced.

Same convention scales to future internal containers (rerankers, alternative vector stores) — anything starting with “pikuri-internal-” is fair game for pikuri to manage.
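The ownership rule above implies a four-way classification of whatever currently sits under the container name. The private method that performs this classification is not shown on this page, so the following is only a sketch under assumptions: classify_container and its inspect_ok: parameter are hypothetical names, and the comparison against the Config.Image field of docker inspect output is one plausible way to detect a pin mismatch, not necessarily the one the class uses.

```ruby
require 'json'

# Hypothetical helper (not part of the documented API): classify the JSON that
# `docker inspect <container-name>` prints on stdout.
#
# inspect_ok is false when `docker inspect` exited non-zero, which is what
# happens when no container of that name exists.
def classify_container(inspect_json, inspect_ok:, pinned_image: 'chromadb/chroma:1.5.9')
  return :missing unless inspect_ok

  # docker inspect prints a JSON array with one object per inspected name.
  info = JSON.parse(inspect_json).first
  return :missing if info.nil?

  # Assumption: Config.Image holds the image reference the container was created with.
  return :wrong_image unless info.dig('Config', 'Image') == pinned_image

  info.dig('State', 'Running') ? :running : :stopped
end
```

The four symbols match the branches of the case statement in #ensure_running!; only :wrong_image leads to removal, and in no branch is the bind-mounted data directory touched.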

Subprocess seam

Docker invocations (docker inspect, docker run, docker start, docker rm -f) are short-lived shell-outs: capture output, check exit, act. They route through Subprocess.spawn like the rest of pikuri-* lib/; this class is not an exception to the subprocess seam, unlike pikuri-mcp’s ClientWrapper (which owns a long-lived stdio pipe the mcp gem mediates).

Bind 127.0.0.1, not 0.0.0.0

The container port is published as -p 127.0.0.1:8000:8000 (with the default port), not the docker default -p 8000:8000. The default form binds the host port on every interface, which would expose the user’s indexed corpus to anyone on the same LAN. The privacy posture from chapter 1 extends here.
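To make the loopback binding concrete, here is a sketch of how the docker run argument vector could be assembled from the documented constants. docker_run_argv is a hypothetical helper, and the exact set and order of flags the class passes is an assumption; only the loopback publish spec, the container name, the label, the persist path, and the image pin come from this page.

```ruby
# Hypothetical helper (not part of the documented API): build the argv for
# `docker run`. Defaults mirror the documented constants IMAGE, CONTAINER_NAME,
# LABEL and CONTAINER_PERSIST_DIR.
def docker_run_argv(data_dir:, port: 8000,
                    image: 'chromadb/chroma:1.5.9',
                    name: 'pikuri-internal-chroma',
                    label: 'pikuri.internal=true',
                    persist_dir: '/chroma/chroma')
  [
    'docker', 'run', '--detach',
    '--name', name,
    '--label', label,
    # Loopback only: the host IP prefix keeps the port off other interfaces.
    '--publish', "127.0.0.1:#{port}:8000",
    # Bind-mount the host data directory onto chroma's persist directory.
    '--volume', "#{data_dir}:#{persist_dir}",
    image
  ]
end

puts docker_run_argv(data_dir: '/tmp/pikuri-chroma').join(' ')
```

Passing the command as an array rather than a single shell string avoids quoting problems when the data directory contains spaces.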

Errors are loud

Docker missing, docker run exiting non-zero, docker rm -f exiting non-zero, healthcheck timeout: all raise RuntimeError with the offending output. The caller is internal pikuri code (a host’s Agent.new block running at boot); this is bug territory, not “tell the model and let it retry.”
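The "capture output, check exit, raise" pattern can be sketched as follows. The class itself routes through Subprocess.spawn, whose signature is not shown on this page, so this sketch substitutes Ruby’s standard Open3.capture2e; run_or_raise! is a hypothetical name and the message format is an assumption.

```ruby
require 'open3'

# Hypothetical helper: run a short-lived command and return its combined
# stdout/stderr, or raise RuntimeError carrying the offending output.
# Uses stdlib Open3 as a stand-in for pikuri's Subprocess.spawn.
def run_or_raise!(*argv)
  output, status = Open3.capture2e(*argv)
  return output if status.success?

  raise "#{argv.join(' ')} failed (exit #{status.exitstatus}): #{output.strip}"
rescue Errno::ENOENT
  # The executable itself is missing, e.g. docker is not installed.
  raise "#{argv.first} not found on PATH"
end
```

A call such as run_or_raise!('docker', 'rm', '-f', 'pikuri-internal-chroma') would then either return docker’s output or abort boot with the full error text, which matches the "bug territory" stance described above.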

Constant Summary

LOGGER =
Pikuri.logger_for('VectorDb::ChromaServer')
IMAGE =

Returns pinned chroma docker image. Bumping this constant is how the codebase upgrades the chroma version. An existing container running an older image under our CONTAINER_NAME is removed and recreated when #ensure_running! runs against a bumped pin.

Returns:

  • (String)

'chromadb/chroma:1.5.9'
CONTAINER_NAME =

Returns the container name pikuri claims for its chroma supervisor. Prefix “pikuri-internal-” is the namespace pikuri squats — see class header.

Returns:

  • (String)

'pikuri-internal-chroma'
LABEL =

Returns docker label set on every container this class creates. Used by future docker ps --filter "label=#{LABEL}" enumeration; not load-bearing for the #ensure_running! algorithm itself.

Returns:

  • (String)

'pikuri.internal=true'
CONTAINER_PERSIST_DIR =

Returns path inside the container where chroma persists its data (chroma’s PERSIST_DIRECTORY default when WORKDIR is /chroma). The host’s #default_data_dir bind-mounts here.

Returns:

  • (String)

'/chroma/chroma'
DEFAULT_HEALTHCHECK_TIMEOUT =

Returns default seconds to wait for the container’s HTTP heartbeat to start returning 200 after docker run / docker start.

Returns:

  • (Integer)

30

Constructor Details

#initialize(data_dir: nil, port: 8000, healthcheck_timeout: DEFAULT_HEALTHCHECK_TIMEOUT, connection: nil) ⇒ ChromaServer

Parameters:

  • data_dir (String, Pathname, nil) (defaults to: nil)

    host path to bind-mount as chroma’s persist dir. nil resolves to #default_data_dir.

  • port (Integer) (defaults to: 8000)

    host port to publish.

  • healthcheck_timeout (Integer) (defaults to: DEFAULT_HEALTHCHECK_TIMEOUT)

    seconds to wait for the container’s HTTP heartbeat before giving up.

  • connection (Faraday::Connection, nil) (defaults to: nil)

    DI hook for tests. Production callers leave it nil.



# File 'lib/pikuri/vector_db/chroma_server.rb', line 133

def initialize(data_dir: nil, port: 8000,
               healthcheck_timeout: DEFAULT_HEALTHCHECK_TIMEOUT,
               connection: nil)
  @data_dir = Pathname.new(data_dir || default_data_dir).expand_path
  @port = port
  @healthcheck_timeout = healthcheck_timeout
  @connection = connection
end

Instance Attribute Details

#data_dir ⇒ Pathname (readonly)

Returns host-side data directory.

Returns:

  • (Pathname)



# File 'lib/pikuri/vector_db/chroma_server.rb', line 143

def data_dir
  @data_dir
end

#port ⇒ Integer (readonly)

Returns host-side port.

Returns:

  • (Integer)



# File 'lib/pikuri/vector_db/chroma_server.rb', line 146

def port
  @port
end

Class Method Details

.ensure_running(data_dir: nil, port: 8000, healthcheck_timeout: DEFAULT_HEALTHCHECK_TIMEOUT) ⇒ ChromaServer

Construct a server and immediately ensure it’s running. Convenience factory — equivalent to new(…).tap(&:ensure_running!).

Parameters:

  • data_dir (String, Pathname, nil) (defaults to: nil)

    host path bind-mounted into the container’s persist directory. nil resolves to #default_data_dir ($XDG_CACHE_HOME or ~/.cache, then pikuri/chroma). Created if missing.

  • port (Integer) (defaults to: 8000)

    host port bound to chroma’s 8000. Bound to 127.0.0.1 only.

  • healthcheck_timeout (Integer) (defaults to: DEFAULT_HEALTHCHECK_TIMEOUT)

    seconds to poll /api/v2/heartbeat before giving up.

Returns:

  • (ChromaServer)


# File 'lib/pikuri/vector_db/chroma_server.rb', line 117

def self.ensure_running(data_dir: nil, port: 8000,
                        healthcheck_timeout: DEFAULT_HEALTHCHECK_TIMEOUT)
  new(
    data_dir: data_dir, port: port,
    healthcheck_timeout: healthcheck_timeout
  ).tap(&:ensure_running!)
end

Instance Method Details

#client(collection:) ⇒ Backend::Chroma

Build a Backend::Chroma pointing at the supervised container. Just a constructor convenience — the supervisor carries the host/port, the caller carries the collection name.

Parameters:

  • collection (String)

    Chroma collection name.

Returns:

  • (Backend::Chroma)


# File 'lib/pikuri/vector_db/chroma_server.rb', line 161

def client(collection:)
  Backend::Chroma.new(host: 'localhost', port: @port, collection: collection)
end

#default_data_dir ⇒ String

Default host-side data directory: $XDG_CACHE_HOME/pikuri/chroma if set, else ~/.cache/pikuri/chroma. Public so tests and chapter examples can reference the same path the supervisor resolves at runtime.

Returns:

  • (String)


# File 'lib/pikuri/vector_db/chroma_server.rb', line 200

def default_data_dir
  cache_home = ENV['XDG_CACHE_HOME']
  cache_home = File.expand_path('~/.cache') if cache_home.nil? || cache_home.empty?
  File.join(cache_home, 'pikuri', 'chroma')
end

#endpoint ⇒ String

Returns “http://localhost:<port>”. Useful for wiring custom Backend::Chroma constructions.

Returns:

  • (String)



# File 'lib/pikuri/vector_db/chroma_server.rb', line 150

def endpoint
  "http://localhost:#{@port}"
end

#ensure_running! ⇒ void

This method returns an undefined value.

Idempotent: find / start / recreate the container, then heartbeat-poll until ready. Safe to call repeatedly; a second call against an already-running healthy container is a couple of docker inspect + heartbeat round trips.

Raises:

  • (RuntimeError)

    on missing docker, any docker command failure, or healthcheck timeout.



# File 'lib/pikuri/vector_db/chroma_server.rb', line 173

def ensure_running!
  FileUtils.mkdir_p(@data_dir)

  case container_state
  when :missing
    run_container!
  when :wrong_image
    LOGGER.info("removing #{CONTAINER_NAME} (image mismatch with pin #{IMAGE})")
    remove_container!
    run_container!
  when :stopped
    LOGGER.info("starting existing #{CONTAINER_NAME}")
    start_container!
  when :running
    LOGGER.info("#{CONTAINER_NAME} already running")
  end

  wait_for_healthy!
end
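
The final wait_for_healthy! step is private and not shown on this page. A minimal sketch of deadline-based heartbeat polling follows, under stated assumptions: wait_until_healthy! and heartbeat_ok? are hypothetical names, the poll interval is invented, and Net::HTTP stands in for the Faraday connection the class actually accepts. Only the /api/v2/heartbeat path and the "200 means ready" rule come from the documentation above.

```ruby
require 'net/http'

# Hypothetical probe: true once the heartbeat endpoint answers 200.
# Connection errors while the container is still booting count as "not yet".
def heartbeat_ok?(port)
  uri = URI("http://127.0.0.1:#{port}/api/v2/heartbeat")
  Net::HTTP.get_response(uri).is_a?(Net::HTTPOK)
rescue SystemCallError, SocketError, IOError, Timeout::Error
  false
end

# Hypothetical poller: call the block until it returns true, or raise
# RuntimeError once `timeout` seconds have elapsed on the monotonic clock.
def wait_until_healthy!(timeout:, interval: 0.5, &probe)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if probe.call
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "chroma heartbeat not healthy after #{timeout}s"
    end

    sleep interval
  end
end
```

A caller might combine the two as wait_until_healthy!(timeout: 30) { heartbeat_ok?(8000) }. Taking the probe as a block keeps the timing logic testable without a running container, which is the same motivation as the connection: DI hook on the constructor.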