Class: Tep::Llm::OpenAI::Server

Inherits: Object

Defined in: lib/tep/openai_server.rb

Overview

The mountable server. Its API is exposed as class methods because an app wires one backend per process at boot (use) and then mounts the standard routes (serve!).

Class Method Summary

.serve!(events_jsonl = "") ⇒ Object
Mount the standard OpenAI routes and, optionally, start the toy/v1 events stream.
.use(backend) ⇒ Object
Register the app's backend.

Class Method Details

.serve!(events_jsonl = "") ⇒ `Object`

Mount the standard OpenAI routes and, optionally, start the toy/v1 events stream. events_jsonl is the path of a JSONL file to which the run_start event (written at boot) and each per-request inference event are appended; an empty path (the default) disables emission with zero overhead. Backwards-compatible with the 7.1a/b no-arg form.

# File 'lib/tep/openai_server.rb', line 141

def self.serve!(events_jsonl = "")
  events = Tep::Events.new(events_jsonl)
  Tep::APP.set_openai_events(events)
  host = ENV["HOSTNAME"]
  if host.length == 0
    host = "tep"
  end
  # backend.device_kind => the run_start's `backend.kind`; reads
  # the backend via APP.openai_backend so a `use`d subclass's
  # override answers (e.g. ToyBackend returning "cuda").
  backend_kind = Tep::APP.openai_backend.device_kind
  config_json = "{" +
    SpinelKit::Json.encode_pair_str("server", "tep-llm-openai") + "," +
    SpinelKit::Json.encode_pair_str("events_jsonl", events_jsonl) +
  "}"
  events.run_start(host, backend_kind, "", "", config_json)
  Tep.get("/v1/models",            Tep::Llm::OpenAI::ModelsHandler.new)
  Tep.post("/v1/completions",      Tep::Llm::OpenAI::CompletionsHandler.new)
  Tep.post("/v1/chat/completions", Tep::Llm::OpenAI::ChatCompletionsHandler.new)
  # Always mounted; the handler 501s when supports_embeddings?
  # is false (same gate shape as chat completions).
  Tep.post("/v1/embeddings",       Tep::Llm::OpenAI::EmbeddingsHandler.new)
  0
end
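
The empty-path rule for events_jsonl can be illustrated with a small stand-alone sketch. SketchEvents and sketch_events_demo are hypothetical names invented for this example; the sketch does not reproduce Tep::Events, whose internals are not shown on this page, and it assumes only that "disabled" means no file is ever opened.

```ruby
require "json"
require "tmpdir"

# Hypothetical stand-in for an events sink; NOT the real Tep::Events.
class SketchEvents
  def initialize(path)
    @path = path
  end

  # An empty path means emission is switched off.
  def enabled?
    !@path.empty?
  end

  # Append one JSON object per line (JSONL). Returns 1 when a line
  # was written and 0 when emission is disabled.
  def emit(event)
    return 0 unless enabled?
    File.open(@path, "a") { |f| f.puts(JSON.generate(event)) }
    1
  end
end

# Exercise both modes: a disabled sink and a sink writing to a temp file.
def sketch_events_demo
  skipped = SketchEvents.new("").emit({ "type" => "run_start" })
  Dir.mktmpdir do |dir|
    path = File.join(dir, "events.jsonl")
    events = SketchEvents.new(path)
    events.emit({ "type" => "run_start", "host" => "tep" })
    events.emit({ "type" => "inference" })
    lines = File.readlines(path).map { |line| JSON.parse(line) }
    [skipped, lines]
  end
end
```

With a non-empty path, the boot-time event lands on the first line and each later event on its own line; with the default empty path, nothing touches the filesystem.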

.use(backend) ⇒ `Object`

Register the app's backend. Pass an instance of a concrete Backend subclass; it is stored on Tep::APP, and each request is dispatched to it.

# File 'lib/tep/openai_server.rb', line 131

def self.use(backend)
  Tep::APP.set_openai_backend(backend)
  0
end
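
A minimal sketch of the registration pattern, using stand-in classes, may help show why serve! reads the backend back through the app rather than holding its own reference. SketchBackend, SketchToyBackend, SketchApp and sketch_embeddings_status are hypothetical names; the "cpu" default and the exact gate logic are assumptions made for illustration, and only device_kind, supports_embeddings?, the "cuda" override and the 501 response come from the source comments above.

```ruby
# Hypothetical base backend; the "cpu" default is an assumption.
class SketchBackend
  def device_kind
    "cpu"
  end

  def supports_embeddings?
    false
  end
end

# A subclass overriding device_kind, like the ToyBackend mentioned above.
class SketchToyBackend < SketchBackend
  def device_kind
    "cuda"
  end
end

# Stand-in for the per-process registry (the role Tep::APP plays).
module SketchApp
  @backend = SketchBackend.new

  def self.set_backend(backend)
    @backend = backend
    0
  end

  def self.backend
    @backend
  end
end

# The route is always mounted; the status depends on the backend's answer.
def sketch_embeddings_status(backend)
  backend.supports_embeddings? ? 200 : 501
end

# Register once at boot; later lookups see the subclass override.
SketchApp.set_backend(SketchToyBackend.new)
```

In a real app the boot sequence would presumably be Tep::Llm::OpenAI::Server.use(MyBackend.new) followed by Tep::Llm::OpenAI::Server.serve!("events.jsonl"), where MyBackend and the file name are placeholders. Calling use first matters because serve! reads device_kind from the registered backend when it emits run_start.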