Class: Tep::Llm::OpenAI::Server
- Inherits: Object
- Defined in: lib/tep/openai_server.rb
Overview
The mountable server. Its API consists of class methods because an app wires one backend per process at boot (`use`) and then mounts the standard routes (`serve!`).
Class Method Summary
- .serve!(events_jsonl = "") ⇒ Object
  Mount the standard OpenAI routes and (optionally) start the toy/v1 events stream.
- .use(backend) ⇒ Object
  Register the app’s backend.
Class Method Details
.serve!(events_jsonl = "") ⇒ Object
Mount the standard OpenAI routes and, optionally, start the toy/v1 events stream. `events_jsonl` is the path of a JSONL file to which the `run_start` event (emitted once at boot) and the per-request inference events are appended; an empty path (the default) disables emission with zero overhead. Backwards-compatible with the 7.1a/b no-arg form.
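Because the events stream is plain JSONL (one JSON value per line), it can be inspected with nothing but the Ruby standard library. The following is a minimal sketch, not part of Tep: `read_jsonl` is a hypothetical helper, and it makes no assumption about the fields each event carries.

```ruby
require "json"

# Hypothetical helper (not part of Tep): parse a JSONL file into an array,
# one parsed JSON value per non-blank line.
def read_jsonl(path)
  File.foreach(path)
      .map(&:strip)
      .reject(&:empty?)
      .map { |line| JSON.parse(line) }
end

# When run as a script with a path argument, print every event in the file.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  read_jsonl(ARGV[0]).each { |event| puts event.inspect }
end
```

Pass the same path that was given to `serve!` to see the `run_start` record followed by the per-request events.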
# File 'lib/tep/openai_server.rb', line 134

def self.serve!(events_jsonl = "")
  events = Tep::Events.new(events_jsonl)
  Tep::APP.set_openai_events(events)
  host = ENV["HOSTNAME"]
  if host.length == 0
    host = "tep"
  end
  # backend.device_kind => the run_start's `backend.kind`; reads
  # the backend via APP.openai_backend so a `use`d subclass's
  # override answers (e.g. ToyBackend returning "cuda").
  backend_kind = Tep::APP.openai_backend.device_kind
  config_json = "{" +
    Tep::Json.encode_pair_str("server", "tep-llm-openai") + "," +
    Tep::Json.encode_pair_str("events_jsonl", events_jsonl) + "}"
  events.run_start(host, backend_kind, "", "", config_json)
  Tep.get("/v1/models", Tep::Llm::OpenAI::ModelsHandler.new)
  Tep.post("/v1/completions", Tep::Llm::OpenAI::CompletionsHandler.new)
  Tep.post("/v1/chat/completions", Tep::Llm::OpenAI::ChatCompletionsHandler.new)
  # Always mounted; the handler 501s when supports_embeddings?
  # is false (same gate shape as chat completions).
  Tep.post("/v1/embeddings", Tep::Llm::OpenAI::EmbeddingsHandler.new)
  0
end
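Once the routes are mounted, a client can target them with the Ruby standard library. The sketch below builds, but does not send, a request for `/v1/chat/completions`. Several things here are assumptions rather than facts from this page: the `localhost:8080` address, the model name `"toy"`, the helper `build_chat_request`, and that the handler accepts the usual OpenAI-style `model`/`messages` body.

```ruby
require "json"
require "net/http"
require "uri"

# Assumption: the server listens here; adjust to your deployment.
BASE_URL = "http://localhost:8080"

# Hypothetical helper: build (without sending) a POST for the chat route.
def build_chat_request(model, user_text)
  uri = URI.join(BASE_URL, "/v1/chat/completions")
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  # OpenAI-style chat body: a model name plus a list of role/content messages.
  request.body = JSON.generate(
    "model" => model,
    "messages" => [{ "role" => "user", "content" => user_text }]
  )
  request
end

request = build_chat_request("toy", "Hello")
puts request.path
puts request.body
```

To actually send it, pass the request to `Net::HTTP.start(host, port) { |http| http.request(request) }` with the host and port of the running server.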