Class: Tep::Llm::OpenAI::ChatCompletionsHandler

Inherits: Handler

  Object
    Handler
      Tep::Llm::OpenAI::ChatCompletionsHandler

Defined in: lib/tep/openai_server.rb

Overview

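POST /v1/chat/completions: the message-level OpenAI request shape. The endpoint is gated: when backend.supports_chat? is false (the default, because chat templating is per-model and an ML concern tep doesn’t ship), it responds 501 Not Implemented with an OpenAI-style error body. A backend opts in by overriding supports_chat? to return true and implementing chat_completion; the handler then dispatches to it and wraps the returned Completion in the standard chat.completion envelope, with the Completion's text field becoming the assistant message's content. Requests whose body sets "stream": true are answered instead as a server-sent event (SSE) stream driven by ChatCompletionsStreamer, which calls backend.chat_completion_stream.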
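The gate-and-dispatch contract described above can be sketched in plain Ruby. This is a minimal, self-contained sketch, not tep's code: Completion, EchoChatBackend, PlainBackend and dispatch_chat are hypothetical stand-ins, and it uses the standard json library rather than tep's Tep::Json helpers. It shows a backend that keeps the default receiving the 501 error body, and an opted-in backend's Completion being wrapped in the chat.completion envelope.

```ruby
require "json"

# Hypothetical stand-in for tep's Completion: only the three fields the
# handler reads are modeled.
Completion = Struct.new(:text, :prompt_tokens, :completion_tokens)

# Hypothetical backend that opts in: supports_chat? is true and
# chat_completion returns a Completion.
class EchoChatBackend
  def supports_chat?
    true
  end

  def chat_completion(_req)
    Completion.new("Hello!", 12, 3)
  end
end

# Hypothetical backend that keeps the default (no chat support).
class PlainBackend
  def supports_chat?
    false
  end
end

# Mirrors the handler's non-streaming logic, returning [status, body_hash].
def dispatch_chat(backend, req, model)
  unless backend.supports_chat?
    error = { "message" => "chat completions not supported by this backend",
              "type"    => "not_implemented" }
    return [501, { "error" => error }]
  end

  comp = backend.chat_completion(req)
  body = {
    "id"      => "chatcmpl-tep",
    "object"  => "chat.completion",
    "created" => Time.now.to_i,
    "model"   => model,
    "choices" => [{
      "index"         => 0,
      "message"       => { "role" => "assistant", "content" => comp.text },
      "finish_reason" => "stop"
    }],
    "usage" => {
      "prompt_tokens"     => comp.prompt_tokens,
      "completion_tokens" => comp.completion_tokens,
      "total_tokens"      => comp.prompt_tokens + comp.completion_tokens
    }
  }
  # 200 is an assumption: the handler sets no status on this path, so
  # whatever the server's default is applies.
  [200, body]
end

status, body = dispatch_chat(EchoChatBackend.new, nil, "demo-model")
puts status                        # prints 200
puts JSON.generate(body["usage"])  # prints {"prompt_tokens":12,"completion_tokens":3,"total_tokens":15}
```

Note that total_tokens is derived by the handler rather than supplied by the backend, so a backend only has to report the two component counts.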

Instance Method Summary

#handle(req, res) ⇒ Object

Methods inherited from Handler

#is_regex?, #re_capture, #re_match?

Instance Method Details

#handle(req, res) ⇒ Object

# File 'lib/tep/openai_server.rb', line 628

def handle(req, res)
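  res.headers["Content-Type"] = "application/json"
  # Gate: backends that keep the default supports_chat? (false) get a
  # 501 with an OpenAI-style error object.
  if !Tep::APP.openai_backend.supports_chat?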
    res.set_status(501)
    return "{" +
      "\"error\":{" +
        Tep::Json.encode_pair_str("message",
          "chat completions not supported by this backend") + "," +
        Tep::Json.encode_pair_str("type", "not_implemented") +
      "}" +
    "}"
  end
  body  = req.raw_body
  model = Tep::Json.get_str(body, "model")

  # Streaming branch (#127): same "stream":true sniff as
  # CompletionsHandler. Sends an SSE response driven by
  # ChatCompletionsStreamer -- which calls into
  # backend.chat_completion_stream via a ChatStreamSink.
  wants_stream = Tep.str_find(body, "\"stream\":true", 0) >= 0 ||
                 Tep.str_find(body, "\"stream\": true", 0) >= 0
  if wants_stream
    res.headers["Content-Type"]  = "text/event-stream"
    res.headers["Cache-Control"] = "no-cache"
    streamer = Tep::Llm::OpenAI::ChatCompletionsStreamer.new
    streamer.req_ref       = req
    streamer.model         = model
    # No `prompt` token-id array on chat requests; pass 0 so
    # the inference event has a deterministic value. A future
    # refinement can derive prompt_tokens from the messages
    # array's byte length / tokenizer estimate.
    streamer.prompt_tokens = 0
    streamer.t0            = Time.now.to_i
    streamer.request_id    = "chatcmpl-tep"
    streamer.principal_id  = req.identity.subject
    res.start_stream(streamer)
    return ""
  end

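  # Non-streaming path: the backend returns a Completion, which is wrapped
  # in the chat.completion envelope (its text becomes the assistant content).
  comp  = Tep::APP.openai_backend.chat_completion(req)
  total = comp.prompt_tokens + comp.completion_tokens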
  "{" +
    Tep::Json.encode_pair_str("id", "chatcmpl-tep") + "," +
    Tep::Json.encode_pair_str("object", "chat.completion") + "," +
    Tep::Json.encode_pair_int("created", Time.now.to_i) + "," +
    Tep::Json.encode_pair_str("model", model) + "," +
    "\"choices\":[{" +
      Tep::Json.encode_pair_int("index", 0) + "," +
      "\"message\":{" +
        Tep::Json.encode_pair_str("role", "assistant") + "," +
        Tep::Json.encode_pair_str("content", comp.text) +
      "}," +
      Tep::Json.encode_pair_str("finish_reason", "stop") +
    "}]," +
    "\"usage\":{" +
      Tep::Json.encode_pair_int("prompt_tokens", comp.prompt_tokens) + "," +
      Tep::Json.encode_pair_int("completion_tokens", comp.completion_tokens) + "," +
      Tep::Json.encode_pair_int("total_tokens", total) +
    "}" +
  "}"
end
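The streaming branch decides by searching the raw body for two exact spellings, "stream":true and "stream": true. The sketch below restates that check in plain Ruby (String#include? in place of Tep.str_find(...) >= 0) to illustrate its limits: other valid JSON whitespace is missed, and a nested "stream" key can trigger a false positive. parsed_stream? is a hypothetical alternative built on the standard json library; that is an assumption, since tep's handlers use their own Tep::Json helpers and stdlib JSON may not be available in tep's runtime.

```ruby
require "json"

# The handler's check, restated: only two exact spellings are recognized.
def sniff_stream?(body)
  body.include?("\"stream\":true") || body.include?("\"stream\": true")
end

# Hypothetical alternative: parse the body and read the top-level boolean.
# Assumes stdlib json is usable, which may not hold in tep's runtime.
def parsed_stream?(body)
  data = JSON.parse(body)
  data.is_a?(Hash) && data["stream"] == true
rescue JSON::ParserError
  false
end

cases = [
  '{"model":"m","stream":true}',                 # both detect it
  '{"model":"m","stream" : true}',               # sniff misses it
  "{\"model\":\"m\",\"stream\":\n  true}",       # sniff misses it
  '{"options":{"stream":true},"stream":false}'   # sniff false positive
]
cases.each do |b|
  puts "sniff=#{sniff_stream?(b)} parsed=#{parsed_stream?(b)}"
end
```

The substring sniff has the advantage of needing no full parse of the request body, which may be why it was chosen; whether the stricter check is worth its cost depends on what tep's runtime provides.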