Class: Tep::Llm::OpenAI::ChatCompletionsHandler

Inherits:
Handler
  • Object
Defined in:
lib/tep/openai_server.rb

Overview

Handles POST /v1/chat/completions, the message-level OpenAI request shape. The handler responds with 501 when backend.supports_chat? is false, which is the default: chat templating is per-model and an ML concern that tep doesn’t ship. When a backend opts in by overriding supports_chat? to return true and implementing chat_completion, the handler dispatches to it and wraps the returned Completion in the standard chat.completion envelope, with the Completion’s text field as the assistant message’s content. Requests whose raw body contains "stream":true (or "stream": true) are instead handed to ChatCompletionsStreamer and answered as a text/event-stream response.
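
The opt-in contract and the two response shapes described above can be sketched roughly as follows. This is an illustration under assumptions, not tep's actual code: SketchCompletion, SketchBackend, EchoChatBackend, and sketch_chat_response are hypothetical names, and Ruby's stdlib JSON stands in for the Tep::Json.encode_pair_* helpers. Only supports_chat?, chat_completion, the Completion fields (text, prompt_tokens, completion_tokens), and the literal field values come from the handler source.

```ruby
require "json"

# Hypothetical stand-in for the Completion value a backend returns.
SketchCompletion = Struct.new(:text, :prompt_tokens, :completion_tokens)

# Hypothetical backend base: chat is off by default.
class SketchBackend
  def supports_chat?
    false
  end
end

# Hypothetical backend that opts in by overriding both methods.
class EchoChatBackend < SketchBackend
  def supports_chat?
    true
  end

  def chat_completion(_req)
    SketchCompletion.new("hello", 3, 1)
  end
end

# Returns [status, json_body]: 501 when chat is unsupported, otherwise a
# chat.completion envelope built around the backend's Completion.
def sketch_chat_response(backend, req, model)
  unless backend.supports_chat?
    error = { "message" => "chat completions not supported by this backend",
              "type" => "not_implemented" }
    return [501, JSON.generate({ "error" => error })]
  end

  comp = backend.chat_completion(req)
  envelope = {
    "id" => "chatcmpl-tep",
    "object" => "chat.completion",
    "created" => Time.now.to_i,
    "model" => model,
    "choices" => [{
      "index" => 0,
      "message" => { "role" => "assistant", "content" => comp.text },
      "finish_reason" => "stop"
    }],
    "usage" => {
      "prompt_tokens" => comp.prompt_tokens,
      "completion_tokens" => comp.completion_tokens,
      "total_tokens" => comp.prompt_tokens + comp.completion_tokens
    }
  }
  [200, JSON.generate(envelope)]
end
```

Calling sketch_chat_response(SketchBackend.new, nil, "some-model") takes the 501 path, while the same call with EchoChatBackend.new returns the full envelope with the Completion's text as the assistant content.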

Instance Method Summary

Methods inherited from Handler

#is_regex?, #re_capture, #re_match?

Instance Method Details

#handle(req, res) ⇒ Object



# File 'lib/tep/openai_server.rb', line 595

def handle(req, res)
  res.headers["Content-Type"] = "application/json"
  if !Tep::APP.openai_backend.supports_chat?
    res.set_status(501)
    return "{" +
      "\"error\":{" +
        Tep::Json.encode_pair_str("message",
          "chat completions not supported by this backend") + "," +
        Tep::Json.encode_pair_str("type", "not_implemented") +
      "}" +
    "}"
  end
  body  = req.raw_body
  model = Tep::Json.get_str(body, "model")

  # Streaming branch (#127): same "stream":true sniff as
  # CompletionsHandler. Sends an SSE response driven by
  # ChatCompletionsStreamer -- which calls into
  # backend.chat_completion_stream via a ChatStreamSink.
  wants_stream = Tep.str_find(body, "\"stream\":true", 0) >= 0 ||
                 Tep.str_find(body, "\"stream\": true", 0) >= 0
  if wants_stream
    res.headers["Content-Type"]  = "text/event-stream"
    res.headers["Cache-Control"] = "no-cache"
    streamer = Tep::Llm::OpenAI::ChatCompletionsStreamer.new
    streamer.req_ref       = req
    streamer.model         = model
    # No `prompt` token-id array on chat requests; pass 0 so
    # the inference event has a deterministic value. A future
    # refinement can derive prompt_tokens from the messages
    # array's byte length / tokenizer estimate.
    streamer.prompt_tokens = 0
    streamer.t0            = Time.now.to_i
    streamer.request_id    = "chatcmpl-tep"
    streamer.principal_id  = req.identity.subject
    res.start_stream(streamer)
    return ""
  end

  comp  = Tep::APP.openai_backend.chat_completion(req)
  total = comp.prompt_tokens + comp.completion_tokens
  "{" +
    Tep::Json.encode_pair_str("id", "chatcmpl-tep") + "," +
    Tep::Json.encode_pair_str("object", "chat.completion") + "," +
    Tep::Json.encode_pair_int("created", Time.now.to_i) + "," +
    Tep::Json.encode_pair_str("model", model) + "," +
    "\"choices\":[{" +
      Tep::Json.encode_pair_int("index", 0) + "," +
      "\"message\":{" +
        Tep::Json.encode_pair_str("role", "assistant") + "," +
        Tep::Json.encode_pair_str("content", comp.text) +
      "}," +
      Tep::Json.encode_pair_str("finish_reason", "stop") +
    "}]," +
    "\"usage\":{" +
      Tep::Json.encode_pair_int("prompt_tokens", comp.prompt_tokens) + "," +
      Tep::Json.encode_pair_int("completion_tokens", comp.completion_tokens) + "," +
      Tep::Json.encode_pair_int("total_tokens", total) +
    "}" +
  "}"
end
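
The streaming check in the listing is a substring match on the raw body rather than a JSON parse, so it recognizes exactly two spellings of the flag. A minimal sketch of the same test in plain Ruby follows, assuming Tep.str_find is a plain substring search that returns a negative value when nothing matches; String#include? stands in for it here, and the helper name wants_stream? is hypothetical.

```ruby
# Mirrors the handler's sniff: true only if the raw body contains one of
# the two exact spellings, with zero or one space after the colon.
def wants_stream?(body)
  body.include?("\"stream\":true") || body.include?("\"stream\": true")
end

compact  = '{"model":"m","stream":true}'
spaced   = '{"model":"m","stream": true}'
two_gaps = '{"model":"m","stream":  true}'  # two spaces after the colon

puts wants_stream?(compact)
puts wants_stream?(spaced)
puts wants_stream?(two_gaps)
```

Because the match is textual, a body that writes the flag with any other whitespace (two spaces or a newline after the colon, for example) takes the non-streaming path under this sketch.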