Class: Tep::Llm::OpenAI::ChatCompletionsHandler

Inherits:
Handler < Object
Defined in:
lib/tep/openai_server.rb

Overview

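POST /v1/chat/completions: the message-level OpenAI endpoint. Still a skeleton. The handler is gated: it responds 501 Not Implemented unless backend.supports_chat? returns true. The default is false, because chat templating is model-specific and an ML concern that tep doesn't ship. A backend opts in by overriding supports_chat? to return true and implementing chat_completion. The handler then dispatches to it and wraps the returned Completion in the standard chat.completion envelope, with the Completion's text field becoming the assistant message's content. Requests whose body contains "stream":true are answered instead as a server-sent events (SSE) stream driven by ChatCompletionsStreamer, which calls backend.chat_completion_stream. The 501 gate applies to streaming requests too.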
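A minimal sketch of the opt-in contract, assuming a plain Ruby class hierarchy. BaseBackend, EchoChatBackend, and the gate helper are hypothetical names used for illustration; only supports_chat?, chat_completion, and the Completion fields text, prompt_tokens, and completion_tokens come from this page. The error body is built with Ruby's stdlib json rather than tep's Tep::Json helpers.

```ruby
require "json"

# Hypothetical stand-in for the Completion a backend returns; the handler
# reads exactly these three fields from it.
Completion = Struct.new(:text, :prompt_tokens, :completion_tokens)

# Hypothetical base backend: chat support is off by default.
class BaseBackend
  def supports_chat?
    false
  end
end

# Hypothetical backend that opts in by overriding both methods.
class EchoChatBackend < BaseBackend
  def supports_chat?
    true
  end

  def chat_completion(_req)
    Completion.new("hello", 3, 1)
  end
end

# Mirrors the gate at the top of handle: returns [501, error_json] when the
# backend has not opted in, or nil when the request may proceed.
def gate(backend)
  return nil if backend.supports_chat?

  [501, JSON.generate({
    "error" => {
      "message" => "chat completions not supported by this backend",
      "type" => "not_implemented"
    }
  })]
end

status, body = gate(BaseBackend.new)
puts status # 501
puts body
```

The serialized error body matches the string handle concatenates by hand, so a client sees the same OpenAI-style error object either way.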

Instance Method Summary

#handle(req, res) ⇒ Object

Methods inherited from Handler

#is_regex?, #re_capture, #re_match?

Instance Method Details

#handle(req, res) ⇒ Object
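handle chooses between the JSON and SSE paths by searching the raw request body for one of two literal spellings, "stream":true or "stream": true. The sketch below reproduces that check with String#include? (standing in for Tep.str_find) and, for comparison only, an assumed alternative that parses the body with Ruby's stdlib json and reads the top-level key. tep does not do the parsed variant; both helper names are hypothetical.

```ruby
require "json"

# Same idea as handle: a substring match anywhere in the raw body.
def wants_stream_literal?(body)
  body.include?("\"stream\":true") || body.include?("\"stream\": true")
end

# Assumed alternative (not tep's behavior): parse the JSON and check only
# the top-level "stream" key. Malformed bodies are treated as non-streaming.
def wants_stream_parsed?(body)
  parsed = JSON.parse(body)
  parsed.is_a?(Hash) && parsed["stream"] == true
rescue JSON::ParserError
  false
end

# Extra whitespace before the colon defeats the literal match.
body = '{"model":"m", "stream" : true}'
p [wants_stream_literal?(body), wants_stream_parsed?(body)]
```

The two checks also disagree in the other direction: a nested object such as {"metadata":{"stream":true}} triggers the literal match even though the top-level request did not ask for streaming.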

# File 'lib/tep/openai_server.rb', line 628

def handle(req, res)
  res.headers["Content-Type"] = "application/json"
  if !Tep::APP.openai_backend.supports_chat?
    res.set_status(501)
    return "{" +
      "\"error\":{" +
        Tep::Json.encode_pair_str("message",
          "chat completions not supported by this backend") + "," +
        Tep::Json.encode_pair_str("type", "not_implemented") +
      "}" +
    "}"
  end
  body  = req.raw_body
  model = Tep::Json.get_str(body, "model")

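  # Streaming branch (#127): same "stream":true sniff as
  # CompletionsHandler. Sends an SSE response driven by
  # ChatCompletionsStreamer, which calls into
  # backend.chat_completion_stream via a ChatStreamSink.
  # The sniff is a substring match on the raw body: only the
  # spellings "stream":true and "stream": true are recognized,
  # and a match anywhere in the body counts (nested objects
  # included). Any other spelling falls through to the
  # non-streaming path.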
  wants_stream = Tep.str_find(body, "\"stream\":true", 0) >= 0 ||
                 Tep.str_find(body, "\"stream\": true", 0) >= 0
  if wants_stream
    res.headers["Content-Type"]  = "text/event-stream"
    res.headers["Cache-Control"] = "no-cache"
    streamer = Tep::Llm::OpenAI::ChatCompletionsStreamer.new
    streamer.req_ref       = req
    streamer.model         = model
    # No `prompt` token-id array on chat requests; pass 0 so
    # the inference event has a deterministic value. A future
    # refinement can derive prompt_tokens from the messages
    # array's byte length / tokenizer estimate.
    streamer.prompt_tokens = 0
    streamer.t0            = Time.now.to_i
    streamer.request_id    = "chatcmpl-tep"
    streamer.principal_id  = req.identity.subject
    res.start_stream(streamer)
    return ""
  end

  comp  = Tep::APP.openai_backend.chat_completion(req)
  total = comp.prompt_tokens + comp.completion_tokens
  "{" +
    Tep::Json.encode_pair_str("id", "chatcmpl-tep") + "," +
    Tep::Json.encode_pair_str("object", "chat.completion") + "," +
    Tep::Json.encode_pair_int("created", Time.now.to_i) + "," +
    Tep::Json.encode_pair_str("model", model) + "," +
    "\"choices\":[{" +
      Tep::Json.encode_pair_int("index", 0) + "," +
      "\"message\":{" +
        Tep::Json.encode_pair_str("role", "assistant") + "," +
        Tep::Json.encode_pair_str("content", comp.text) +
      "}," +
      Tep::Json.encode_pair_str("finish_reason", "stop") +
    "}]," +
    "\"usage\":{" +
      Tep::Json.encode_pair_int("prompt_tokens", comp.prompt_tokens) + "," +
      Tep::Json.encode_pair_int("completion_tokens", comp.completion_tokens) + "," +
      Tep::Json.encode_pair_int("total_tokens", total) +
    "}" +
  "}"
end
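For reference, the non-streaming response that handle assembles by string concatenation can be expressed as a Ruby Hash and serialized with the stdlib json library. This is a sketch, not tep's code: chat_completion_envelope and the Completion struct are hypothetical stand-ins, and created is passed in as an argument so the output is deterministic. As in handle, id is fixed to "chatcmpl-tep", finish_reason is always "stop", and total_tokens is the sum of the prompt and completion counts.

```ruby
require "json"

# Hypothetical stand-in for the backend's Completion return value.
Completion = Struct.new(:text, :prompt_tokens, :completion_tokens)

# Builds the same chat.completion envelope as handle, field for field.
def chat_completion_envelope(model, comp, created)
  {
    "id" => "chatcmpl-tep",
    "object" => "chat.completion",
    "created" => created,
    "model" => model,
    "choices" => [{
      "index" => 0,
      # The Completion's text becomes the assistant message content.
      "message" => { "role" => "assistant", "content" => comp.text },
      "finish_reason" => "stop"
    }],
    "usage" => {
      "prompt_tokens" => comp.prompt_tokens,
      "completion_tokens" => comp.completion_tokens,
      "total_tokens" => comp.prompt_tokens + comp.completion_tokens
    }
  }
end

comp = Completion.new("Hi there!", 12, 4)
puts JSON.generate(chat_completion_envelope("my-model", comp, 1_700_000_000))
```

Ruby Hashes preserve insertion order, so the generated JSON lists keys in the same order as handle's hand-built string.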