Class: Tep::Llm::OpenAI::Backend
- Inherits:
-
Object
- Object
- Tep::Llm::OpenAI::Backend
- Defined in:
- lib/tep/openai_server.rb
Overview
The interface an app's backend implements. Defaults make a bare backend safe to compile + serve (empty model list, chat unsupported, cpu device). Subclasses override what they offer.
Instance Method Summary collapse
-
#chat_completion(req) ⇒ Object
Message-level (chat) generation.
-
#chat_completion_stream(req, sink) ⇒ Object
Streaming chat (#127).
-
#device_kind ⇒ Object
Backend's device, surfaced into the run_start event's backend.kind at serve! time.
-
#generate_embeddings(model, token_ids) ⇒ Object
Embedding generation for /v1/embeddings.
-
#generate_from_tokens(model, token_ids, sampling) ⇒ Object
PRIMARY shape: token-level generation (maps to /v1/completions, non-streaming).
-
#generate_stream_from_tokens(model, token_ids, sampling, sink) ⇒ Object
STREAMING shape (7.2): the per-token variant for SSE /v1/completions when the request carries "stream": true.
-
#list_models ⇒ Object
Available model names -> [String].
-
#model_owner ⇒ Object
owned_by value for each entry in the /v1/models list.
-
#supports_chat? ⇒ Boolean
Does this backend implement message-level (chat) generation? When false, /v1/chat/completions returns 501.
-
#supports_embeddings? ⇒ Boolean
Backends that can embed override this -> true (gates /v1/embeddings, chunk 7.3).
Instance Method Details
#chat_completion(req) ⇒ Object
Message-level (chat) generation. Mirrors generate_from_tokens but receives the raw req so the backend can parse the messages array itself + apply its own chat template. Tep doesn't pre-build a Message because templating + role ordering is per-model; the JSON tools live in SpinelKit::Json. The return is reused from the token path (text becomes the assistant message's content). Base no-op; subclasses override. Only reached when supports_chat? returns true -- the handler gates with a 501 otherwise.
77 78 79 |
# File 'lib/tep/openai_server.rb', line 77 def chat_completion(req) Tep::Llm::OpenAI::Completion.new end |
#chat_completion_stream(req, sink) ⇒ Object
Streaming chat (#127). Per-token variant for SSE
/v1/chat/completions when the request carries "stream":true.
Backend writes each token to sink via sink.emit_token(piece);
the sink formats it as the OpenAI chat-streaming delta frame
and writes one chunked frame. Same subclass-override-sink
pattern as 7.2 (generate_stream_from_tokens). Base no-op.
87 88 89 |
# File 'lib/tep/openai_server.rb', line 87 def chat_completion_stream(req, sink) 0 end |
#device_kind ⇒ Object
Backend's device, surfaced into the run_start event's backend.kind at serve! time. Defaults to cpu.
93 94 95 |
# File 'lib/tep/openai_server.rb', line 93 def device_kind "cpu" end |
#generate_embeddings(model, token_ids) ⇒ Object
Embedding generation for /v1/embeddings. token_ids is the
encoded input (Array; this server speaks IDs only,
tokenize client-side, same policy as generate_from_tokens).
Returns the pooled embedding as an Array of length
d_model -- the backend owns the lookup + pooling strategy
(toy mean-pools per-token embeddings). Base returns an empty
vector so a bare backend compiles; only reached when
supports_embeddings? is true (EmbeddingsHandler gates 501).
118 119 120 121 122 |
# File 'lib/tep/openai_server.rb', line 118 def (model, token_ids) empty = [0.0] empty.delete_at(0) empty end |
#generate_from_tokens(model, token_ids, sampling) ⇒ Object
PRIMARY shape: token-level generation (maps to
/v1/completions, non-streaming). token_ids is the encoded
prompt (Array); sampling is a
Tep::Llm::OpenAI::Sampling. Returns a
Tep::Llm::OpenAI::Completion (text + usage). The base returns
an empty completion so a bare backend compiles; real backends
override.
44 45 46 |
# File 'lib/tep/openai_server.rb', line 44 def generate_from_tokens(model, token_ids, sampling) Tep::Llm::OpenAI::Completion.new end |
#generate_stream_from_tokens(model, token_ids, sampling, sink) ⇒ Object
STREAMING shape (7.2): the per-token variant for SSE
/v1/completions when the request carries "stream": true.
The backend writes each token to sink via
sink.emit_token(piece); the sink (Tep::Llm::OpenAI::StreamSink)
formats it as an OpenAI SSE frame and writes to the
outbound chunked stream. Blocks/yields don't lower across the
spinel boundary, so a typed sink replaces the block --
backends never see SSE wire format or the client fd.
Base no-op (subclasses override).
57 58 59 |
# File 'lib/tep/openai_server.rb', line 57 def generate_stream_from_tokens(model, token_ids, sampling, sink) 0 end |
#list_models ⇒ Object
Available model names -> [String]. /v1/models wraps these.
31 32 33 34 35 |
# File 'lib/tep/openai_server.rb', line 31 def list_models empty = [""] empty.delete_at(0) empty end |
#model_owner ⇒ Object
owned_by value for each entry in the /v1/models list. Defaults to "tep"; a backend overrides to attribute models to its own project (e.g. toy returns "toy").
100 101 102 |
# File 'lib/tep/openai_server.rb', line 100 def model_owner "tep" end |
#supports_chat? ⇒ Boolean
Does this backend implement message-level (chat) generation? When false, /v1/chat/completions returns 501. (The chat template is per-model + an ML concern; tep doesn't ship one.)
64 65 66 |
# File 'lib/tep/openai_server.rb', line 64 def supports_chat? false end |
#supports_embeddings? ⇒ Boolean
Backends that can embed override this -> true (gates /v1/embeddings, chunk 7.3).
106 107 108 |
# File 'lib/tep/openai_server.rb', line 106 def false end |