Class: Tep::Llm::OpenAI::CompletionsStreamer

Inherits:
Streamer
  • Object
Defined in:
lib/tep/openai_server.rb

Overview

Runs one streaming completion. Subclass of Tep::Streamer, so the server calls `pump(out)` cooperatively. This class owns the SSE shape end-to-end: it drives the backend through StreamSink, writes the terminating `data: [DONE]` sentinel, then emits the toy/v1 serving event (kind:eval, phase:serve, name:request) via Events#inference.
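The order of operations described above can be sketched in plain Ruby. This is a minimal illustration, not the library's implementation: `SketchSink`, `sketch_pump`, the `emit` method, and the JSON payload shape are all hypothetical stand-ins, and only the `data: [DONE]` sentinel and the `out`, `model`, and `completion_count` names come from the source.

```ruby
require "json"
require "stringio"

# Hypothetical stand-in for Tep::Llm::OpenAI::StreamSink: frames each
# generated piece as one SSE event and counts the pieces it has seen.
class SketchSink
  attr_accessor :out, :model
  attr_reader :completion_count

  def initialize
    @completion_count = 0
  end

  # The payload shape is an assumption, loosely modelled on
  # OpenAI-style streaming chunks; the real sink may differ.
  def emit(text)
    payload = JSON.generate({ "model" => @model, "choices" => [{ "text" => text }] })
    @out.write("data: #{payload}\n\n")
    @completion_count += 1
  end
end

# Same order as the overview: stream the pieces, then write the sentinel.
def sketch_pump(out, model, pieces)
  sink = SketchSink.new
  sink.out = out
  sink.model = model
  pieces.each { |piece| sink.emit(piece) }
  out.write("data: [DONE]\n\n")
  sink.completion_count
end

out = StringIO.new
count = sketch_pump(out, "toy-model", ["Hel", "lo"])
puts out.string
```

Each SSE event is a `data:` line followed by a blank line, so a client can split the stream on `"\n\n"` and stop reading when it sees `[DONE]`.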

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize ⇒ CompletionsStreamer

Returns a new instance of CompletionsStreamer.

# File 'lib/tep/openai_server.rb', line 340

def initialize
  @model         = ""
  @token_ids     = [0]
  @token_ids.delete_at(0)
  @sampling      = Tep::Llm::OpenAI::Sampling.new
  @prompt_tokens = 0
  @t0            = 0
  @request_id    = ""
  @principal_id  = ""
end
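The constructor seeds `@token_ids` with `[0]` and immediately removes that element, which leaves an empty array. The source does not say why. One plausible reading, stated here as an assumption, is that the seed literal lets a type-inferring compiler see the array as holding Integers. The snippet below only demonstrates the runtime effect in standard Ruby.

```ruby
# Seed the array with an Integer, then remove it again.
token_ids = [0]
token_ids.delete_at(0)

p token_ids         # prints []
p token_ids.empty?  # prints true

# The array is now ready to receive real token ids.
token_ids << 42
```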

Instance Attribute Details

#model ⇒ Object

Returns the value of attribute model.

# File 'lib/tep/openai_server.rb', line 337

def model
  @model
end

#principal_id ⇒ Object

Returns the value of attribute principal_id.

# File 'lib/tep/openai_server.rb', line 338

def principal_id
  @principal_id
end

#prompt_tokens ⇒ Object

Returns the value of attribute prompt_tokens.

# File 'lib/tep/openai_server.rb', line 338

def prompt_tokens
  @prompt_tokens
end

#request_id ⇒ Object

Returns the value of attribute request_id.

# File 'lib/tep/openai_server.rb', line 338

def request_id
  @request_id
end

#sampling ⇒ Object

Returns the value of attribute sampling.

# File 'lib/tep/openai_server.rb', line 337

def sampling
  @sampling
end

#t0 ⇒ Object

Returns the value of attribute t0.

# File 'lib/tep/openai_server.rb', line 338

def t0
  @t0
end

#token_ids ⇒ Object

Returns the value of attribute token_ids.

# File 'lib/tep/openai_server.rb', line 337

def token_ids
  @token_ids
end

Instance Method Details

#pump(out) ⇒ Object

Streams the completion to out, writes the terminating data: [DONE] sentinel, emits the inference event, and returns 0.

# File 'lib/tep/openai_server.rb', line 351

def pump(out)
  sink = Tep::Llm::OpenAI::StreamSink.new
  sink.out   = out
  sink.model = @model
  Tep::APP.openai_backend.generate_stream_from_tokens(
    @model, @token_ids, @sampling, sink)
  # Terminating sentinel + inference event. wall_us has only
  # second resolution, for the same reason as the non-streaming
  # path (spinel Time.now exposes epoch-int only). LLM requests
  # are seconds-scale, so a populated wall_us is enough signal.
  out.write("data: [DONE]\n\n")
  wall_us = (Time.now.to_i - @t0) * 1_000_000
  extra = "{" +
    Tep::Json.encode_pair_str("request_id", @request_id) + "," +
    Tep::Json.encode_pair_str("principal_id", @principal_id) +
  "}"
  Tep::APP.openai_events.inference(
    @model, @prompt_tokens, sink.completion_count, wall_us, extra)
  0
end
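The timing arithmetic and the hand-built `extra` object in `pump` can be tried in isolation. The sketch below is illustrative only: `encode_pair_str` here is a hypothetical stand-in for `Tep::Json.encode_pair_str` (assumed to render one escaped `"key":"value"` pair), and `wall_us` is a helper introduced for this example.

```ruby
require "json"

# Hypothetical stand-in for Tep::Json.encode_pair_str: renders one
# "key":"value" pair with JSON string escaping.
def encode_pair_str(key, value)
  "#{key.to_json}:#{value.to_json}"
end

# Whole-second timestamps scaled to microseconds, as in pump. The
# result is always a multiple of 1_000_000.
def wall_us(t0_seconds, now_seconds)
  (now_seconds - t0_seconds) * 1_000_000
end

# Assemble the object by string concatenation, mirroring the listing.
extra = "{" +
  encode_pair_str("request_id", "req-123") + "," +
  encode_pair_str("principal_id", "user \"a\"") +
  "}"

puts wall_us(1_700_000_000, 1_700_000_003)  # prints 3000000
puts JSON.parse(extra)["principal_id"]      # prints user "a"
```

Because the object is concatenated by hand, correct escaping of each value matters: an unescaped quote in `request_id` or `principal_id` would produce invalid JSON.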