Class: Tep::Llm::OpenAI::CompletionsStreamer
Defined in: lib/tep/openai_server.rb
Overview
Runs one streaming completion. This is a subclass of Tep::Streamer, so the server calls pump(out) cooperatively. The class owns the SSE shape end-to-end: it drives the backend through StreamSink, writes the terminating data: [DONE] sentinel, then emits the toy/v1 serving event (kind:eval, phase:serve, name:request) via Events#inference.
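The three steps in the overview can be sketched without the Tep runtime. This is a minimal illustration only: SketchSink and SketchStreamer are hypothetical stand-ins, the chunks are raw strings rather than whatever payload the real StreamSink writes, and the sketch returns the chunk count where the real pump returns 0.

```ruby
require "stringio"

# Hypothetical stand-in for StreamSink: frames each chunk as one SSE event
# and counts how many chunks were emitted.
class SketchSink
  attr_accessor :out
  attr_reader :completion_count

  def initialize
    @completion_count = 0
  end

  def emit(text)
    @out.write("data: #{text}\n\n")
    @completion_count += 1
  end
end

# Hypothetical streamer that follows the same order of operations.
class SketchStreamer
  def initialize(chunks)
    @chunks = chunks
  end

  def pump(out)
    sink = SketchSink.new
    sink.out = out
    @chunks.each { |chunk| sink.emit(chunk) } # 1. drive the "backend" through the sink
    out.write("data: [DONE]\n\n")             # 2. terminating sentinel
    sink.completion_count                     # 3. the count a serving event would report
  end
end

out = StringIO.new
count = SketchStreamer.new(["Hel", "lo"]).pump(out)
```

Each event is a data: line followed by a blank line, so a client can split the stream on "\n\n" and stop when it reads the [DONE] sentinel.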
Instance Attribute Summary
- #model ⇒ Object
  Returns the value of attribute model.
- #principal_id ⇒ Object
  Returns the value of attribute principal_id.
- #prompt_tokens ⇒ Object
  Returns the value of attribute prompt_tokens.
- #request_id ⇒ Object
  Returns the value of attribute request_id.
- #sampling ⇒ Object
  Returns the value of attribute sampling.
- #t0 ⇒ Object
  Returns the value of attribute t0.
- #token_ids ⇒ Object
  Returns the value of attribute token_ids.
Instance Method Summary
- #initialize ⇒ CompletionsStreamer (constructor)
  A new instance of CompletionsStreamer.
- #pump(out) ⇒ Object
Constructor Details
#initialize ⇒ CompletionsStreamer
Returns a new instance of CompletionsStreamer.
# File 'lib/tep/openai_server.rb', line 319
def initialize
  @model = ""
  @token_ids = [0]
  @token_ids.delete_at(0)
  @sampling = Tep::Llm::OpenAI::Sampling.new
  @prompt_tokens = 0
  @t0 = 0
  @request_id = ""
  @principal_id = ""
end
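The constructor seeds @token_ids with [0] and immediately deletes that element. The result is an empty array; the likely motivation is to show a type-inferring toolchain that the array holds integers, but that reading is an assumption and the source does not state it. A short sketch of the idiom in plain Ruby, with an arbitrary illustrative token id:

```ruby
token_ids = [0]          # seed literal: an array holding one Integer
token_ids.delete_at(0)   # remove the seed, leaving the array empty
was_empty = token_ids.empty?

token_ids << 15496       # arbitrary token id, for illustration only
```

In standard Ruby this behaves the same as starting from [], so the idiom costs nothing at runtime beyond one extra call.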
Instance Attribute Details
#model ⇒ Object
Returns the value of attribute model.
# File 'lib/tep/openai_server.rb', line 316
def model
  @model
end
#principal_id ⇒ Object
Returns the value of attribute principal_id.
# File 'lib/tep/openai_server.rb', line 317
def principal_id
  @principal_id
end
#prompt_tokens ⇒ Object
Returns the value of attribute prompt_tokens.
# File 'lib/tep/openai_server.rb', line 317
def prompt_tokens
  @prompt_tokens
end
#request_id ⇒ Object
Returns the value of attribute request_id.
# File 'lib/tep/openai_server.rb', line 317
def request_id
  @request_id
end
#sampling ⇒ Object
Returns the value of attribute sampling.
# File 'lib/tep/openai_server.rb', line 316
def sampling
  @sampling
end
#t0 ⇒ Object
Returns the value of attribute t0.
# File 'lib/tep/openai_server.rb', line 317
def t0
  @t0
end
#token_ids ⇒ Object
Returns the value of attribute token_ids.
# File 'lib/tep/openai_server.rb', line 316
def token_ids
  @token_ids
end
Instance Method Details
#pump(out) ⇒ Object
# File 'lib/tep/openai_server.rb', line 330
def pump(out)
  sink = Tep::Llm::OpenAI::StreamSink.new
  sink.out = out
  sink.model = @model
  Tep::APP.openai_backend.generate_stream_from_tokens(
    @model, @token_ids, @sampling, sink)
  # Terminating sentinel + inference event. wall_us is
  # second-resolution for the same reason as the non-streaming
  # path (spinel Time.now exposes epoch-int only); LLM is
  # seconds-scale, populated wall_us is enough signal.
  out.write("data: [DONE]\n\n")
  wall_us = (Time.now.to_i - @t0) * 1_000_000
  extra = "{" +
    Tep::Json.encode_pair_str("request_id", @request_id) + "," +
    Tep::Json.encode_pair_str("principal_id", @principal_id) + "}"
  Tep::APP.openai_events.inference(
    @model, @prompt_tokens, sink.completion_count, wall_us, extra)
  0
end
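The tail of pump computes a wall-clock duration and a small JSON object by hand. The sketch below reproduces that arithmetic outside the Tep runtime. The encode_pair_str helper here is a hypothetical stand-in: it assumes Tep::Json.encode_pair_str renders one "key":"value" pair with JSON string escaping, which the source does not confirm. The timestamps and ids are illustrative.

```ruby
require "json"

# Hypothetical stand-in for Tep::Json.encode_pair_str (assumed behaviour:
# one JSON "key":"value" pair with string escaping).
def encode_pair_str(key, value)
  "#{key.to_json}:#{value.to_json}"
end

t0  = 1_700_000_000   # request start in epoch seconds (illustrative)
now = 1_700_000_003   # stand-in for Time.now.to_i

# Whole seconds scaled to microseconds, so the value is always a
# multiple of 1_000_000.
wall_us = (now - t0) * 1_000_000

# Same string concatenation shape as in pump.
extra = "{" +
  encode_pair_str("request_id", "req-1") + "," +
  encode_pair_str("principal_id", "user-7") + "}"
```

Because both timestamps are integer seconds, a request that finishes within the same second reports a wall_us of 0; the comment in pump accepts that trade-off on the grounds that LLM requests run on a seconds scale.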