Class: Google::Cloud::GkeRecommender::V1::PerformanceStats

Inherits:
Object
Extended by:
Protobuf::MessageExts::ClassMethods
Includes:
Protobuf::MessageExts
Defined in:
proto_docs/google/cloud/gkerecommender/v1/gkerecommender.rb

Overview

Performance statistics for a model deployment.

Instance Attribute Details

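#cost ⇒ ::Array<::Google::Cloud::GkeRecommender::V1::Cost> (readonly)

Output only. The cost of running the model deployment.

Returns:

  • (::Array<::Google::Cloud::GkeRecommender::V1::Cost>)

    Output only. The cost of running the model deployment.
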
# File 'proto_docs/google/cloud/gkerecommender/v1/gkerecommender.rb', line 439

class PerformanceStats
  include ::Google::Protobuf::MessageExts
  extend ::Google::Protobuf::MessageExts::ClassMethods
end

#ntpot_milliseconds ⇒ ::Integer (readonly)

Output only. The Normalized Time Per Output Token (NTPOT) in milliseconds. This is the request latency normalized by the number of output tokens, measured as request_latency / total_output_tokens.

Returns:

  • (::Integer)

    Output only. The Normalized Time Per Output Token (NTPOT) in milliseconds. This is the request latency normalized by the number of output tokens, measured as request_latency / total_output_tokens.
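
The ratio described above can be illustrated with a short Ruby sketch. The helper name `ntpot_milliseconds` and its arguments are hypothetical and not part of the gem. Rounding to an Integer is an assumption made only to match the field's type, since the source does not say how the service rounds.

```ruby
# Hypothetical helper, not part of the google-cloud-gke_recommender gem.
# Computes Normalized Time Per Output Token as
# request_latency / total_output_tokens, in whole milliseconds.
def ntpot_milliseconds(request_latency_ms, total_output_tokens)
  raise ArgumentError, "total_output_tokens must be positive" unless total_output_tokens.positive?

  # Assumption: round to the nearest Integer to match the field type.
  (request_latency_ms.to_f / total_output_tokens).round
end

# A request that took 2,000 ms and produced 100 output tokens.
puts ntpot_milliseconds(2_000, 100)
```

With these inputs the sketch prints 20, that is, 20 ms per output token.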




#output_tokens_per_second ⇒ ::Integer (readonly)

Output only. The number of output tokens per second. This is the throughput measured as total_output_tokens_generated_by_server / elapsed_time_in_seconds.

Returns:

  • (::Integer)

    Output only. The number of output tokens per second. This is the throughput measured as total_output_tokens_generated_by_server / elapsed_time_in_seconds.
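
As a rough illustration of the throughput formula above, the following Ruby sketch divides a token count by an elapsed time. The method name and parameters are hypothetical, and rounding to an Integer is an assumption chosen only to match the field's type.

```ruby
# Hypothetical helper, not part of the google-cloud-gke_recommender gem.
# Computes throughput as
# total_output_tokens_generated_by_server / elapsed_time_in_seconds.
def output_tokens_per_second(total_output_tokens, elapsed_seconds)
  raise ArgumentError, "elapsed_seconds must be positive" unless elapsed_seconds.positive?

  # Assumption: round to the nearest Integer to match the field type.
  (total_output_tokens / elapsed_seconds.to_f).round
end

# A server that generated 12,000 output tokens during a 60-second window.
puts output_tokens_per_second(12_000, 60)
```

With these inputs the sketch prints 200.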




#queries_per_second ⇒ ::Float (readonly)

Output only. The number of queries per second. Note: This metric can vary widely based on context length and may not be a reliable measure of LLM throughput.

Returns:

  • (::Float)

    Output only. The number of queries per second. Note: This metric can vary widely based on context length and may not be a reliable measure of LLM throughput.
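
To see why queries per second can mislead, consider a deliberately simplified Ruby sketch. It assumes a server that sustains a fixed output-token throughput and ignores prompt processing time entirely. The method name and the numbers are hypothetical, and real deployments will not follow this model exactly.

```ruby
# Hypothetical illustration, not part of the google-cloud-gke_recommender gem.
# Assumes a fixed output-token throughput and ignores prompt processing time,
# so queries per second is throughput divided by tokens generated per query.
def estimated_queries_per_second(output_tokens_per_second, output_tokens_per_query)
  raise ArgumentError, "output_tokens_per_query must be positive" unless output_tokens_per_query.positive?

  output_tokens_per_second.to_f / output_tokens_per_query
end

short_answers = estimated_queries_per_second(2_000, 100)   # 100-token responses
long_answers  = estimated_queries_per_second(2_000, 1_000) # 1,000-token responses

puts format("short: %.1f QPS, long: %.1f QPS", short_answers, long_answers)
```

Under this model the same token throughput yields 20.0 QPS for short responses and 2.0 QPS for long ones, which is why token-based metrics are usually easier to compare across workloads.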




#ttft_milliseconds ⇒ ::Integer (readonly)

Output only. The Time To First Token (TTFT) in milliseconds. This is the time it takes to generate the first token for a request.

Returns:

  • (::Integer)

    Output only. The Time To First Token (TTFT) in milliseconds. This is the time it takes to generate the first token for a request.


