Class: Google::Cloud::GkeRecommender::V1::PerformanceStats
- Inherits: Object (hierarchy: Object > Google::Cloud::GkeRecommender::V1::PerformanceStats)
- Extended by: Protobuf::MessageExts::ClassMethods
- Includes: Protobuf::MessageExts
- Defined in: proto_docs/google/cloud/gkerecommender/v1/gkerecommender.rb
Overview
Performance statistics for a model deployment.
Instance Attribute Summary
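- #cost ⇒ ::Array<::Google::Cloud::GkeRecommender::V1::Cost> (readonly): Output only. The cost of running the model deployment.
- #ntpot_milliseconds ⇒ ::Integer (readonly): Output only. Normalized Time Per Output Token (NTPOT), in milliseconds.
- #output_tokens_per_second ⇒ ::Integer (readonly): Output only. Output-token throughput.
- #queries_per_second ⇒ ::Float (readonly): Output only. Queries served per second.
- #ttft_milliseconds ⇒ ::Integer (readonly): Output only. Time To First Token (TTFT), in milliseconds.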
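As a rough illustration of how these fields might be consumed, the sketch below filters candidate deployments by latency targets and ranks the survivors by output-token throughput. It does not use the client library: `StatsStandIn` is a hypothetical Struct that mirrors the four scalar attributes listed above (it omits `cost`), `meeting_latency_targets` is a hypothetical helper, and all numbers are made up.

```ruby
# Hypothetical stand-in mirroring the scalar attributes of
# Google::Cloud::GkeRecommender::V1::PerformanceStats, so this sketch runs
# without the client library installed. `cost` is deliberately omitted.
StatsStandIn = Struct.new(
  :queries_per_second, :output_tokens_per_second,
  :ttft_milliseconds, :ntpot_milliseconds,
  keyword_init: true
)

# Hypothetical helper: keep candidates whose TTFT and NTPOT are within the
# given targets, then order them by output-token throughput, highest first.
def meeting_latency_targets(stats, max_ttft_ms:, max_ntpot_ms:)
  stats
    .select { |s| s.ttft_milliseconds <= max_ttft_ms && s.ntpot_milliseconds <= max_ntpot_ms }
    .sort_by { |s| -s.output_tokens_per_second }
end

# Made-up example values, not real measurements.
candidates = [
  StatsStandIn.new(queries_per_second: 4.0, output_tokens_per_second: 900,
                   ttft_milliseconds: 250, ntpot_milliseconds: 40),
  StatsStandIn.new(queries_per_second: 9.5, output_tokens_per_second: 2100,
                   ttft_milliseconds: 800, ntpot_milliseconds: 55),
  StatsStandIn.new(queries_per_second: 6.0, output_tokens_per_second: 1500,
                   ttft_milliseconds: 300, ntpot_milliseconds: 45)
]

best = meeting_latency_targets(candidates, max_ttft_ms: 500, max_ntpot_ms: 50)
puts best.map(&:output_tokens_per_second).inspect # => [1500, 900]
```

Because the stand-in uses the same attribute names, the same filtering code would read a real PerformanceStats message the same way; ranking by throughput is only one possible selection policy.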
Instance Attribute Details
#cost ⇒ ::Array<::Google::Cloud::GkeRecommender::V1::Cost> (readonly)
Returns the cost of running the model deployment. Output only.
# File 'proto_docs/google/cloud/gkerecommender/v1/gkerecommender.rb', line 439

class PerformanceStats
  include ::Google::Protobuf::MessageExts
  extend ::Google::Protobuf::MessageExts::ClassMethods
end
#ntpot_milliseconds ⇒ ::Integer (readonly)
Returns the Normalized Time Per Output Token (NTPOT), in milliseconds. Output only. This is the request latency normalized by the number of output tokens, computed as request_latency / total_output_tokens.
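The NTPOT formula quoted above can be sketched for a single request as follows. The method name `ntpot_milliseconds` and its arguments are hypothetical; the attribute itself is an Integer, and how the service rounds or aggregates across requests is not stated here, so this sketch simply returns a Float.

```ruby
# Sketch of NTPOT for one request:
#   ntpot = request_latency / total_output_tokens
# Hypothetical helper; latency is taken in milliseconds.
def ntpot_milliseconds(request_latency_ms, total_output_tokens)
  unless total_output_tokens.positive?
    raise ArgumentError, "total_output_tokens must be positive"
  end

  request_latency_ms.to_f / total_output_tokens
end

# A request that took 6,000 ms end to end and produced 150 output tokens.
puts ntpot_milliseconds(6_000, 150) # => 40.0
```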
#output_tokens_per_second ⇒ ::Integer (readonly)
Returns the number of output tokens per second. Output only. This is the throughput, computed as total_output_tokens_generated_by_server / elapsed_time_in_seconds.
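A minimal sketch of the throughput formula above, assuming the token count covers everything the server emitted across all concurrent requests during the measurement window. The method name and arguments are hypothetical.

```ruby
# Sketch of server-side output throughput:
#   output_tokens_per_second = total_output_tokens_generated_by_server / elapsed_time_in_seconds
# Hypothetical helper; the token count spans all concurrent requests.
def output_tokens_per_second(total_output_tokens, elapsed_seconds)
  raise ArgumentError, "elapsed_seconds must be positive" unless elapsed_seconds.positive?

  total_output_tokens.to_f / elapsed_seconds
end

# A 120 s window in which the server emitted 180,000 output tokens in total.
puts output_tokens_per_second(180_000, 120) # => 1500.0
```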
#queries_per_second ⇒ ::Float (readonly)
Returns the number of queries per second. Output only. Note: this metric can vary widely with context length and may not be a reliable measure of LLM throughput.
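To see why queries per second can mislead, note that over a measurement window, completed queries per second equals output tokens per second divided by the average number of output tokens per query (ignoring requests that straddle the window boundary). The hypothetical sketch below holds token throughput fixed and varies only the average response length, one component of context length; the figures are illustrative, not measured.

```ruby
# Hypothetical helper relating QPS to token throughput over one window:
#   queries_per_second = output_tokens_per_second / avg_output_tokens_per_query
def queries_per_second(output_tokens_per_second, avg_output_tokens_per_query)
  output_tokens_per_second.to_f / avg_output_tokens_per_query
end

# Same server throughput, different workloads (illustrative numbers).
tokens_per_second = 1_500.0
short_answers = queries_per_second(tokens_per_second, 100)
long_answers  = queries_per_second(tokens_per_second, 1_000)

puts short_answers # => 15.0
puts long_answers  # => 1.5
```

Identical token throughput yields a tenfold difference in QPS here, which is why output_tokens_per_second is usually the steadier basis for comparing deployments.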
#ttft_milliseconds ⇒ ::Integer (readonly)
Returns the Time To First Token (TTFT), in milliseconds. Output only. This is the time it takes to generate the first token for a request.
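A client-side sketch of measuring TTFT, assuming the response arrives as a stream of tokens. How the service itself measures this value is not stated here; `measure_ttft_ms` is hypothetical, and `fake_stream` is a hypothetical Enumerator that stands in for a real streaming inference endpoint.

```ruby
# Minimal client-side TTFT sketch. `stream` is any Enumerable that yields
# tokens as they arrive. Uses a monotonic clock so wall-clock adjustments
# cannot distort the measurement.
def measure_ttft_ms(stream)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond)
  stream.each do |_token|
    # Stop at the first token: TTFT ignores the rest of the response.
    return Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond) - start
  end
  nil # the stream produced no tokens
end

# Hypothetical fake stream: waits ~50 ms (standing in for queueing and
# prefill) before yielding its first token.
fake_stream = Enumerator.new do |y|
  sleep 0.05
  y << "Hello"
  y << " world"
end

ttft = measure_ttft_ms(fake_stream)
puts "TTFT: #{ttft} ms"
```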