Class: Google::Cloud::GkeRecommender::V1::PerformanceStats

Inherits:

Object

Object
Google::Cloud::GkeRecommender::V1::PerformanceStats

Extended by: Google::Protobuf::MessageExts::ClassMethods

Includes: Google::Protobuf::MessageExts

Defined in: proto_docs/google/cloud/gkerecommender/v1/gkerecommender.rb

Overview

Performance statistics for a model deployment.

Instance Attribute Summary

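#cost ⇒ ::Array<::Google::Cloud::GkeRecommender::V1::Cost> (readonly)
Output only. The cost of running the model deployment.
#ntpot_milliseconds ⇒ ::Integer (readonly)
Output only. The Normalized Time Per Output Token (NTPOT) in milliseconds.
#output_tokens_per_second ⇒ ::Integer (readonly)
Output only. The number of output tokens per second.
#queries_per_second ⇒ ::Float (readonly)
Output only. The number of queries per second.
#ttft_milliseconds ⇒ ::Integer (readonly)
Output only. The Time To First Token (TTFT) in milliseconds.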

Instance Attribute Details

#cost ⇒ `::Array<::Google::Cloud::GkeRecommender::V1::Cost>` (readonly)

Output only. The cost of running the model deployment.

Returns:

(::Array<::Google::Cloud::GkeRecommender::V1::Cost>) —
Output only. The cost of running the model deployment.

# File 'proto_docs/google/cloud/gkerecommender/v1/gkerecommender.rb', line 439

class PerformanceStats
  include ::Google::Protobuf::MessageExts
  extend ::Google::Protobuf::MessageExts::ClassMethods
end

#ntpot_milliseconds ⇒ `::Integer` (readonly)

Output only. The Normalized Time Per Output Token (NTPOT) in milliseconds. This is the request latency normalized by the number of output tokens, measured as request_latency / total_output_tokens.

Returns:

(::Integer) —
Output only. The Normalized Time Per Output Token (NTPOT) in milliseconds. This is the request latency normalized by the number of output tokens, measured as request_latency / total_output_tokens.
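
The NTPOT formula above can be illustrated with a short arithmetic sketch. The helper name `ntpot_milliseconds_for` and the sample numbers are hypothetical and not part of this library. Rounding to a whole number is an assumption made only because the attribute is typed as `::Integer`; the service may compute or round the value differently.

```ruby
# Hypothetical helper, not part of the Google::Cloud::GkeRecommender API.
# Computes request_latency / total_output_tokens, with latency in milliseconds.
def ntpot_milliseconds_for(request_latency_ms, total_output_tokens)
  raise ArgumentError, "total_output_tokens must be positive" unless total_output_tokens.positive?

  # Rounding is an assumption; the attribute is documented as an Integer.
  (request_latency_ms.to_f / total_output_tokens).round
end

# Illustrative numbers: a 6,000 ms request that produced 200 output tokens.
ntpot = ntpot_milliseconds_for(6_000, 200)
puts "NTPOT: #{ntpot} ms per output token"
```

With these sample values the sketch yields 30 ms per output token.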


#output_tokens_per_second ⇒ `::Integer` (readonly)

Output only. The number of output tokens per second. This is the throughput measured as total_output_tokens_generated_by_server / elapsed_time_in_seconds.

Returns:

(::Integer) —
Output only. The number of output tokens per second. This is the throughput measured as total_output_tokens_generated_by_server / elapsed_time_in_seconds.
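
As a worked example of the throughput formula above, the following sketch divides a token count by an elapsed time. The helper name `output_tokens_per_second_for` and the sample numbers are hypothetical, and the rounding step is an assumption based only on the attribute being typed as `::Integer`.

```ruby
# Hypothetical helper, not part of the Google::Cloud::GkeRecommender API.
# Computes total_output_tokens_generated_by_server / elapsed_time_in_seconds.
def output_tokens_per_second_for(total_output_tokens, elapsed_seconds)
  raise ArgumentError, "elapsed_seconds must be positive" unless elapsed_seconds.positive?

  # Rounding is an assumption; the attribute is documented as an Integer.
  (total_output_tokens / elapsed_seconds.to_f).round
end

# Illustrative numbers: 45,000 output tokens generated over 30 seconds.
throughput = output_tokens_per_second_for(45_000, 30)
puts "Throughput: #{throughput} output tokens per second"
```

With these sample values the sketch yields 1500 output tokens per second.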


#queries_per_second ⇒ `::Float` (readonly)

Output only. The number of queries per second. Note: This metric can vary widely based on context length and may not be a reliable measure of LLM throughput.

Returns:

(::Float) —
Output only. The number of queries per second. Note: This metric can vary widely based on context length and may not be a reliable measure of LLM throughput.
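
One way to see why queries per second can vary so widely is a deliberately simplified model, sketched below. It assumes a fixed output-token throughput and approximates queries per second as throughput divided by output tokens per query; it ignores prompt length, batching, and queuing, so it is an illustration rather than how the service derives the value. The helper name and the numbers are hypothetical.

```ruby
# Hypothetical, simplified model; not how the service computes the metric.
# Assumes a fixed token throughput and ignores prompt processing and batching.
def approximate_queries_per_second(output_tokens_per_second, output_tokens_per_query)
  raise ArgumentError, "output_tokens_per_query must be positive" unless output_tokens_per_query.positive?

  output_tokens_per_second.to_f / output_tokens_per_query
end

# Same assumed token throughput, two different response lengths.
short_answers = approximate_queries_per_second(1_500, 100)
long_answers  = approximate_queries_per_second(1_500, 1_000)

puts "Short responses: #{short_answers} queries/s"
puts "Long responses:  #{long_answers} queries/s"
```

Under this model the same deployment reports 15.0 queries per second for short responses and 1.5 for long ones, even though the token throughput is unchanged.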


#ttft_milliseconds ⇒ `::Integer` (readonly)

Output only. The Time To First Token (TTFT) in milliseconds. This is the time it takes to generate the first token for a request.

Returns:

(::Integer) —
Output only. The Time To First Token (TTFT) in milliseconds. This is the time it takes to generate the first token for a request.
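
The numeric attributes documented above can be read together to produce a one-line summary. The sketch below uses a hypothetical `StatsStandIn` struct so that it runs without the gem installed; it mirrors only the four numeric reader names from this page and omits `cost`, whose element type is documented elsewhere. The `summarize` helper and the sample values are also assumptions for illustration.

```ruby
# Hypothetical stand-in exposing the same reader names as PerformanceStats.
# A real object would come from the service response instead.
StatsStandIn = Struct.new(
  :ttft_milliseconds,
  :ntpot_milliseconds,
  :output_tokens_per_second,
  :queries_per_second,
  keyword_init: true
)

# Hypothetical helper: formats the latency and throughput readers on one line.
def summarize(stats)
  format(
    "TTFT %d ms | NTPOT %d ms/token | %d output tokens/s | %.2f queries/s",
    stats.ttft_milliseconds,
    stats.ntpot_milliseconds,
    stats.output_tokens_per_second,
    stats.queries_per_second
  )
end

# Illustrative values only.
stats = StatsStandIn.new(
  ttft_milliseconds: 250,
  ntpot_milliseconds: 30,
  output_tokens_per_second: 1_500,
  queries_per_second: 1.5
)

summary = summarize(stats)
puts summary
```

Because `summarize` relies only on the reader methods, any object that responds to the same four names should work in its place.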

