Class: Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Rest::Client

Inherits: Object

Defined in: lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb

Overview

REST client for the GkeInferenceQuickstart service.

The GKE Inference Quickstart (GIQ) service provides profiles with performance metrics for popular models and model servers across multiple accelerators. These profiles help generate optimized best practices for running inference on GKE.

Defined Under Namespace

Classes: Configuration

Class Method Summary

.configure {|config| ... } ⇒ Client::Configuration
Configure the GkeInferenceQuickstart Client class.

Instance Method Summary

#configure {|config| ... } ⇒ Client::Configuration
Configure the GkeInferenceQuickstart Client instance.
#fetch_benchmarking_data(request, options = nil) {|result, operation| ... } ⇒ ::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataResponse
Fetches all of the benchmarking data available for a profile.
#fetch_model_server_versions(request, options = nil) {|result, operation| ... } ⇒ ::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsResponse
Fetches available model server versions.
#fetch_model_servers(request, options = nil) {|result, operation| ... } ⇒ ::Google::Cloud::GkeRecommender::V1::FetchModelServersResponse
Fetches available model servers.
#fetch_models(request, options = nil) {|result, operation| ... } ⇒ ::Google::Cloud::GkeRecommender::V1::FetchModelsResponse
Fetches available models.
#fetch_profiles(request, options = nil) {|result, operation| ... } ⇒ ::Gapic::Rest::PagedEnumerable<::Google::Cloud::GkeRecommender::V1::Profile>
Fetches available profiles.
#generate_optimized_manifest(request, options = nil) {|result, operation| ... } ⇒ ::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestResponse
Generates an optimized deployment manifest for a given model and model server, based on the specified accelerator, performance targets, and configurations.
#initialize {|config| ... } ⇒ Client constructor
Create a new GkeInferenceQuickstart REST client object.
#logger ⇒ Logger
The logger used for request/response debug logging.
#universe_domain ⇒ String
The effective universe domain.

Constructor Details

#initialize {|config| ... } ⇒ `Client`

Create a new GkeInferenceQuickstart REST client object.

Examples:


# Create a client using the default configuration
client = ::Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Rest::Client.new

# Create a client using a custom configuration
client = ::Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Rest::Client.new do |config|
  config.timeout = 10.0
end

Yields:

(config) —
Configure the GkeInferenceQuickstart client.

Yield Parameters:

config (Client::Configuration)

# File 'lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb', line 139

def initialize
  # Create the configuration object
  @config = Configuration.new Client.configure

  # Yield the configuration if needed
  yield @config if block_given?

  # Create credentials
  credentials = @config.credentials
  # Use self-signed JWT if the endpoint is unchanged from default,
  # but only if the default endpoint does not have a region prefix.
  enable_self_signed_jwt = @config.endpoint.nil? ||
                           (@config.endpoint == Configuration::DEFAULT_ENDPOINT &&
                           !@config.endpoint.split(".").first.include?("-"))
  credentials ||= Credentials.default scope: @config.scope,
                                      enable_self_signed_jwt: enable_self_signed_jwt
  if credentials.is_a?(::String) || credentials.is_a?(::Hash)
    credentials = Credentials.new credentials, scope: @config.scope
  end

  @quota_project_id = @config.quota_project
  @quota_project_id ||= credentials.quota_project_id if credentials.respond_to? :quota_project_id

  @gke_inference_quickstart_stub = ::Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Rest::ServiceStub.new(
    endpoint: @config.endpoint,
    endpoint_template: DEFAULT_ENDPOINT_TEMPLATE,
    universe_domain: @config.universe_domain,
    credentials: credentials,
    logger: @config.logger
  )

  @gke_inference_quickstart_stub.logger(stub: true)&.info do |entry|
    entry.set_system_name
    entry.set_service
    entry.message = "Created client for #{entry.service}"
    entry.set_credentials_fields credentials
    entry.set "customEndpoint", @config.endpoint if @config.endpoint
    entry.set "defaultTimeout", @config.timeout if @config.timeout
    entry.set "quotaProject", @quota_project_id if @quota_project_id
  end
end
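The self-signed JWT rule in the constructor can be read in isolation. The following is a minimal sketch of that boolean expression only, not the library's code path: `SKETCH_DEFAULT_ENDPOINT` and `self_signed_jwt_enabled?` are hypothetical names introduced for illustration, and the endpoint value is a placeholder rather than the service's real default.

```ruby
# Hypothetical placeholder endpoint, used only for illustration.
SKETCH_DEFAULT_ENDPOINT = "example.googleapis.com"

# Mirrors the expression assigned to enable_self_signed_jwt in the constructor:
# enabled when no endpoint is configured, or when the endpoint equals the
# default and its first host label carries no region prefix (no "-").
def self_signed_jwt_enabled?(endpoint, default_endpoint = SKETCH_DEFAULT_ENDPOINT)
  endpoint.nil? ||
    (endpoint == default_endpoint && !endpoint.split(".").first.include?("-"))
end

puts self_signed_jwt_enabled?(nil)                      # no endpoint configured
puts self_signed_jwt_enabled?("example.googleapis.com") # unchanged default
puts self_signed_jwt_enabled?("custom.example.com")     # custom endpoint
# A default endpoint that itself has a region-style prefix:
puts self_signed_jwt_enabled?("us-east1-example.googleapis.com",
                              "us-east1-example.googleapis.com")
```

Under these assumptions, only the first two calls yield `true`; a custom endpoint or a region-prefixed default falls back to the regular credential flow.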

Class Method Details

.configure {|config| ... } ⇒ `Client::Configuration`

Configure the GkeInferenceQuickstart Client class.

See Configuration for a description of the configuration fields.

Examples:


# Modify the configuration for all GkeInferenceQuickstart clients
::Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Rest::Client.configure do |config|
  config.timeout = 10.0
end

Yields:

(config) —
Configure the GkeInferenceQuickstart client.

Yield Parameters:

config (Client::Configuration)

Returns:

(Client::Configuration)

# File 'lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb', line 65

def self.configure
  @configure ||= begin
    namespace = ["Google", "Cloud", "GkeRecommender", "V1"]
    parent_config = while namespace.any?
                      parent_name = namespace.join "::"
                      parent_const = const_get parent_name
                      break parent_const.configure if parent_const.respond_to? :configure
                      namespace.pop
                    end
    default_config = Client::Configuration.new parent_config

    default_config.rpcs.fetch_models.timeout = 60.0

    default_config.rpcs.fetch_model_servers.timeout = 60.0

    default_config.rpcs.fetch_model_server_versions.timeout = 60.0

    default_config.rpcs.fetch_profiles.timeout = 60.0

    default_config.rpcs.generate_optimized_manifest.timeout = 60.0

    default_config.rpcs.fetch_benchmarking_data.timeout = 60.0

    default_config
  end
  yield @configure if block_given?
  @configure
end
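The `while` loop above walks the namespace from the innermost module outward and uses the first ancestor that responds to `configure` as the parent configuration. A minimal sketch of that lookup, using hypothetical `SketchRoot` modules and a hypothetical `nearest_parent_config` helper in place of the real Google namespaces:

```ruby
# Hypothetical namespace; only the Cloud level defines .configure.
module SketchRoot
  module Cloud
    def self.configure
      :cloud_level_config
    end

    module Service
      module V1
      end
    end
  end
end

# Walks from the innermost namespace outward and returns the configuration of
# the first module that responds to :configure, or nil if none does.
def nearest_parent_config(namespace)
  namespace = namespace.dup
  while namespace.any?
    parent_const = Object.const_get namespace.join("::")
    return parent_const.configure if parent_const.respond_to? :configure
    namespace.pop
  end
  nil
end

p nearest_parent_config(["SketchRoot", "Cloud", "Service", "V1"])
p nearest_parent_config(["SketchRoot"])
```

In the sketch, the first call resolves to the `Cloud` level after skipping `V1` and `Service`; the second finds no configurable ancestor and returns `nil`, which corresponds to `parent_config` being `nil` in the method above.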

Instance Method Details

#configure {|config| ... } ⇒ `Client::Configuration`

Configure the GkeInferenceQuickstart Client instance.

The configuration is set to the derived mode, meaning that values can be changed, but structural changes (adding new fields, etc.) are not allowed. Structural changes should be made on Client.configure.

See Configuration for a description of the configuration fields.

Yields:

(config) —
Configure the GkeInferenceQuickstart client.

Yield Parameters:

config (Client::Configuration)

Returns:

(Client::Configuration)

# File 'lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb', line 109

def configure
  yield @config if block_given?
  @config
end
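The constructor builds the instance configuration from the class-level one (`Configuration.new Client.configure`), so instance-level changes do not affect other clients. The following plain-Ruby model illustrates that parent/child relationship. `SketchConfig` is a hypothetical class, and the fall-through lookup is an assumption about how a derived configuration behaves, not the library's implementation.

```ruby
# Hypothetical stand-in for a configuration object with an optional parent.
class SketchConfig
  def initialize(parent = nil)
    @parent = parent
    @values = {}
  end

  def timeout=(value)
    @values[:timeout] = value
  end

  # A locally set value wins; otherwise fall back to the parent, if any.
  def timeout
    return @values[:timeout] if @values.key?(:timeout)
    @parent&.timeout
  end
end

class_level = SketchConfig.new
class_level.timeout = 10.0

instance_level = SketchConfig.new class_level
puts instance_level.timeout # inherited from the class-level configuration

instance_level.timeout = 2.5
puts instance_level.timeout # overridden for this instance only
puts class_level.timeout    # class-level value is unchanged
```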

#fetch_benchmarking_data(request, options = nil) ⇒ `::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataResponse`
#fetch_benchmarking_data(model_server_info: nil, instance_type: nil, pricing_model: nil) ⇒ `::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataResponse`

Fetches all of the benchmarking data available for a profile. Benchmarking data returns all of the performance metrics available for a given model server setup on a given instance type.

Examples:

Basic example

require "google/cloud/gke_recommender/v1"

# Create a client object. The client can be reused for multiple calls.
client = Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Rest::Client.new

# Create a request. To set request fields, pass in keyword arguments.
request = Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataRequest.new

# Call the fetch_benchmarking_data method.
result = client.fetch_benchmarking_data request

# The returned object is of type Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataResponse.
p result

Overloads:

#fetch_benchmarking_data(request, options = nil) ⇒ ::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataResponse

Pass arguments to fetch_benchmarking_data via a request object, either of type FetchBenchmarkingDataRequest or an equivalent Hash.
Parameters:
- request (::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataRequest, ::Hash) —
  A request object representing the call parameters. Required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash.
- options (::Gapic::CallOptions, ::Hash) (defaults to: nil) —
  Overrides the default settings for this call, e.g., timeout, retries, etc. Optional.
#fetch_benchmarking_data(model_server_info: nil, instance_type: nil, pricing_model: nil) ⇒ ::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataResponse

Pass arguments to fetch_benchmarking_data via keyword arguments. Note that at least one keyword argument is required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash as a request object (see above).
Parameters:
- model_server_info (::Google::Cloud::GkeRecommender::V1::ModelServerInfo, ::Hash) (defaults to: nil) —
  Required. The model server configuration to get benchmarking data for. Use GkeInferenceQuickstart.FetchProfiles to find valid configurations.
- instance_type (::String) (defaults to: nil) —
  Optional. The instance type to filter benchmarking data. Instance types are in the format a2-highgpu-1g. If not provided, all instance types for the given profile's model_server_info will be returned. Use GkeInferenceQuickstart.FetchProfiles to find available instance types.
- pricing_model (::String) (defaults to: nil) —
  Optional. The pricing model to use for the benchmarking data. Defaults to spot.

Yields:

(result, operation) —
Access the result along with the TransportOperation object

Yield Parameters:

result (::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataResponse)
operation (::Gapic::Rest::TransportOperation)

Returns:

(::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataResponse)

Raises:

(::Google::Cloud::Error) —
if the REST call is aborted.

# File 'lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb', line 771

def fetch_benchmarking_data request, options = nil
  raise ::ArgumentError, "request must be provided" if request.nil?

  request = ::Gapic::Protobuf.coerce request, to: ::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataRequest

  # Converts hash and nil to an options object
  options = ::Gapic::CallOptions.new(**options.to_h) if options.respond_to? :to_h

  # Customize the options with defaults
  call_metadata = @config.rpcs.fetch_benchmarking_data.metadata.to_h

  # Set x-goog-api-client, x-goog-user-project and x-goog-api-version headers
  call_metadata[:"x-goog-api-client"] ||= ::Gapic::Headers.x_goog_api_client \
    lib_name: @config.lib_name, lib_version: @config.lib_version,
    gapic_version: ::Google::Cloud::GkeRecommender::V1::VERSION,
    transports_version_send: [:rest]

  call_metadata[:"x-goog-api-version"] = API_VERSION unless API_VERSION.empty?
  call_metadata[:"x-goog-user-project"] = @quota_project_id if @quota_project_id

  options.apply_defaults timeout:      @config.rpcs.fetch_benchmarking_data.timeout,
                         metadata:     call_metadata,
                         retry_policy: @config.rpcs.fetch_benchmarking_data.retry_policy

  options.apply_defaults timeout:      @config.timeout,
                         metadata:     @config.metadata,
                         retry_policy: @config.retry_policy

  @gke_inference_quickstart_stub.fetch_benchmarking_data request, options do |result, operation|
    yield result, operation if block_given?
  end
rescue ::Gapic::Rest::Error => e
  raise ::Google::Cloud::Error.from_error(e)
end
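The two consecutive `apply_defaults` calls establish a precedence order: values passed per call come first, then the per-RPC configuration, then the client-wide configuration. A minimal sketch of that layering, using a hypothetical `SketchCallOptions` class that assumes "keep the existing value, fill in only what is missing" semantics (retry policies are omitted, and the header names are invented for illustration):

```ruby
# Hypothetical stand-in for a call-options object.
class SketchCallOptions
  attr_reader :timeout, :metadata

  def initialize(timeout: nil, metadata: nil)
    @timeout = timeout
    @metadata = metadata || {}
  end

  # Fills in only what is still unset; existing metadata keys take precedence.
  def apply_defaults(timeout: nil, metadata: nil)
    @timeout ||= timeout
    @metadata = metadata.merge(@metadata) if metadata
    self
  end
end

# Resolves options in the same order as the method above:
# per-call values, then per-RPC defaults, then client-wide defaults.
def resolve_options(per_call, rpc_defaults, client_defaults)
  options = SketchCallOptions.new(**per_call)
  options.apply_defaults(**rpc_defaults)
  options.apply_defaults(**client_defaults)
end

resolved = resolve_options(
  { metadata: { "x-custom" => "per-call" } },
  { timeout: 60.0, metadata: { "x-custom" => "rpc", "x-rpc" => "1" } },
  { timeout: 10.0, metadata: { "x-client" => "1" } }
)
p resolved.timeout
p resolved.metadata
```

In this sketch the per-RPC timeout (60.0) wins over the client-wide one (10.0) because it is applied first, and the per-call `x-custom` header survives both merges.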

#fetch_model_server_versions(request, options = nil) ⇒ `::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsResponse`
#fetch_model_server_versions(model: nil, model_server: nil, page_size: nil, page_token: nil) ⇒ `::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsResponse`

Fetches available model server versions. Open-source servers use their own versioning schemas (e.g., vllm uses semver like v1.0.0).

Some model servers have different versioning schemas depending on the accelerator. For example, vllm uses semver on GPUs, but returns nightly build tags on TPUs. All available versions will be returned when different schemas are present.

Examples:

Basic example

require "google/cloud/gke_recommender/v1"

# Create a client object. The client can be reused for multiple calls.
client = Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Rest::Client.new

# Create a request. To set request fields, pass in keyword arguments.
request = Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsRequest.new

# Call the fetch_model_server_versions method.
result = client.fetch_model_server_versions request

# The returned object is of type Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsResponse.
p result

Overloads:

#fetch_model_server_versions(request, options = nil) ⇒ ::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsResponse

Pass arguments to fetch_model_server_versions via a request object, either of type FetchModelServerVersionsRequest or an equivalent Hash.
Parameters:
- request (::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsRequest, ::Hash) —
  A request object representing the call parameters. Required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash.
- options (::Gapic::CallOptions, ::Hash) (defaults to: nil) —
  Overrides the default settings for this call, e.g., timeout, retries, etc. Optional.
#fetch_model_server_versions(model: nil, model_server: nil, page_size: nil, page_token: nil) ⇒ ::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsResponse

Pass arguments to fetch_model_server_versions via keyword arguments. Note that at least one keyword argument is required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash as a request object (see above).
Parameters:
- model (::String) (defaults to: nil) —
  Required. The model for which to list model server versions. Open-source models follow the Huggingface Hub owner/model_name format. Use GkeInferenceQuickstart.FetchModels to find available models.
- model_server (::String) (defaults to: nil) —
  Required. The model server for which to list versions. Open-source model servers use simplified, lowercase names (e.g., vllm). Use GkeInferenceQuickstart.FetchModelServers to find available model servers.
- page_size (::Integer) (defaults to: nil) —
  Optional. The target number of results to return in a single response. If not specified, a default value will be chosen by the service. Note that the response may include a partial list and a caller should only rely on the response's next_page_token to determine if there are more instances left to be queried.
- page_token (::String) (defaults to: nil) —
  Optional. The value of next_page_token received from a previous FetchModelServerVersionsRequest call. Provide this to retrieve the subsequent page in a multi-page list of results. When paginating, all other parameters provided to FetchModelServerVersionsRequest must match the call that provided the page token.

Yields:

(result, operation) —
Access the result along with the TransportOperation object

Yield Parameters:

result (::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsResponse)
operation (::Gapic::Rest::TransportOperation)

Returns:

(::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsResponse)

Raises:

(::Google::Cloud::Error) —
if the REST call is aborted.

# File 'lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb', line 452

def fetch_model_server_versions request, options = nil
  raise ::ArgumentError, "request must be provided" if request.nil?

  request = ::Gapic::Protobuf.coerce request, to: ::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsRequest

  # Converts hash and nil to an options object
  options = ::Gapic::CallOptions.new(**options.to_h) if options.respond_to? :to_h

  # Customize the options with defaults
  call_metadata = @config.rpcs.fetch_model_server_versions.metadata.to_h

  # Set x-goog-api-client, x-goog-user-project and x-goog-api-version headers
  call_metadata[:"x-goog-api-client"] ||= ::Gapic::Headers.x_goog_api_client \
    lib_name: @config.lib_name, lib_version: @config.lib_version,
    gapic_version: ::Google::Cloud::GkeRecommender::V1::VERSION,
    transports_version_send: [:rest]

  call_metadata[:"x-goog-api-version"] = API_VERSION unless API_VERSION.empty?
  call_metadata[:"x-goog-user-project"] = @quota_project_id if @quota_project_id

  options.apply_defaults timeout:      @config.rpcs.fetch_model_server_versions.timeout,
                         metadata:     call_metadata,
                         retry_policy: @config.rpcs.fetch_model_server_versions.retry_policy

  options.apply_defaults timeout:      @config.timeout,
                         metadata:     @config.metadata,
                         retry_policy: @config.retry_policy

  @gke_inference_quickstart_stub.fetch_model_server_versions request, options do |result, operation|
    yield result, operation if block_given?
  end
rescue ::Gapic::Rest::Error => e
  raise ::Google::Cloud::Error.from_error(e)
end
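Each RPC method both returns the result and, when a block is given, yields the result together with the transport operation. The sketch below models that calling convention with hypothetical stand-ins (`SketchStub`, `SketchClient`, and `SketchOperation` with an invented `http_status` field); it does not use the real stub or `TransportOperation`.

```ruby
# Hypothetical operation object; the real TransportOperation may expose
# different attributes.
SketchOperation = Struct.new(:http_status)

# Hypothetical transport stub that yields (result, operation) and returns result.
class SketchStub
  def call(request)
    result = { echoed: request }
    operation = SketchOperation.new(200)
    yield result, operation if block_given?
    result
  end
end

# Hypothetical client that forwards the caller's block, as the method above does.
class SketchClient
  def initialize
    @stub = SketchStub.new
  end

  def fetch(request)
    raise ArgumentError, "request must be provided" if request.nil?

    @stub.call request do |result, operation|
      yield result, operation if block_given?
    end
  end
end

client = SketchClient.new
result = client.fetch({ model_server: "vllm" }) do |_result, operation|
  puts "status: #{operation.http_status}"
end
p result
```

The block is optional: callers that only need the response can ignore the operation and use the return value.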

#fetch_model_servers(request, options = nil) ⇒ `::Google::Cloud::GkeRecommender::V1::FetchModelServersResponse`
#fetch_model_servers(model: nil, page_size: nil, page_token: nil) ⇒ `::Google::Cloud::GkeRecommender::V1::FetchModelServersResponse`

Fetches available model servers. Open-source model servers use simplified, lowercase names (e.g., vllm).

Examples:

Basic example

require "google/cloud/gke_recommender/v1"

# Create a client object. The client can be reused for multiple calls.
client = Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Rest::Client.new

# Create a request. To set request fields, pass in keyword arguments.
request = Google::Cloud::GkeRecommender::V1::FetchModelServersRequest.new

# Call the fetch_model_servers method.
result = client.fetch_model_servers request

# The returned object is of type Google::Cloud::GkeRecommender::V1::FetchModelServersResponse.
p result

Overloads:

#fetch_model_servers(request, options = nil) ⇒ ::Google::Cloud::GkeRecommender::V1::FetchModelServersResponse

Pass arguments to fetch_model_servers via a request object, either of type FetchModelServersRequest or an equivalent Hash.
Parameters:
- request (::Google::Cloud::GkeRecommender::V1::FetchModelServersRequest, ::Hash) —
  A request object representing the call parameters. Required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash.
- options (::Gapic::CallOptions, ::Hash) (defaults to: nil) —
  Overrides the default settings for this call, e.g., timeout, retries, etc. Optional.
#fetch_model_servers(model: nil, page_size: nil, page_token: nil) ⇒ ::Google::Cloud::GkeRecommender::V1::FetchModelServersResponse

Pass arguments to fetch_model_servers via keyword arguments. Note that at least one keyword argument is required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash as a request object (see above).
Parameters:
- model (::String) (defaults to: nil) —
  Required. The model for which to list model servers. Open-source models follow the Huggingface Hub owner/model_name format. Use GkeInferenceQuickstart.FetchModels to find available models.
- page_size (::Integer) (defaults to: nil) —
  Optional. The target number of results to return in a single response. If not specified, a default value will be chosen by the service. Note that the response may include a partial list and a caller should only rely on the response's next_page_token to determine if there are more instances left to be queried.
- page_token (::String) (defaults to: nil) —
  Optional. The value of next_page_token received from a previous FetchModelServersRequest call. Provide this to retrieve the subsequent page in a multi-page list of results. When paginating, all other parameters provided to FetchModelServersRequest must match the call that provided the page token.

Yields:

(result, operation) —
Access the result along with the TransportOperation object

Yield Parameters:

result (::Google::Cloud::GkeRecommender::V1::FetchModelServersResponse)
operation (::Gapic::Rest::TransportOperation)

Returns:

(::Google::Cloud::GkeRecommender::V1::FetchModelServersResponse)

Raises:

(::Google::Cloud::Error) —
if the REST call is aborted.

# File 'lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb', line 345

def fetch_model_servers request, options = nil
  raise ::ArgumentError, "request must be provided" if request.nil?

  request = ::Gapic::Protobuf.coerce request, to: ::Google::Cloud::GkeRecommender::V1::FetchModelServersRequest

  # Converts hash and nil to an options object
  options = ::Gapic::CallOptions.new(**options.to_h) if options.respond_to? :to_h

  # Customize the options with defaults
  call_metadata = @config.rpcs.fetch_model_servers.metadata.to_h

  # Set x-goog-api-client, x-goog-user-project and x-goog-api-version headers
  call_metadata[:"x-goog-api-client"] ||= ::Gapic::Headers.x_goog_api_client \
    lib_name: @config.lib_name, lib_version: @config.lib_version,
    gapic_version: ::Google::Cloud::GkeRecommender::V1::VERSION,
    transports_version_send: [:rest]

  call_metadata[:"x-goog-api-version"] = API_VERSION unless API_VERSION.empty?
  call_metadata[:"x-goog-user-project"] = @quota_project_id if @quota_project_id

  options.apply_defaults timeout:      @config.rpcs.fetch_model_servers.timeout,
                         metadata:     call_metadata,
                         retry_policy: @config.rpcs.fetch_model_servers.retry_policy

  options.apply_defaults timeout:      @config.timeout,
                         metadata:     @config.metadata,
                         retry_policy: @config.retry_policy

  @gke_inference_quickstart_stub.fetch_model_servers request, options do |result, operation|
    yield result, operation if block_given?
  end
rescue ::Gapic::Rest::Error => e
  raise ::Google::Cloud::Error.from_error(e)
end
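The trailing `rescue` clause translates transport-level errors into `::Google::Cloud::Error`, so callers only need to handle one error family. A minimal sketch of that translation pattern with hypothetical error classes (`SketchTransportError`, `SketchCloudError`) standing in for the real ones:

```ruby
# Hypothetical low-level transport error.
class SketchTransportError < StandardError
  attr_reader :status_code

  def initialize(message, status_code)
    super(message)
    @status_code = status_code
  end
end

# Hypothetical public error type with a from_error factory.
class SketchCloudError < StandardError
  attr_accessor :status_code

  def self.from_error(error)
    wrapped = new error.message
    wrapped.status_code = error.status_code
    wrapped
  end
end

# Validates the request, runs the call, and re-raises transport errors as the
# public error type. Other exceptions, such as ArgumentError, pass through.
def sketch_call(request)
  raise ArgumentError, "request must be provided" if request.nil?

  yield request
rescue SketchTransportError => e
  raise SketchCloudError.from_error(e)
end

begin
  sketch_call({}) { raise SketchTransportError.new("not found", 404) }
rescue SketchCloudError => e
  puts "#{e.class}: #{e.message} (#{e.status_code}), cause: #{e.cause.class}"
end
```

Because the new error is raised inside the `rescue` clause, Ruby records the original exception as its `cause`, which keeps the low-level detail available for debugging.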

#fetch_models(request, options = nil) ⇒ `::Google::Cloud::GkeRecommender::V1::FetchModelsResponse`
#fetch_models(page_size: nil, page_token: nil) ⇒ `::Google::Cloud::GkeRecommender::V1::FetchModelsResponse`

Fetches available models. Open-source models follow the Huggingface Hub owner/model_name format.

Examples:

Basic example

require "google/cloud/gke_recommender/v1"

# Create a client object. The client can be reused for multiple calls.
client = Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Rest::Client.new

# Create a request. To set request fields, pass in keyword arguments.
request = Google::Cloud::GkeRecommender::V1::FetchModelsRequest.new

# Call the fetch_models method.
result = client.fetch_models request

# The returned object is of type Google::Cloud::GkeRecommender::V1::FetchModelsResponse.
p result

Overloads:

#fetch_models(request, options = nil) ⇒ ::Google::Cloud::GkeRecommender::V1::FetchModelsResponse

Pass arguments to fetch_models via a request object, either of type FetchModelsRequest or an equivalent Hash.
Parameters:
- request (::Google::Cloud::GkeRecommender::V1::FetchModelsRequest, ::Hash) —
  A request object representing the call parameters. Required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash.
- options (::Gapic::CallOptions, ::Hash) (defaults to: nil) —
  Overrides the default settings for this call, e.g., timeout, retries, etc. Optional.
#fetch_models(page_size: nil, page_token: nil) ⇒ ::Google::Cloud::GkeRecommender::V1::FetchModelsResponse

Pass arguments to fetch_models via keyword arguments. Note that at least one keyword argument is required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash as a request object (see above).
Parameters:
- page_size (::Integer) (defaults to: nil) —
  Optional. The target number of results to return in a single response. If not specified, a default value will be chosen by the service. Note that the response may include a partial list and a caller should only rely on the response's next_page_token to determine if there are more instances left to be queried.
- page_token (::String) (defaults to: nil) —
  Optional. The value of next_page_token received from a previous FetchModelsRequest call. Provide this to retrieve the subsequent page in a multi-page list of results. When paginating, all other parameters provided to FetchModelsRequest must match the call that provided the page token.

Yields:

(result, operation) —
Access the result along with the TransportOperation object

Yield Parameters:

result (::Google::Cloud::GkeRecommender::V1::FetchModelsResponse)
operation (::Gapic::Rest::TransportOperation)

Returns:

(::Google::Cloud::GkeRecommender::V1::FetchModelsResponse)

Raises:

(::Google::Cloud::Error) —
if the REST call is aborted.

# File 'lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb', line 248

def fetch_models request, options = nil
  raise ::ArgumentError, "request must be provided" if request.nil?

  request = ::Gapic::Protobuf.coerce request, to: ::Google::Cloud::GkeRecommender::V1::FetchModelsRequest

  # Converts hash and nil to an options object
  options = ::Gapic::CallOptions.new(**options.to_h) if options.respond_to? :to_h

  # Customize the options with defaults
  call_metadata = @config.rpcs.fetch_models.metadata.to_h

  # Set x-goog-api-client, x-goog-user-project and x-goog-api-version headers
  call_metadata[:"x-goog-api-client"] ||= ::Gapic::Headers.x_goog_api_client \
    lib_name: @config.lib_name, lib_version: @config.lib_version,
    gapic_version: ::Google::Cloud::GkeRecommender::V1::VERSION,
    transports_version_send: [:rest]

  call_metadata[:"x-goog-api-version"] = API_VERSION unless API_VERSION.empty?
  call_metadata[:"x-goog-user-project"] = @quota_project_id if @quota_project_id

  options.apply_defaults timeout:      @config.rpcs.fetch_models.timeout,
                         metadata:     call_metadata,
                         retry_policy: @config.rpcs.fetch_models.retry_policy

  options.apply_defaults timeout:      @config.timeout,
                         metadata:     @config.metadata,
                         retry_policy: @config.retry_policy

  @gke_inference_quickstart_stub.fetch_models request, options do |result, operation|
    yield result, operation if block_given?
  end
rescue ::Gapic::Rest::Error => e
  raise ::Google::Cloud::Error.from_error(e)
end
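Because this method returns a single response rather than a paged enumerable, collecting every model means following `next_page_token` manually. The sketch below shows that loop against canned data: `SKETCH_MODEL_PAGES`, `SKETCH_FETCH_MODELS_PAGE`, and `collect_all_models` are hypothetical, and the `models` response field name is an assumption. In real code the lambda would wrap a call such as `client.fetch_models page_token: token`.

```ruby
# Canned pages keyed by page token; the :models field name is an assumption.
SKETCH_MODEL_PAGES = {
  ""   => { models: ["owner/model_a", "owner/model_b"], next_page_token: "t1" },
  "t1" => { models: ["owner/model_c"], next_page_token: "" }
}.freeze

# Hypothetical stand-in for one fetch_models call.
SKETCH_FETCH_MODELS_PAGE = ->(page_token:) { SKETCH_MODEL_PAGES.fetch(page_token) }

# Follows next_page_token until it is empty. Only the token decides whether
# more results exist, since any page may hold a partial list.
def collect_all_models(fetch_page)
  items = []
  token = ""
  loop do
    page = fetch_page.call(page_token: token)
    items.concat page[:models]
    token = page[:next_page_token].to_s
    break if token.empty?
  end
  items
end

p collect_all_models(SKETCH_FETCH_MODELS_PAGE)
```

All other request parameters must stay the same between pages, as noted in the `page_token` description above.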

#fetch_profiles(request, options = nil) ⇒ `::Gapic::Rest::PagedEnumerable<::Google::Cloud::GkeRecommender::V1::Profile>`
#fetch_profiles(model: nil, model_server: nil, model_server_version: nil, performance_requirements: nil, page_size: nil, page_token: nil) ⇒ `::Gapic::Rest::PagedEnumerable<::Google::Cloud::GkeRecommender::V1::Profile>`

Fetches available profiles. A profile contains performance metrics and cost information for a specific model server setup. Profiles can be filtered by parameters. If no filters are provided, all profiles are returned.

Profiles display a single value per performance metric based on the provided performance requirements. If no requirements are given, the metrics represent the inflection point. See "Run best practice inference with GKE Inference Quickstart recipes" for details.

Examples:

Basic example

require "google/cloud/gke_recommender/v1"

# Create a client object. The client can be reused for multiple calls.
client = Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Rest::Client.new

# Create a request. To set request fields, pass in keyword arguments.
request = Google::Cloud::GkeRecommender::V1::FetchProfilesRequest.new

# Call the fetch_profiles method.
result = client.fetch_profiles request

# The returned object is of type Gapic::Rest::PagedEnumerable. You can iterate
# over elements, and API calls will be issued to fetch pages as needed.
result.each do |item|
  # Each element is of type ::Google::Cloud::GkeRecommender::V1::Profile.
  p item
end

Overloads:

#fetch_profiles(request, options = nil) ⇒ ::Gapic::Rest::PagedEnumerable<::Google::Cloud::GkeRecommender::V1::Profile>

Pass arguments to fetch_profiles via a request object, either of type FetchProfilesRequest or an equivalent Hash.
Parameters:
- request (::Google::Cloud::GkeRecommender::V1::FetchProfilesRequest, ::Hash) —
  A request object representing the call parameters. Required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash.
- options (::Gapic::CallOptions, ::Hash) (defaults to: nil) —
  Overrides the default settings for this call, e.g., timeout, retries, etc. Optional.
#fetch_profiles(model: nil, model_server: nil, model_server_version: nil, performance_requirements: nil, page_size: nil, page_token: nil) ⇒ ::Gapic::Rest::PagedEnumerable<::Google::Cloud::GkeRecommender::V1::Profile>

Pass arguments to fetch_profiles via keyword arguments. Note that at least one keyword argument is required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash as a request object (see above).
Parameters:
- model (::String) (defaults to: nil) —
  Optional. The model to filter profiles by. Open-source models follow the Huggingface Hub owner/model_name format. If not provided, all models are returned. Use GkeInferenceQuickstart.FetchModels to find available models.
- model_server (::String) (defaults to: nil) —
  Optional. The model server to filter profiles by. If not provided, all model servers are returned. Use GkeInferenceQuickstart.FetchModelServers to find available model servers for a given model.
- model_server_version (::String) (defaults to: nil) —
  Optional. The model server version to filter profiles by. If not provided, all model server versions are returned. Use GkeInferenceQuickstart.FetchModelServerVersions to find available versions for a given model and server.
- performance_requirements (::Google::Cloud::GkeRecommender::V1::PerformanceRequirements, ::Hash) (defaults to: nil) —
  Optional. The performance requirements to filter profiles. Profiles that do not meet these requirements are filtered out. If not provided, all profiles are returned.
- page_size (::Integer) (defaults to: nil) —
  Optional. The target number of results to return in a single response. If not specified, a default value will be chosen by the service. Note that the response may include a partial list and a caller should only rely on the response's next_page_token to determine if there are more instances left to be queried.
- page_token (::String) (defaults to: nil) —
  Optional. The value of next_page_token received from a previous FetchProfilesRequest call. Provide this to retrieve the subsequent page in a multi-page list of results. When paginating, all other parameters provided to FetchProfilesRequest must match the call that provided the page token.

Yields:

(result, operation) —
Access the result along with the TransportOperation object

Yield Parameters:

result (::Gapic::Rest::PagedEnumerable<::Google::Cloud::GkeRecommender::V1::Profile>)
operation (::Gapic::Rest::TransportOperation)

Returns:

(::Gapic::Rest::PagedEnumerable<::Google::Cloud::GkeRecommender::V1::Profile>)

Raises:

(::Google::Cloud::Error) —
if the REST call is aborted.

# File 'lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb', line 577

def fetch_profiles request, options = nil
  raise ::ArgumentError, "request must be provided" if request.nil?

  request = ::Gapic::Protobuf.coerce request, to: ::Google::Cloud::GkeRecommender::V1::FetchProfilesRequest

  # Converts hash and nil to an options object
  options = ::Gapic::CallOptions.new(**options.to_h) if options.respond_to? :to_h

  # Customize the options with defaults
  call_metadata = @config.rpcs.fetch_profiles.metadata.to_h

  # Set x-goog-api-client, x-goog-user-project and x-goog-api-version headers
  call_metadata[:"x-goog-api-client"] ||= ::Gapic::Headers.x_goog_api_client \
    lib_name: @config.lib_name, lib_version: @config.lib_version,
    gapic_version: ::Google::Cloud::GkeRecommender::V1::VERSION,
    transports_version_send: [:rest]

  call_metadata[:"x-goog-api-version"] = API_VERSION unless API_VERSION.empty?
  call_metadata[:"x-goog-user-project"] = @quota_project_id if @quota_project_id

  options.apply_defaults timeout:      @config.rpcs.fetch_profiles.timeout,
                         metadata:     call_metadata,
                         retry_policy: @config.rpcs.fetch_profiles.retry_policy

  options.apply_defaults timeout:      @config.timeout,
                         metadata:     @config.metadata,
                         retry_policy: @config.retry_policy

  @gke_inference_quickstart_stub.fetch_profiles request, options do |result, operation|
    result = ::Gapic::Rest::PagedEnumerable.new @gke_inference_quickstart_stub, :fetch_profiles, "profile", request, result, options
    yield result, operation if block_given?
    throw :response, result
  end
rescue ::Gapic::Rest::Error => e
  raise ::Google::Cloud::Error.from_error(e)
end
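The two apply_defaults calls in the method body layer settings so that per-call options take precedence over the per-RPC configuration, which in turn takes precedence over the client-wide configuration. The following is a simplified pure-Ruby model of that precedence, not the gapic-common implementation: `resolve_call_settings` is a hypothetical name, the header names are placeholders, and retry policy merging is omitted.

```ruby
# Hypothetical model of option layering. Layers are passed from most
# specific to least specific: the first non-nil timeout wins, and
# metadata keys set by an earlier layer are never overwritten.
def resolve_call_settings(*layers)
  layers.each_with_object({ timeout: nil, metadata: {} }) do |layer, resolved|
    resolved[:timeout] = layer[:timeout] if resolved[:timeout].nil?
    # The receiver is the less specific layer, so existing keys survive.
    resolved[:metadata] = (layer[:metadata] || {}).merge resolved[:metadata]
  end
end

per_call   = { metadata: { "x-custom" => "call" } }
per_rpc    = { timeout: 30.0, metadata: { "x-custom" => "rpc", "x-rpc" => "1" } }
client_cfg = { timeout: 60.0, metadata: { "x-client" => "1" } }

p resolve_call_settings(per_call, per_rpc, client_cfg)
```

Here the call supplies no timeout, so the per-RPC value of 30.0 applies, while the call's own "x-custom" header survives both fallback layers.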

#generate_optimized_manifest(request, options = nil) ⇒ `::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestResponse` #generate_optimized_manifest(model_server_info: nil, accelerator_type: nil, kubernetes_namespace: nil, performance_requirements: nil, storage_config: nil) ⇒ `::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestResponse`

Generates an optimized deployment manifest for a given model and model server, based on the specified accelerator, performance targets, and configurations. See Run best practice inference with GKE Inference Quickstart recipes for deployment details.

Examples:

Basic example

require "google/cloud/gke_recommender/v1"

# Create a client object. The client can be reused for multiple calls.
client = Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Rest::Client.new

# Create a request. To set request fields, pass in keyword arguments.
request = Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestRequest.new

# Call the generate_optimized_manifest method.
result = client.generate_optimized_manifest request

# The returned object is of type Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestResponse.
p result

Overloads:

#generate_optimized_manifest(request, options = nil) ⇒ ::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestResponse

Pass arguments to generate_optimized_manifest via a request object, either of type Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestRequest or an equivalent Hash.
Parameters:
- request (::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestRequest, ::Hash) —
  A request object representing the call parameters. Required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash.
- options (::Gapic::CallOptions, ::Hash) (defaults to: nil) —
  Overrides the default settings for this call, e.g., timeout, retries, etc. Optional.
#generate_optimized_manifest(model_server_info: nil, accelerator_type: nil, kubernetes_namespace: nil, performance_requirements: nil, storage_config: nil) ⇒ ::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestResponse

Pass arguments to generate_optimized_manifest via keyword arguments. Note that at least one keyword argument is required. To specify no parameters, or to keep all the default parameter values, pass an empty Hash as a request object (see above).
Parameters:
- model_server_info (::Google::Cloud::GkeRecommender::V1::ModelServerInfo, ::Hash) (defaults to: nil) —
  Required. The model server configuration to generate the manifest for. Use GkeInferenceQuickstart.FetchProfiles to find valid configurations.
- accelerator_type (::String) (defaults to: nil) —
  Required. The accelerator type. Use GkeInferenceQuickstart.FetchProfiles to find valid accelerators for a given model_server_info.
- kubernetes_namespace (::String) (defaults to: nil) —
  Optional. The Kubernetes namespace to deploy the manifests in.
- performance_requirements (::Google::Cloud::GkeRecommender::V1::PerformanceRequirements, ::Hash) (defaults to: nil) —
  Optional. The performance requirements to use for generating Horizontal Pod Autoscaler (HPA) resources. If provided, the manifest includes HPA resources to adjust the model server replica count to maintain the specified targets (e.g., NTPOT, TTFT) at a P50 latency. Cost targets are not currently supported for HPA generation. If the specified targets are not achievable, the HPA manifest will not be generated.
- storage_config (::Google::Cloud::GkeRecommender::V1::StorageConfig, ::Hash) (defaults to: nil) —
  Optional. The storage configuration for the model. If not provided, the model is loaded from Hugging Face.
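As a sketch of how the keyword parameters above fit together, the helper below assembles a request Hash and enforces the two fields marked Required. `build_manifest_request` is a hypothetical helper, the `model` and `model_server` keys inside model_server_info are assumed from the filter names used elsewhere in this service, and every value is a placeholder rather than a known valid configuration.

```ruby
# Hypothetical helper: builds a Hash that could be passed as the request
# argument to generate_optimized_manifest. Optional fields left as nil
# are dropped so the service applies its own defaults.
def build_manifest_request(model_server_info:, accelerator_type:,
                           kubernetes_namespace: nil,
                           performance_requirements: nil,
                           storage_config: nil)
  raise ArgumentError, "model_server_info is required" if model_server_info.nil?
  raise ArgumentError, "accelerator_type is required" if accelerator_type.to_s.empty?

  {
    model_server_info:        model_server_info,
    accelerator_type:         accelerator_type,
    kubernetes_namespace:     kubernetes_namespace,
    performance_requirements: performance_requirements,
    storage_config:           storage_config
  }.compact
end

request = build_manifest_request(
  model_server_info:    { model: "example-org/example-model", model_server: "example-server" },
  accelerator_type:     "example-accelerator",
  kubernetes_namespace: "inference"
)
p request

# With the gem installed and credentials configured, the Hash can be
# passed directly as the request object:
#   client.generate_optimized_manifest request
```

Valid values for model_server_info and accelerator_type are expected to come from GkeInferenceQuickstart.FetchProfiles, as the parameter descriptions note.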

Yields:

(result, operation) —
Access the result along with the TransportOperation object

Yield Parameters:

result (::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestResponse)
operation (::Gapic::Rest::TransportOperation)

Returns:

(::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestResponse)

Raises:

(::Google::Cloud::Error) —
if the REST call is aborted.

# File 'lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb', line 680

def generate_optimized_manifest request, options = nil
  raise ::ArgumentError, "request must be provided" if request.nil?

  request = ::Gapic::Protobuf.coerce request, to: ::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestRequest

  # Converts hash and nil to an options object
  options = ::Gapic::CallOptions.new(**options.to_h) if options.respond_to? :to_h

  # Customize the options with defaults
  call_metadata = @config.rpcs.generate_optimized_manifest.metadata.to_h

  # Set x-goog-api-client, x-goog-user-project and x-goog-api-version headers
  call_metadata[:"x-goog-api-client"] ||= ::Gapic::Headers.x_goog_api_client \
    lib_name: @config.lib_name, lib_version: @config.lib_version,
    gapic_version: ::Google::Cloud::GkeRecommender::V1::VERSION,
    transports_version_send: [:rest]

  call_metadata[:"x-goog-api-version"] = API_VERSION unless API_VERSION.empty?
  call_metadata[:"x-goog-user-project"] = @quota_project_id if @quota_project_id

  options.apply_defaults timeout:      @config.rpcs.generate_optimized_manifest.timeout,
                         metadata:     call_metadata,
                         retry_policy: @config.rpcs.generate_optimized_manifest.retry_policy

  options.apply_defaults timeout:      @config.timeout,
                         metadata:     @config.metadata,
                         retry_policy: @config.retry_policy

  @gke_inference_quickstart_stub.generate_optimized_manifest request, options do |result, operation|
    yield result, operation if block_given?
  end
rescue ::Gapic::Rest::Error => e
  raise ::Google::Cloud::Error.from_error(e)
end

#logger ⇒ `Logger`

The logger used for request/response debug logging.

Returns:

(Logger)




# File 'lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb', line 186

def logger
  @gke_inference_quickstart_stub.logger
end

#universe_domain ⇒ `String`

The effective universe domain

Returns:

(String)




# File 'lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb', line 119

def universe_domain
  @gke_inference_quickstart_stub.universe_domain
end
