Class: Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Rest::Client
Inherits: Object
- Object
  - Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Rest::Client
Defined in: lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb
Overview
REST client for the GkeInferenceQuickstart service.
GKE Inference Quickstart (GIQ) service provides profiles with performance metrics for popular models and model servers across multiple accelerators. These profiles help generate optimized best practices for running inference on GKE.
Defined Under Namespace
Classes: Configuration
Class Method Summary
- .configure {|config| ... } ⇒ Client::Configuration
  Configure the GkeInferenceQuickstart Client class.
Instance Method Summary
- #configure {|config| ... } ⇒ Client::Configuration
  Configure the GkeInferenceQuickstart Client instance.
- #fetch_benchmarking_data(request, options = nil) {|result, operation| ... } ⇒ ::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataResponse
  Fetches all of the benchmarking data available for a profile.
- #fetch_model_server_versions(request, options = nil) {|result, operation| ... } ⇒ ::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsResponse
  Fetches available model server versions.
- #fetch_model_servers(request, options = nil) {|result, operation| ... } ⇒ ::Google::Cloud::GkeRecommender::V1::FetchModelServersResponse
  Fetches available model servers.
- #fetch_models(request, options = nil) {|result, operation| ... } ⇒ ::Google::Cloud::GkeRecommender::V1::FetchModelsResponse
  Fetches available models.
- #fetch_profiles(request, options = nil) {|result, operation| ... } ⇒ ::Gapic::Rest::PagedEnumerable<::Google::Cloud::GkeRecommender::V1::Profile>
  Fetches available profiles.
- #generate_optimized_manifest(request, options = nil) {|result, operation| ... } ⇒ ::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestResponse
  Generates an optimized deployment manifest for a given model and model server, based on the specified accelerator, performance targets, and configurations.
- #initialize {|config| ... } ⇒ Client (constructor)
  Create a new GkeInferenceQuickstart REST client object.
- #logger ⇒ Logger
  The logger used for request/response debug logging.
- #universe_domain ⇒ String
  The effective universe domain.
Constructor Details
#initialize {|config| ... } ⇒ Client
Create a new GkeInferenceQuickstart REST client object.
# File 'lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb', line 139
def initialize
  # Create the configuration object
  @config = Configuration.new Client.configure

  # Yield the configuration if needed
  yield @config if block_given?

  # Create credentials
  credentials = @config.credentials
  # Use self-signed JWT if the endpoint is unchanged from default,
  # but only if the default endpoint does not have a region prefix.
  enable_self_signed_jwt = @config.endpoint.nil? ||
                           (@config.endpoint == Configuration::DEFAULT_ENDPOINT &&
                           !@config.endpoint.split(".").first.include?("-"))
  credentials ||= Credentials.default scope: @config.scope,
                                      enable_self_signed_jwt: enable_self_signed_jwt
  if credentials.is_a?(::String) || credentials.is_a?(::Hash)
    credentials = Credentials.new credentials, scope: @config.scope
  end

  @quota_project_id = @config.quota_project
  @quota_project_id ||= credentials.quota_project_id if credentials.respond_to? :quota_project_id

  @gke_inference_quickstart_stub = ::Google::Cloud::GkeRecommender::V1::GkeInferenceQuickstart::Rest::ServiceStub.new(
    endpoint: @config.endpoint,
    endpoint_template: DEFAULT_ENDPOINT_TEMPLATE,
    universe_domain: @config.universe_domain,
    credentials: credentials,
    logger: @config.logger
  )

  @gke_inference_quickstart_stub.logger(stub: true)&.info do |entry|
    entry.set_system_name
    entry.set_service
    entry.message = "Created client for #{entry.service}"
    entry.set_credentials_fields credentials
    entry.set "customEndpoint", @config.endpoint if @config.endpoint
    entry.set "defaultTimeout", @config.timeout if @config.timeout
    entry.set "quotaProject", @quota_project_id if @quota_project_id
  end
end
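The self-signed JWT decision in the constructor can be read in isolation as a small predicate. The following is a minimal sketch, not library code: the helper name `self_signed_jwt_allowed?` and the constant `ASSUMED_DEFAULT_ENDPOINT` (including its value) are assumptions made for illustration.

```ruby
# Hypothetical helper mirroring the constructor's check; not part of the gem.
# ASSUMED_DEFAULT_ENDPOINT is an illustrative value, not read from the library.
ASSUMED_DEFAULT_ENDPOINT = "gkerecommender.googleapis.com"

def self_signed_jwt_allowed?(endpoint, default_endpoint = ASSUMED_DEFAULT_ENDPOINT)
  # No endpoint configured: the default will be used, so self-signed JWT is allowed.
  return true if endpoint.nil?

  # Endpoint explicitly set: allowed only if it equals the default and the
  # first host label contains no "-" (a hyphen suggests a regional prefix).
  endpoint == default_endpoint && !endpoint.split(".").first.include?("-")
end

puts self_signed_jwt_allowed?(nil)
puts self_signed_jwt_allowed?("gkerecommender.googleapis.com")
puts self_signed_jwt_allowed?("example.internal:8443")
```

Under these assumptions, only an unset endpoint or an unchanged, non-regional default enables self-signed JWT; any custom endpoint disables it.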
Class Method Details
.configure {|config| ... } ⇒ Client::Configuration
Configure the GkeInferenceQuickstart Client class.
See Configuration for a description of the configuration fields.
# File 'lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb', line 65
def self.configure
  @configure ||= begin
    namespace = ["Google", "Cloud", "GkeRecommender", "V1"]
    parent_config = while namespace.any?
                      parent_name = namespace.join "::"
                      parent_const = const_get parent_name
                      break parent_const.configure if parent_const.respond_to? :configure
                      namespace.pop
                    end
    default_config = Client::Configuration.new parent_config

    default_config.rpcs.fetch_models.timeout = 60.0

    default_config.rpcs.fetch_model_servers.timeout = 60.0

    default_config.rpcs.fetch_model_server_versions.timeout = 60.0

    default_config.rpcs.fetch_profiles.timeout = 60.0

    default_config.rpcs.generate_optimized_manifest.timeout = 60.0

    default_config.rpcs.fetch_benchmarking_data.timeout = 60.0

    default_config
  end
  yield @configure if block_given?
  @configure
end
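The parent lookup in `.configure` walks the namespace from the most specific module outward and stops at the first one that responds to `configure`. Below is a minimal sketch of that walk using stand-in modules; `Acme`, `Widgets`, and `nearest_parent_config` are hypothetical names invented for this illustration.

```ruby
# Stand-in modules for illustration only; the names are hypothetical.
module Acme
  module Cloud
    # Only this level exposes a configure method.
    def self.configure
      { timeout: 30.0 }
    end

    module Widgets
      module V1
      end
    end
  end
end

# Walk from the most specific namespace outward and return the first
# configuration found, or nil if no level responds to :configure.
def nearest_parent_config(namespace)
  namespace = namespace.dup
  while namespace.any?
    parent_const = Object.const_get namespace.join("::")
    # `break value` makes the while loop itself evaluate to that value.
    break parent_const.configure if parent_const.respond_to? :configure
    namespace.pop
  end
end

p nearest_parent_config(%w[Acme Cloud Widgets V1])
```

The `while` expression evaluates to `nil` when the list runs out without a match, which is why the result can be passed straight to a constructor that accepts an optional parent.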
Instance Method Details
#configure {|config| ... } ⇒ Client::Configuration
Configure the GkeInferenceQuickstart Client instance.
The configuration is set to the derived mode, meaning that values can be changed, but structural changes (adding new fields, etc.) are not allowed. Structural changes should be made on the class-level Client.configure.
See Configuration for a description of the configuration fields.
# File 'lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb', line 109
def configure
  yield @config if block_given?
  @config
end
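The relationship between the class-level and instance-level `configure` methods can be sketched with stand-in classes. This is an illustration under stated assumptions, not the library's implementation: `SketchConfig` and `SketchClient` are hypothetical, and the parent fallback shown here is a simplification of how a derived configuration behaves.

```ruby
# Stand-in for illustration; the real configuration class is Client::Configuration.
class SketchConfig
  attr_writer :timeout

  def initialize(parent = nil)
    @parent = parent
    @timeout = nil
  end

  # Fall back to the parent's value when no local value was set.
  def timeout
    @timeout.nil? && @parent ? @parent.timeout : @timeout
  end
end

class SketchClient
  # Class-level configuration, created once and shared as the default.
  def self.configure
    @configure ||= SketchConfig.new
    yield @configure if block_given?
    @configure
  end

  def initialize
    # Each instance derives its own configuration from the class-level one.
    @config = SketchConfig.new SketchClient.configure
    yield @config if block_given?
  end

  def configure
    yield @config if block_given?
    @config
  end
end

SketchClient.configure { |c| c.timeout = 60.0 }
client = SketchClient.new
client.configure { |c| c.timeout = 10.0 }
other = SketchClient.new
puts client.configure.timeout # => 10.0
puts other.configure.timeout  # => 60.0
```

Changing a value on one instance leaves the class-level default, and every other instance, untouched.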
#fetch_benchmarking_data(request, options = nil) ⇒ ::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataResponse
#fetch_benchmarking_data(model_server_info: nil, instance_type: nil, pricing_model: nil) ⇒ ::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataResponse
Fetches all of the benchmarking data available for a profile. Benchmarking data returns all of the performance metrics available for a given model server setup on a given instance type.
# File 'lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb', line 771
def fetch_benchmarking_data request, options = nil
  raise ::ArgumentError, "request must be provided" if request.nil?

  request = ::Gapic::Protobuf.coerce request, to: ::Google::Cloud::GkeRecommender::V1::FetchBenchmarkingDataRequest

  # Converts hash and nil to an options object
  options = ::Gapic::CallOptions.new(**options.to_h) if options.respond_to? :to_h

  # Customize the options with defaults
  call_metadata = @config.rpcs.fetch_benchmarking_data.metadata.to_h

  # Set x-goog-api-client, x-goog-user-project and x-goog-api-version headers
  call_metadata[:"x-goog-api-client"] ||= ::Gapic::Headers.x_goog_api_client \
    lib_name: @config.lib_name, lib_version: @config.lib_version,
    gapic_version: ::Google::Cloud::GkeRecommender::V1::VERSION,
    transports_version_send: [:rest]

  call_metadata[:"x-goog-api-version"] = API_VERSION unless API_VERSION.empty?
  call_metadata[:"x-goog-user-project"] = @quota_project_id if @quota_project_id

  options.apply_defaults timeout:      @config.rpcs.fetch_benchmarking_data.timeout,
                         metadata:     call_metadata,
                         retry_policy: @config.rpcs.fetch_benchmarking_data.retry_policy

  options.apply_defaults timeout:      @config.timeout,
                         metadata:     @config.metadata,
                         retry_policy: @config.retry_policy

  @gke_inference_quickstart_stub.fetch_benchmarking_data request, options do |result, operation|
    yield result, operation if block_given?
  end
rescue ::Gapic::Rest::Error => e
  raise ::Google::Cloud::Error.from_error(e)
end
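Each RPC method applies defaults twice: first the per-RPC settings, then the client-wide settings, so that a value given on the call itself always wins. A minimal sketch of that layering follows. `SketchCallOptions` is a hypothetical stand-in written for this example, not the real `::Gapic::CallOptions`, and the header names and values are made up.

```ruby
# Minimal stand-in for an options object; not the real ::Gapic::CallOptions.
class SketchCallOptions
  attr_reader :timeout, :metadata, :retry_policy

  def initialize(timeout: nil, metadata: nil, retry_policy: nil)
    @timeout = timeout
    @metadata = metadata || {}
    @retry_policy = retry_policy
  end

  # Fill in only what is still unset; existing values always win.
  def apply_defaults(timeout: nil, metadata: nil, retry_policy: nil)
    @timeout ||= timeout
    @retry_policy ||= retry_policy
    @metadata = metadata.merge(@metadata) if metadata
    self
  end
end

options = SketchCallOptions.new(metadata: { "x-custom" => "per-call" })
# First pass: per-RPC defaults.
options.apply_defaults timeout: 60.0, metadata: { "x-custom" => "rpc", "x-rpc" => "1" }
# Second pass: client-wide defaults, used only for anything still unset.
options.apply_defaults timeout: 300.0, metadata: { "x-client" => "1" }, retry_policy: { max_attempts: 3 }

p options.timeout
p options.metadata
p options.retry_policy
```

In this sketch the per-call header survives both passes, the timeout comes from the per-RPC layer, and the retry policy falls through to the client-wide layer.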
#fetch_model_server_versions(request, options = nil) ⇒ ::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsResponse
#fetch_model_server_versions(model: nil, model_server: nil, page_size: nil, page_token: nil) ⇒ ::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsResponse
Fetches available model server versions. Open-source servers use their own
versioning schemas (e.g., vllm uses semver like v1.0.0).
Some model servers have different versioning schemas depending on the
accelerator. For example, vllm uses semver on GPUs, but returns nightly
build tags on TPUs. All available versions will be returned when different
schemas are present.
# File 'lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb', line 452
def fetch_model_server_versions request, options = nil
  raise ::ArgumentError, "request must be provided" if request.nil?

  request = ::Gapic::Protobuf.coerce request, to: ::Google::Cloud::GkeRecommender::V1::FetchModelServerVersionsRequest

  # Converts hash and nil to an options object
  options = ::Gapic::CallOptions.new(**options.to_h) if options.respond_to? :to_h

  # Customize the options with defaults
  call_metadata = @config.rpcs.fetch_model_server_versions.metadata.to_h

  # Set x-goog-api-client, x-goog-user-project and x-goog-api-version headers
  call_metadata[:"x-goog-api-client"] ||= ::Gapic::Headers.x_goog_api_client \
    lib_name: @config.lib_name, lib_version: @config.lib_version,
    gapic_version: ::Google::Cloud::GkeRecommender::V1::VERSION,
    transports_version_send: [:rest]

  call_metadata[:"x-goog-api-version"] = API_VERSION unless API_VERSION.empty?
  call_metadata[:"x-goog-user-project"] = @quota_project_id if @quota_project_id

  options.apply_defaults timeout:      @config.rpcs.fetch_model_server_versions.timeout,
                         metadata:     call_metadata,
                         retry_policy: @config.rpcs.fetch_model_server_versions.retry_policy

  options.apply_defaults timeout:      @config.timeout,
                         metadata:     @config.metadata,
                         retry_policy: @config.retry_policy

  @gke_inference_quickstart_stub.fetch_model_server_versions request, options do |result, operation|
    yield result, operation if block_given?
  end
rescue ::Gapic::Rest::Error => e
  raise ::Google::Cloud::Error.from_error(e)
end
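Because a single response may mix versioning schemas, callers may want to separate semver-style versions from other tags before comparing them. The sketch below shows one way to do that. The `SEMVER` pattern, the `partition_versions` helper, and the sample strings are all assumptions for illustration; the service does not publish a format for non-semver tags here.

```ruby
# Loose semver pattern with an optional leading "v"; an assumption for this
# sketch, not a rule published by the service.
SEMVER = /\Av?\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.-]+)?\z/

# Split a list of version strings into semver-like tags and everything else.
def partition_versions(versions)
  semver, other = versions.partition { |v| SEMVER.match?(v) }
  { semver: semver, other: other }
end

# Sample strings are made up for illustration.
sample = ["v1.0.0", "0.6.2", "nightly-20250101"]
p partition_versions(sample)
```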
#fetch_model_servers(request, options = nil) ⇒ ::Google::Cloud::GkeRecommender::V1::FetchModelServersResponse
#fetch_model_servers(model: nil, page_size: nil, page_token: nil) ⇒ ::Google::Cloud::GkeRecommender::V1::FetchModelServersResponse
Fetches available model servers. Open-source model servers use simplified,
lowercase names (e.g., vllm).
# File 'lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb', line 345
def fetch_model_servers request, options = nil
  raise ::ArgumentError, "request must be provided" if request.nil?

  request = ::Gapic::Protobuf.coerce request, to: ::Google::Cloud::GkeRecommender::V1::FetchModelServersRequest

  # Converts hash and nil to an options object
  options = ::Gapic::CallOptions.new(**options.to_h) if options.respond_to? :to_h

  # Customize the options with defaults
  call_metadata = @config.rpcs.fetch_model_servers.metadata.to_h

  # Set x-goog-api-client, x-goog-user-project and x-goog-api-version headers
  call_metadata[:"x-goog-api-client"] ||= ::Gapic::Headers.x_goog_api_client \
    lib_name: @config.lib_name, lib_version: @config.lib_version,
    gapic_version: ::Google::Cloud::GkeRecommender::V1::VERSION,
    transports_version_send: [:rest]

  call_metadata[:"x-goog-api-version"] = API_VERSION unless API_VERSION.empty?
  call_metadata[:"x-goog-user-project"] = @quota_project_id if @quota_project_id

  options.apply_defaults timeout:      @config.rpcs.fetch_model_servers.timeout,
                         metadata:     call_metadata,
                         retry_policy: @config.rpcs.fetch_model_servers.retry_policy

  options.apply_defaults timeout:      @config.timeout,
                         metadata:     @config.metadata,
                         retry_policy: @config.retry_policy

  @gke_inference_quickstart_stub.fetch_model_servers request, options do |result, operation|
    yield result, operation if block_given?
  end
rescue ::Gapic::Rest::Error => e
  raise ::Google::Cloud::Error.from_error(e)
end
#fetch_models(request, options = nil) ⇒ ::Google::Cloud::GkeRecommender::V1::FetchModelsResponse
#fetch_models(page_size: nil, page_token: nil) ⇒ ::Google::Cloud::GkeRecommender::V1::FetchModelsResponse
Fetches available models. Open-source models follow the Hugging Face Hub
owner/model_name format.
# File 'lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb', line 248
def fetch_models request, options = nil
  raise ::ArgumentError, "request must be provided" if request.nil?

  request = ::Gapic::Protobuf.coerce request, to: ::Google::Cloud::GkeRecommender::V1::FetchModelsRequest

  # Converts hash and nil to an options object
  options = ::Gapic::CallOptions.new(**options.to_h) if options.respond_to? :to_h

  # Customize the options with defaults
  call_metadata = @config.rpcs.fetch_models.metadata.to_h

  # Set x-goog-api-client, x-goog-user-project and x-goog-api-version headers
  call_metadata[:"x-goog-api-client"] ||= ::Gapic::Headers.x_goog_api_client \
    lib_name: @config.lib_name, lib_version: @config.lib_version,
    gapic_version: ::Google::Cloud::GkeRecommender::V1::VERSION,
    transports_version_send: [:rest]

  call_metadata[:"x-goog-api-version"] = API_VERSION unless API_VERSION.empty?
  call_metadata[:"x-goog-user-project"] = @quota_project_id if @quota_project_id

  options.apply_defaults timeout:      @config.rpcs.fetch_models.timeout,
                         metadata:     call_metadata,
                         retry_policy: @config.rpcs.fetch_models.retry_policy

  options.apply_defaults timeout:      @config.timeout,
                         metadata:     @config.metadata,
                         retry_policy: @config.retry_policy

  @gke_inference_quickstart_stub.fetch_models request, options do |result, operation|
    yield result, operation if block_given?
  end
rescue ::Gapic::Rest::Error => e
  raise ::Google::Cloud::Error.from_error(e)
end
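Model identifiers in the owner/model_name format can be split on the single "/" separator. The helper below is a small sketch of that parsing; `split_model_id` is a hypothetical name, and the sample identifier is made up rather than taken from the service.

```ruby
# Split "owner/model_name" into its two parts; returns nil if the string
# does not have exactly one "/" separating two non-empty parts.
def split_model_id(model_id)
  owner, name, extra = model_id.split("/", -1)
  return nil if extra || owner.nil? || owner.empty? || name.nil? || name.empty?

  { owner: owner, name: name }
end

# The identifier below is a made-up example.
p split_model_id("example-org/example-model")
p split_model_id("no-separator")
```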
#fetch_profiles(request, options = nil) ⇒ ::Gapic::Rest::PagedEnumerable<::Google::Cloud::GkeRecommender::V1::Profile>
#fetch_profiles(model: nil, model_server: nil, model_server_version: nil, performance_requirements: nil, page_size: nil, page_token: nil) ⇒ ::Gapic::Rest::PagedEnumerable<::Google::Cloud::GkeRecommender::V1::Profile>
Fetches available profiles. A profile contains performance metrics and cost information for a specific model server setup. Profiles can be filtered by parameters. If no filters are provided, all profiles are returned.
Profiles display a single value per performance metric based on the provided performance requirements. If no requirements are given, the metrics represent the inflection point. See "Run best practice inference with GKE Inference Quickstart recipes" for details.
# File 'lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb', line 577
def fetch_profiles request, options = nil
  raise ::ArgumentError, "request must be provided" if request.nil?

  request = ::Gapic::Protobuf.coerce request, to: ::Google::Cloud::GkeRecommender::V1::FetchProfilesRequest

  # Converts hash and nil to an options object
  options = ::Gapic::CallOptions.new(**options.to_h) if options.respond_to? :to_h

  # Customize the options with defaults
  call_metadata = @config.rpcs.fetch_profiles.metadata.to_h

  # Set x-goog-api-client, x-goog-user-project and x-goog-api-version headers
  call_metadata[:"x-goog-api-client"] ||= ::Gapic::Headers.x_goog_api_client \
    lib_name: @config.lib_name, lib_version: @config.lib_version,
    gapic_version: ::Google::Cloud::GkeRecommender::V1::VERSION,
    transports_version_send: [:rest]

  call_metadata[:"x-goog-api-version"] = API_VERSION unless API_VERSION.empty?
  call_metadata[:"x-goog-user-project"] = @quota_project_id if @quota_project_id

  options.apply_defaults timeout:      @config.rpcs.fetch_profiles.timeout,
                         metadata:     call_metadata,
                         retry_policy: @config.rpcs.fetch_profiles.retry_policy

  options.apply_defaults timeout:      @config.timeout,
                         metadata:     @config.metadata,
                         retry_policy: @config.retry_policy

  @gke_inference_quickstart_stub.fetch_profiles request, options do |result, operation|
    result = ::Gapic::Rest::PagedEnumerable.new @gke_inference_quickstart_stub, :fetch_profiles, "profile", request, result, options
    yield result, operation if block_given?
    throw :response, result
  end
rescue ::Gapic::Rest::Error => e
  raise ::Google::Cloud::Error.from_error(e)
end
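Unlike the other RPC methods, `fetch_profiles` wraps the first response in a paged enumerable and uses `throw :response` to hand that wrapper back as the call's return value. The sketch below illustrates both ideas with stand-ins: `SketchStub`, `SketchPager`, the page tokens, and the item names are all hypothetical, and the real `::Gapic::Rest::PagedEnumerable` is more involved than this.

```ruby
# Stand-in stub: returns pages of items keyed by page token. Hypothetical data.
class SketchStub
  PAGES = {
    "" => { items: %w[profile-a profile-b], next_page_token: "t1" },
    "t1" => { items: %w[profile-c], next_page_token: "" }
  }.freeze

  def fetch(page_token)
    # catch/throw lets the caller's block substitute its own return value.
    catch :response do
      result = PAGES.fetch(page_token)
      yield result if block_given?
      result
    end
  end
end

# Stand-in pager: walks pages lazily by following next_page_token.
class SketchPager
  include Enumerable

  def initialize(stub, first_page)
    @stub = stub
    @first_page = first_page
  end

  def each
    return enum_for(:each) unless block_given?

    page = @first_page
    loop do
      page[:items].each { |item| yield item }
      break if page[:next_page_token].empty?

      page = @stub.fetch page[:next_page_token]
    end
  end
end

stub = SketchStub.new
pager = stub.fetch("") do |first_page|
  # Replace the raw first page with a pager, as the block's result.
  throw :response, SketchPager.new(stub, first_page)
end

p pager.to_a
```

The caller iterates over items without handling page tokens; the pager requests the next page only when the current one is exhausted.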
#generate_optimized_manifest(request, options = nil) ⇒ ::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestResponse
#generate_optimized_manifest(model_server_info: nil, accelerator_type: nil, kubernetes_namespace: nil, performance_requirements: nil, storage_config: nil) ⇒ ::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestResponse
Generates an optimized deployment manifest for a given model and model server, based on the specified accelerator, performance targets, and configurations. See "Run best practice inference with GKE Inference Quickstart recipes" for deployment details.
# File 'lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb', line 680
def generate_optimized_manifest request, options = nil
  raise ::ArgumentError, "request must be provided" if request.nil?

  request = ::Gapic::Protobuf.coerce request, to: ::Google::Cloud::GkeRecommender::V1::GenerateOptimizedManifestRequest

  # Converts hash and nil to an options object
  options = ::Gapic::CallOptions.new(**options.to_h) if options.respond_to? :to_h

  # Customize the options with defaults
  call_metadata = @config.rpcs.generate_optimized_manifest.metadata.to_h

  # Set x-goog-api-client, x-goog-user-project and x-goog-api-version headers
  call_metadata[:"x-goog-api-client"] ||= ::Gapic::Headers.x_goog_api_client \
    lib_name: @config.lib_name, lib_version: @config.lib_version,
    gapic_version: ::Google::Cloud::GkeRecommender::V1::VERSION,
    transports_version_send: [:rest]

  call_metadata[:"x-goog-api-version"] = API_VERSION unless API_VERSION.empty?
  call_metadata[:"x-goog-user-project"] = @quota_project_id if @quota_project_id

  options.apply_defaults timeout:      @config.rpcs.generate_optimized_manifest.timeout,
                         metadata:     call_metadata,
                         retry_policy: @config.rpcs.generate_optimized_manifest.retry_policy

  options.apply_defaults timeout:      @config.timeout,
                         metadata:     @config.metadata,
                         retry_policy: @config.retry_policy

  @gke_inference_quickstart_stub.generate_optimized_manifest request, options do |result, operation|
    yield result, operation if block_given?
  end
rescue ::Gapic::Rest::Error => e
  raise ::Google::Cloud::Error.from_error(e)
end
#logger ⇒ Logger
The logger used for request/response debug logging.
# File 'lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb', line 186
def logger
  @gke_inference_quickstart_stub.logger
end
#universe_domain ⇒ String
The effective universe domain.
# File 'lib/google/cloud/gke_recommender/v1/gke_inference_quickstart/rest/client.rb', line 119
def universe_domain
  @gke_inference_quickstart_stub.universe_domain
end