Class: Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig

Inherits:

Object

Includes: Core::Hashable, Core::JsonObjectSupport

Defined in: lib/google/apis/aiplatform_v1beta1/classes.rb,
lib/google/apis/aiplatform_v1beta1/representations.rb

Overview

SingleReinforcementTuningRewardConfig defines a single reward function configuration for RL tuning. Each reward calculation/evaluation consists of two stages:

1. Stage 1: Extracts the relevant part of the sample response via regex extraction, or takes the sample response unmodified.
2. Stage 2: Calls the configured reward scorer to compute the reward.
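The two stages can be pictured as a small pipeline. The Ruby sketch below is illustrative only: `parse_stage`, `reward_for`, the `exact_match` lambda, and the `Final answer:` pattern are hypothetical names chosen for this example, and returning nil when the pattern does not match is an assumption rather than documented service behavior.

```ruby
# Illustrative only: these helpers are hypothetical and not part of the gem.

# Stage 1: pull out the relevant part of the sample response,
# or return it unmodified when no pattern is given.
def parse_stage(sample_response, pattern = nil)
  return sample_response if pattern.nil?
  sample_response[pattern, 1] # first capture group, or nil when nothing matches
end

# Stage 2: pass the parsed response to a scorer (any callable returning a Float).
def reward_for(sample_response, scorer, pattern = nil)
  scorer.call(parse_stage(sample_response, pattern))
end

# Hypothetical scorer: full reward for the expected answer, penalty otherwise.
exact_match = ->(parsed) { parsed == "Paris" ? 1.0 : -1.0 }

puts reward_for("Paris", exact_match)
puts reward_for("Reasoning... Final answer: Paris", exact_match, /Final answer: (.*)\z/)
puts reward_for("No answer given", exact_match, /Final answer: (.*)\z/)
```

In the actual API, stage 1 is described by parse_response_config and stage 2 by whichever scorer attribute is set on this object.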

Instance Attribute Summary

#autorater_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningAutoraterScorer
ReinforcementTuningAutoraterScorer is used to score parsed responses for classification-based autorater use cases.
#cloud_run_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCloudRunRewardScorer
ReinforcementTuningCloudRunRewardScorer allows users to implement a reward function through GCP Cloud Run.
#code_execution_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCodeExecutionRewardScorer
ReinforcementTuningCodeExecutionRewardScorer allows users to implement a function to evaluate rewards for the sample response.
#parse_response_config ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningParseResponseConfig
Defines how to parse sample response config for reinforcement tuning.
#reward_name ⇒ String
A unique reward name for identifying each single reinforcement tuning reward.
#string_match_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningStringMatchRewardScorer
ReinforcementTuningStringMatchRewardScorer is used to score parsed responses for string matching use cases.

Instance Method Summary

#initialize(**args) ⇒ GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig constructor
A new instance of GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig.
#update!(**args) ⇒ Object
Update properties of this object.

Constructor Details

#initialize(**args) ⇒ `GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig`

Returns a new instance of GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig.

# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58704

def initialize(**args)
  update!(**args)
end

Instance Attribute Details

#autorater_scorer ⇒ `Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningAutoraterScorer`

ReinforcementTuningAutoraterScorer is used to score parsed responses for classification-based autorater use cases. For example, for math problems, users can use a classification-based autorater to calculate rewards by comparing the autorater-parsed response against a reference answer. Corresponds to the JSON property autoraterScorer.

Returns:

(Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningAutoraterScorer)

# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58624

def autorater_scorer
  @autorater_scorer
end

#cloud_run_reward_scorer ⇒ `Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCloudRunRewardScorer`

ReinforcementTuningCloudRunRewardScorer allows users to implement a reward function through GCP Cloud Run. Compared with ReinforcementTuningCodeExecutionRewardScorer, which runs in a sandbox and has no internet access, the Cloud Run reward scorer is fully controlled by users.

The Cloud Run service should implement the following HTTP API:

HTTP method: POST

HTTP request body:

    {
      "example": ReinforcementTuningExample,
      "response": Content,
      "metadata": {
        "step": int,
        "tuning_job_id": int64
      }
    }

* example is a ReinforcementTuningExample in ProtoJSON format (i.e., the format is the same as one line in the training/validation dataset, except that the keys must be in camel case). System instructions (i.e., example.get("systemInstruction")) and references (i.e., example.get("references")) are also included in the example, provided that they are set in the training/validation dataset.
* response is a Content in ProtoJSON format (i.e., keys must be in camel case), which is the same as the Online Prediction response for Gemini models.

HTTP response body:

    {
      "reward": float,
      "user_requested_aux_info": str  // Optional
    }

The field "user_requested_aux_info" is an optional string that users can provide to assist debugging. It is in snake case. This field is mostly useful when calling the GenAiTuningService.ValidateReinforcementTuningReward API, where the proto field userRequestedAuxInfo (not the Cloud Run HTTP response body) is populated if the Cloud Run reward function sets this field in the HTTP response.

Example HTTP request body:

    {
      "example": {
        "contents": [
          {
            "role": "user",
            "parts": [
              { "text": "What is the capital of France?" }
            ]
          }
        ],
        "references": { "answer": "Paris" }
      },
      "response": {
        "parts": [
          { "text": "London" }
        ]
      },
      "metadata": { "step": 1, "tuning_job_id": 123456789 }
    }

Example HTTP response body:

    { "reward": -1.0 }

Note: The reward output by the Cloud Run reward function is clipped to [-1, 1], i.e., reward = max(min(reward, 1.0), -1.0). Corresponds to the JSON property cloudRunRewardScorer.
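To make the request/response contract concrete, here is a minimal Ruby sketch of the body-handling logic such a service might contain. Only the JSON field names come from the description above; the function name `handle_reward_request` and the scoring rule (case-insensitive exact match against `references["answer"]`) are assumptions for illustration, and the HTTP server wiring is deliberately left out.

```ruby
require "json"

# Hypothetical handler for the POST body described above. It takes the raw
# request body and returns the raw response body; the scoring rule is assumed.
def handle_reward_request(request_body)
  payload  = JSON.parse(request_body)
  example  = payload.fetch("example")
  response = payload.fetch("response")
  metadata = payload.fetch("metadata", {})

  # Concatenate the text parts of the model response (a ProtoJSON Content).
  answer   = response.fetch("parts", []).map { |part| part["text"].to_s }.join.strip
  expected = example.dig("references", "answer").to_s

  reward = answer.casecmp?(expected) ? 1.0 : -1.0

  JSON.generate(
    "reward" => reward,
    # Optional free-form debugging string; note the snake-case key.
    "user_requested_aux_info" => "step=#{metadata['step']} answer=#{answer.inspect}"
  )
end

# The example request from the description above.
request_body = JSON.generate(
  "example" => {
    "contents" => [
      { "role" => "user", "parts" => [{ "text" => "What is the capital of France?" }] }
    ],
    "references" => { "answer" => "Paris" }
  },
  "response" => { "parts" => [{ "text" => "London" }] },
  "metadata" => { "step" => 1, "tuning_job_id" => 123456789 }
)

puts handle_reward_request(request_body)
```

For the example request this yields a reward of -1.0, matching the example response body. Keeping the scoring logic in a plain function like this makes it easy to unit-test before deploying it behind whichever HTTP framework the service uses.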

Returns:

(Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCloudRunRewardScorer)

# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58657

def cloud_run_reward_scorer
  @cloud_run_reward_scorer
end

#code_execution_reward_scorer ⇒ `Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCodeExecutionRewardScorer`

ReinforcementTuningCodeExecutionRewardScorer allows users to implement a function to evaluate rewards for the sample response. The function signature is as follows:

def evaluate(example: dict[str, Any], response: dict[str, Any]) -> float: ...

* example is a ReinforcementTuningExample in ProtoJSON format (i.e., the format is the same as one line in the training/validation dataset, except that the keys must be in camel case). System instructions (i.e., example.get("systemInstruction")) and references (i.e., example.get("references")) are also included in the example, provided that they are set in the training/validation dataset.
* response is a Content in ProtoJSON format (i.e., keys must be in camel case), which is the same as the Online Prediction response for Gemini models.

Note: The reward output by the evaluate function is clipped to [-1, 1], i.e., reward = max(min(reward, 1.0), -1.0). Corresponds to the JSON property codeExecutionRewardScorer.
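The evaluate function itself follows the Python signature above; the Ruby sketch below only illustrates the clipping rule, so that callers can predict what an out-of-range reward turns into. The method names `clip_reward` and `clip_reward_clamp` are hypothetical helpers, not part of the gem.

```ruby
# Mirrors the documented rule: reward = max(min(reward, 1.0), -1.0).
def clip_reward(reward)
  [[reward, 1.0].min, -1.0].max
end

# Equivalent formulation using Ruby's built-in Comparable#clamp.
def clip_reward_clamp(reward)
  reward.clamp(-1.0, 1.0)
end

[2.5, 0.3, -7.0].each do |raw|
  puts "#{raw} -> #{clip_reward(raw)}"
end
```

Since values outside [-1, 1] are flattened to the boundary, a reward function that wants to distinguish degrees of quality should keep its raw output inside that range.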

Returns:

(Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCodeExecutionRewardScorer)

# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58674

def code_execution_reward_scorer
  @code_execution_reward_scorer
end

#parse_response_config ⇒ `Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningParseResponseConfig`

Defines how to parse sample response config for reinforcement tuning. The parsed response (i.e., substring) will be passed to the reward functions.

For example, the input prompt might be:

> "Perform step-by-step thoughts first to problem A, finally output answer in the block."

The sample response from the model under tuning might look like:

> "Yes"

Here, users can define the following parse config:

    { "parseType": "REGEX_EXTRACT", "regexExtractExpression": ".*(.*?)" }

The resulting parsed response would be "Yes" and will be passed to the reward functions for evaluating rewards. Corresponds to the JSON property parseResponseConfig.
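A regex-extract config can be tried out locally before it is submitted. The sketch below assumes the final answer is wrapped in a hypothetical `<answer>...</answer>` block (the tag name is an assumption, not taken from the API), and the helper `extract` is likewise made up for this example. Ruby's regex dialect may differ from the one the service uses, so treat the result as a sanity check only.

```ruby
# Hypothetical parse config; the <answer> tag is an assumption for illustration.
parse_config = {
  "parseType" => "REGEX_EXTRACT",
  "regexExtractExpression" => ".*<answer>(.*?)</answer>"
}

# Applies the expression locally and returns the first capture group, or nil
# when nothing matches. Treating any other parseType as pass-through is an
# assumption made for this sketch.
def extract(sample_response, config)
  return sample_response unless config["parseType"] == "REGEX_EXTRACT"

  # MULTILINE lets "." match newlines, so multi-line reasoning is skipped over.
  pattern = Regexp.new(config["regexExtractExpression"], Regexp::MULTILINE)
  match = pattern.match(sample_response)
  match && match[1]
end

sample = "First, consider the constraints...\n<answer>Yes</answer>"
puts extract(sample, parse_config).inspect
```

Here the parsed response is the string "Yes", which is what the reward scorer would receive in stage 2.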

Returns:

(Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningParseResponseConfig)

# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58687

def parse_response_config
  @parse_response_config
end

#reward_name ⇒ `String`

A unique reward name for identifying each single reinforcement tuning reward. Corresponds to the JSON property rewardName.

Returns:

(String)

# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58692

def reward_name
  @reward_name
end

#string_match_reward_scorer ⇒ `Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningStringMatchRewardScorer`

ReinforcementTuningStringMatchRewardScorer is used to score parsed responses for string matching use cases. For example, for math problems, users can use the string match scorer to check whether the exact correct answer is generated. Note: The reward returned by the string match reward function is clipped to [-1, 1] if wrongAnswerReward or correctAnswerReward falls outside that range, i.e., reward = max(min(reward, 1.0), -1.0). Corresponds to the JSON property stringMatchRewardScorer.
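The interaction between the two reward values and the clipping rule can be modeled locally. In this sketch the method `string_match_reward` is hypothetical; its keyword arguments follow the wrongAnswerReward and correctAnswerReward fields mentioned above, and comparing whitespace-stripped strings for exact equality is an assumption about how matching works.

```ruby
# Hypothetical local model of a string-match scorer, not the service's code.
def string_match_reward(parsed_response, reference, correct_answer_reward:, wrong_answer_reward:)
  matched = parsed_response.to_s.strip == reference.to_s.strip
  raw = matched ? correct_answer_reward : wrong_answer_reward
  raw.clamp(-1.0, 1.0) # same as max(min(reward, 1.0), -1.0)
end

# A correct answer with an out-of-range reward of 5.0 is clipped to the upper bound.
puts string_match_reward("42", "42", correct_answer_reward: 5.0, wrong_answer_reward: -0.5)
# A wrong answer with an in-range reward is returned unchanged.
puts string_match_reward("41", "42", correct_answer_reward: 5.0, wrong_answer_reward: -0.5)
```

Because of the clipping, configuring either value beyond the bounds has the same effect as configuring the bound itself.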

Returns:

(Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningStringMatchRewardScorer)

# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58702

def string_match_reward_scorer
  @string_match_reward_scorer
end

Instance Method Details

#update!(**args) ⇒ `Object`

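Updates the properties of this object. Only the keys present in args are assigned; attributes whose keys are absent keep their current values.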

# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58709

def update!(**args)
  @autorater_scorer = args[:autorater_scorer] if args.key?(:autorater_scorer)
  @cloud_run_reward_scorer = args[:cloud_run_reward_scorer] if args.key?(:cloud_run_reward_scorer)
  @code_execution_reward_scorer = args[:code_execution_reward_scorer] if args.key?(:code_execution_reward_scorer)
  @parse_response_config = args[:parse_response_config] if args.key?(:parse_response_config)
  @reward_name = args[:reward_name] if args.key?(:reward_name)
  @string_match_reward_scorer = args[:string_match_reward_scorer] if args.key?(:string_match_reward_scorer)
end
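The `args.key?` guards above give update! partial-update semantics. The sketch below reproduces that pattern in a stand-in class, `RewardConfigSketch`, which is hypothetical and covers only two attributes so that the snippet runs without the gem installed; the symbol `:placeholder` stands in for a real parse config object.

```ruby
# Stand-in class copying the update! pattern shown above; not the generated class.
class RewardConfigSketch
  attr_accessor :reward_name, :parse_response_config

  def initialize(**args)
    update!(**args)
  end

  def update!(**args)
    @reward_name = args[:reward_name] if args.key?(:reward_name)
    @parse_response_config = args[:parse_response_config] if args.key?(:parse_response_config)
  end
end

config = RewardConfigSketch.new(reward_name: "exact_match", parse_response_config: :placeholder)

# Only reward_name is passed, so parse_response_config is left untouched.
config.update!(reward_name: "format_check")
p config.parse_response_config

# The key is present with a nil value, so the attribute is cleared.
config.update!(parse_response_config: nil)
p config.parse_response_config

# Keys that match no attribute are silently ignored.
config.update!(unknown_key: 1)
p config.reward_name
```

The difference between omitting a key and passing nil for it matters when reusing one config object across several tuning jobs: omission preserves the old value, while an explicit nil resets it.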
