Class: Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig

Inherits:
Object
Includes:
Core::Hashable, Core::JsonObjectSupport
Defined in:
lib/google/apis/aiplatform_v1beta1/classes.rb,
lib/google/apis/aiplatform_v1beta1/representations.rb

Overview

SingleReinforcementTuningRewardConfig defines a single reward function configuration for RL tuning. Each reward calculation/evaluation consists of two stages:

  1. Stage 1: Extracts the relevant part of the sample response via a regex extract, or simply takes the sample response unmodified.
  2. Stage 2: Calls the configured reward scorer to compute the reward.
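The two stages can be sketched in plain Ruby as follows. This is only an illustration of the flow, not the service's implementation: parse_response, compute_reward, and the exact_match lambda are hypothetical names, and returning nil when the regex does not match is an assumption.

```ruby
# Hypothetical helpers illustrating the two stages; not part of the gem.

# Stage 1: return the first capture group if a regex is given,
# otherwise pass the sample response through unmodified.
def parse_response(sample_response, regex: nil)
  return sample_response if regex.nil?

  match = regex.match(sample_response)
  match ? match[1] : nil # assumption: no match yields nil
end

# Stage 2: hand the parsed response to a scorer (any callable here).
def compute_reward(sample_response, scorer:, regex: nil)
  parsed = parse_response(sample_response, regex: regex)
  scorer.call(parsed)
end

# A toy scorer that rewards an exact match against a fixed reference.
exact_match = ->(parsed) { parsed == "42" ? 1.0 : -1.0 }

with_regex = compute_reward("The answer is 42.", scorer: exact_match, regex: /answer is (\d+)/)
unmodified = compute_reward("The answer is 42.", scorer: exact_match)
```

With the regex, only the captured digits reach the scorer; without it, the scorer sees the whole sentence and the exact match fails.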

Constructor Details

#initialize(**args) ⇒ GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig

Returns a new instance of GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig.



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58704

def initialize(**args)
  update!(**args)
end

Instance Attribute Details

#autorater_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningAutoraterScorer

ReinforcementTuningAutoraterScorer is used to score parsed responses for classification-based autorater use cases. For example, for math problems, users can use a classification-based autorater to calculate rewards by comparing the autorater-parsed response against a reference answer.

Corresponds to the JSON property autoraterScorer.



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58624

def autorater_scorer
  @autorater_scorer
end

#cloud_run_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCloudRunRewardScorer

ReinforcementTuningCloudRunRewardScorer allows users to implement a reward function through GCP Cloud Run. Compared with ReinforcementTuningCodeExecutionRewardScorer, which runs in a sandbox and has no internet access, the Cloud Run reward scorer is fully controlled by users. The Cloud Run service should implement the following HTTP API:

HTTP method: POST

HTTP request body:

    {
      "example": ReinforcementTuningExample,
      "response": Content,
      "metadata": {
        "step": int,
        "tuning_job_id": int64
      }
    }

  • example is a ReinforcementTuningExample in ProtoJSON format (i.e., the format is the same as one line in the training/validation dataset, except that the keys must be in camel case). System instructions (i.e., example.get("systemInstruction")) and references (i.e., example.get("references")) are also included in the example, provided that they are set in the training/validation dataset.
  • response is a Content in ProtoJSON format (i.e., keys must be in camel case), which is the same as the Online Prediction response for Gemini models.

HTTP response body:

    {
      "reward": float,
      "user_requested_aux_info": str  // Optional
    }

where the field "user_requested_aux_info" is any (optional) string provided by users to assist debugging. It is in snake case. This field is mostly useful when calling the GenAiTuningService.ValidateReinforcementTuningReward API, where the proto field userRequestedAuxInfo (not the Cloud Run HTTP response body) will be populated if the Cloud Run reward function sets this field in the HTTP response.

The following are examples of the HTTP request and response bodies.

Example HTTP request body:

    {
      "example": {
        "contents": [
          {
            "role": "user",
            "parts": [
              { "text": "What is the capital of France?" }
            ]
          }
        ],
        "references": { "answer": "Paris" }
      },
      "response": {
        "parts": [
          { "text": "London" }
        ]
      },
      "metadata": { "step": 1, "tuning_job_id": 123456789 }
    }

Example HTTP response body:

    { "reward": -1.0 }

Note: The reward output by the Cloud Run reward function is clipped to be within [-1, 1], i.e., reward = max(min(reward, 1.0), -1.0).

Corresponds to the JSON property cloudRunRewardScorer.
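As a rough sketch of what the body of such a Cloud Run handler might do, the following Ruby function turns a request body into a response body. It only covers the JSON handling; the HTTP server wiring is omitted. The function name handle_reward_request, the case-insensitive comparison, and reading the expected answer from references["answer"] are assumptions taken from the example payload above, not requirements of the API.

```ruby
require "json"

# Hypothetical handler body for the POST endpoint (server wiring omitted).
def handle_reward_request(request_body)
  payload = JSON.parse(request_body)
  example = payload.fetch("example")
  response = payload.fetch("response")

  # Concatenate the text parts of the model response (a Content in ProtoJSON).
  answer = response.fetch("parts", []).map { |part| part["text"].to_s }.join.strip
  # Assumption: the dataset stores the expected answer under references.answer.
  expected = example.dig("references", "answer").to_s.strip

  reward = answer.casecmp?(expected) ? 1.0 : -1.0
  step = payload.dig("metadata", "step")

  JSON.generate({
    "reward" => reward,
    # Optional, snake-case debugging string.
    "user_requested_aux_info" => "step=#{step} got=#{answer}"
  })
end

# The example request body from the description above.
request_body = JSON.generate({
  "example" => {
    "contents" => [
      { "role" => "user", "parts" => [{ "text" => "What is the capital of France?" }] }
    ],
    "references" => { "answer" => "Paris" }
  },
  "response" => { "parts" => [{ "text" => "London" }] },
  "metadata" => { "step" => 1, "tuning_job_id" => 123456789 }
})

result = JSON.parse(handle_reward_request(request_body))
```

For the example request, result["reward"] is -1.0, matching the example response body. Rewards outside [-1, 1] would be clipped by the service, so a handler does not have to clip them itself.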



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58657

def cloud_run_reward_scorer
  @cloud_run_reward_scorer
end

#code_execution_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCodeExecutionRewardScorer

ReinforcementTuningCodeExecutionRewardScorer allows users to implement a function to evaluate rewards for the sample response. The function signature is as follows:

    def evaluate(example: dict[str, Any], response: dict[str, Any]) -> float: ...

  • example is a ReinforcementTuningExample in ProtoJSON format (i.e., the format is the same as one line in the training/validation dataset, except that the keys must be in camel case). System instructions (i.e., example.get("systemInstruction")) and references (i.e., example.get("references")) are also included in the example, provided that they are set in the training/validation dataset.
  • response is a Content in ProtoJSON format (i.e., keys must be in camel case), which is the same as the Online Prediction response for Gemini models.

Note: The reward output by the evaluate function is clipped to be within [-1, 1], i.e., reward = max(min(reward, 1.0), -1.0).

Corresponds to the JSON property codeExecutionRewardScorer.
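The clipping rule in the note above can be illustrated with a few values. The sketch is in Ruby to match the rest of this page, although the evaluate function itself follows the Python signature shown; clip_reward is a hypothetical name, not part of the gem or the service.

```ruby
# Mirrors reward = max(min(reward, 1.0), -1.0) from the note above.
def clip_reward(reward)
  [[reward, 1.0].min, -1.0].max
end

raw_rewards = [2.5, 0.3, -7.0]
clipped = raw_rewards.map { |reward| clip_reward(reward) }
```

Values already inside [-1, 1] pass through unchanged, and out-of-range values are pulled to the nearest bound. In Ruby the same result can be written as reward.clamp(-1.0, 1.0).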



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58674

def code_execution_reward_scorer
  @code_execution_reward_scorer
end

#parse_response_config ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningParseResponseConfig

Defines how to parse the sample response for reinforcement tuning. The parsed response (i.e., a substring) is passed to the reward functions. For example, the input prompt might be:

    "Perform step-by-step thoughts first to problem A, finally output answer in the block."

The sample response from the model under tuning might look like:

    "Yes"

Here, users can define the following parse config:

    { "parseType": "REGEX_EXTRACT", "regexExtractExpression": ".*(.*?)" }

The resulting parsed response would be "Yes", which is passed to the reward functions for evaluating rewards.

Corresponds to the JSON property parseResponseConfig.
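To make the capture-group behavior concrete, here is a small Ruby sketch of a regex extract. The <think> and <answer> tag names, the sample text, and the expression are assumptions chosen for illustration. The sketch also uses Ruby's regex engine, which may differ in detail from the one the service uses.

```ruby
# Assumption: the model wraps its final answer in hypothetical <answer> tags.
sample_response = "<think>A implies B, so the claim holds.</think><answer>Yes</answer>"

# Greedy prefix, then a lazy capture group around the answer text.
regex_extract_expression = %r{.*<answer>(.*?)</answer>}m

match = regex_extract_expression.match(sample_response)
parsed_response = match && match[1]
```

Here parsed_response is "Yes": everything before the tagged block is consumed by the leading .*, and only the first capture group is kept as the parsed response.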



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58687

def parse_response_config
  @parse_response_config
end

#reward_name ⇒ String

A unique reward name for identifying each single reinforcement tuning reward. Corresponds to the JSON property rewardName

Returns:

  • (String)


# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58692

def reward_name
  @reward_name
end

#string_match_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningStringMatchRewardScorer

ReinforcementTuningStringMatchRewardScorer is used to score parsed responses for string matching use cases. For example, for math problems, users can use the string match scorer to check whether the exact correct answer is generated.

Note: The reward returned by the string match reward function is clipped to be within [-1, 1] if wrongAnswerReward or correctAnswerReward is outside that range, i.e., reward = max(min(reward, 1.0), -1.0).

Corresponds to the JSON property stringMatchRewardScorer.
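A minimal Ruby sketch of the idea follows. The keyword arguments mirror the correctAnswerReward and wrongAnswerReward fields named above; the function name string_match_reward and the plain equality comparison are assumptions, since the scorer's actual matching options are not described here.

```ruby
# Hypothetical scorer: picks one of two configured rewards, then clips it.
def string_match_reward(parsed_response, reference, correct_answer_reward:, wrong_answer_reward:)
  raw = parsed_response == reference ? correct_answer_reward : wrong_answer_reward
  raw.clamp(-1.0, 1.0) # same as max(min(raw, 1.0), -1.0)
end

correct = string_match_reward("4", "4", correct_answer_reward: 2.0, wrong_answer_reward: -0.5)
wrong   = string_match_reward("5", "4", correct_answer_reward: 2.0, wrong_answer_reward: -0.5)
```

The configured correct-answer reward of 2.0 is outside the allowed range and is clipped to 1.0, while the wrong-answer reward of -0.5 is returned unchanged.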



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58702

def string_match_reward_scorer
  @string_match_reward_scorer
end

Instance Method Details

#update!(**args) ⇒ Object

Update properties of this object



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58709

def update!(**args)
  @autorater_scorer = args[:autorater_scorer] if args.key?(:autorater_scorer)
  @cloud_run_reward_scorer = args[:cloud_run_reward_scorer] if args.key?(:cloud_run_reward_scorer)
  @code_execution_reward_scorer = args[:code_execution_reward_scorer] if args.key?(:code_execution_reward_scorer)
  @parse_response_config = args[:parse_response_config] if args.key?(:parse_response_config)
  @reward_name = args[:reward_name] if args.key?(:reward_name)
  @string_match_reward_scorer = args[:string_match_reward_scorer] if args.key?(:string_match_reward_scorer)
end
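The args.key? guards mean that update! only touches attributes whose keys are actually passed, and that an explicit nil still overwrites. The sketch below uses a hypothetical stand-in class, RewardConfigSketch, which copies the pattern for two attributes so that it runs without the gem installed. In the real class, parse_response_config would hold a GoogleCloudAiplatformV1beta1ReinforcementTuningParseResponseConfig rather than a plain Hash.

```ruby
# Stand-in class (hypothetical) copying the update! pattern shown above.
class RewardConfigSketch
  attr_accessor :reward_name, :parse_response_config

  def initialize(**args)
    update!(**args)
  end

  def update!(**args)
    @reward_name = args[:reward_name] if args.key?(:reward_name)
    @parse_response_config = args[:parse_response_config] if args.key?(:parse_response_config)
  end
end

config = RewardConfigSketch.new(
  reward_name: "exact_match",
  parse_response_config: { parse_type: "REGEX_EXTRACT" } # plain Hash for illustration
)

# Only reward_name is passed, so parse_response_config is left untouched.
config.update!(reward_name: "exact_match_v2")
kept_config = config.parse_response_config

# The key is present with a nil value, so the attribute is overwritten.
config.update!(parse_response_config: nil)
```

After the first call kept_config still holds the original Hash; after the second call config.parse_response_config is nil, while reward_name remains "exact_match_v2".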