Class: Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig
- Inherits:
- Object
- Includes:
- Core::Hashable, Core::JsonObjectSupport
- Defined in:
- lib/google/apis/aiplatform_v1beta1/classes.rb,
lib/google/apis/aiplatform_v1beta1/representations.rb
Overview
SingleReinforcementTuningRewardConfig defines a single reward function configuration for RL tuning. Each reward calculation/evaluation consists of two stages:
1. Stage 1: Extracts the relevant part of the sample response via a regex, or simply takes the sample response unmodified.
2. Stage 2: Calls the configured reward scorer to compute the reward.
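The two stages can be pictured as a parser followed by a scorer. The following is a minimal sketch in plain Ruby, not the service's implementation: `compute_reward`, `IDENTITY_PARSER`, `extract_final_answer`, and `exact_match` are hypothetical names, and the "Final answer:" pattern is an assumed prompt convention.

```ruby
# Stage 1 default: take the sample response unmodified.
IDENTITY_PARSER = ->(text) { text }

# Runs the two stages in order: parse the sample response, then score it.
# Both `parser` and `scorer` are hypothetical callables supplied by the caller.
def compute_reward(sample_response, scorer:, parser: IDENTITY_PARSER)
  parsed = parser.call(sample_response) # Stage 1
  scorer.call(parsed)                   # Stage 2
end

# Assumed convention: the model ends its response with "Final answer: <word>".
extract_final_answer = ->(text) { text[/Final answer: (\w+)/, 1] }
exact_match = ->(parsed) { parsed == "Paris" ? 1.0 : -1.0 }

puts compute_reward("Reasoning... Final answer: Paris",
                    scorer: exact_match, parser: extract_final_answer)
puts compute_reward("London", scorer: exact_match)
```

The first call extracts "Paris" before scoring; the second scores the raw response because no parser is given.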
Instance Attribute Summary
-
#autorater_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningAutoraterScorer
ReinforcementTuningAutoraterScorer is used to score parsed responses for classification-based autorater use cases.
-
#cloud_run_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCloudRunRewardScorer
ReinforcementTuningCloudRunRewardScorer allows users to implement a reward function through GCP Cloud Run.
-
#code_execution_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCodeExecutionRewardScorer
ReinforcementTuningCodeExecutionRewardScorer allows users to implement a function to evaluate rewards for the sample response.
-
#parse_response_config ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningParseResponseConfig
Defines how to parse the sample response for reinforcement tuning.
-
#reward_name ⇒ String
A unique reward name for identifying each single reinforcement tuning reward.
-
#string_match_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningStringMatchRewardScorer
ReinforcementTuningStringMatchRewardScorer is used to score parsed responses for string matching use cases.
Instance Method Summary
-
#initialize(**args) ⇒ GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig
constructor
A new instance of GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig.
-
#update!(**args) ⇒ Object
Update properties of this object.
Constructor Details
#initialize(**args) ⇒ GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig
Returns a new instance of GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig.
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58704
def initialize(**args)
  update!(**args)
end
Instance Attribute Details
#autorater_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningAutoraterScorer
ReinforcementTuningAutoraterScorer is used to score parsed responses for
classification-based autorater use cases. For example, for math problems,
users can use a classification-based autorater to calculate rewards by
comparing the parsed response against a reference answer.
Corresponds to the JSON property autoraterScorer
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58624
def autorater_scorer
  @autorater_scorer
end
#cloud_run_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCloudRunRewardScorer
ReinforcementTuningCloudRunRewardScorer allows users to implement a reward
function through GCP Cloud Run. Compared with
ReinforcementTuningCodeExecutionRewardScorer, which runs in a sandbox and has no
internet access, the Cloud Run reward scorer is fully controlled by users. The
Cloud Run service should implement the following HTTP API:
HTTP method: POST
HTTP request body:
  { "example": ReinforcementTuningExample, "response": Content,
    "metadata": { "step": int, "tuning_job_id": int64 } }
* example is a ReinforcementTuningExample in ProtoJSON format (i.e., the
format is the same as one line in the training/validation dataset, except that
the keys must be in camel case). System instructions (i.e.,
example.get("systemInstruction")) and references (i.e.,
example.get("references")) are also included in the example, provided that
they are set in the training/validation dataset.
* response is a Content in ProtoJSON format (i.e., keys must be in camel
case), which is the same as the Online Prediction response for Gemini models.
HTTP response body:
  { "reward": float, "user_requested_aux_info": str // Optional }
where the field "user_requested_aux_info" is an optional string provided by
users to assist debugging. It is in snake case. This field is mostly useful
when calling the GenAiTuningService.ValidateReinforcementTuningReward API,
where the proto field userRequestedAuxInfo (not the Cloud Run HTTP response
body) is populated if the Cloud Run reward function sets this field in the
HTTP response. The following are examples of the HTTP request and response
bodies.
Example HTTP request body:
  {
    "example": {
      "contents": [
        { "role": "user", "parts": [ { "text": "What is the capital of France?" } ] }
      ],
      "references": { "answer": "Paris" }
    },
    "response": { "parts": [ { "text": "London" } ] },
    "metadata": { "step": 1, "tuning_job_id": 123456789 }
  }
Example HTTP response body:
  { "reward": -1.0 }
Note: The reward output by the Cloud Run reward function is clipped to
[-1, 1], i.e., reward = max(min(reward, 1.0), -1.0).
Corresponds to the JSON property cloudRunRewardScorer
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58657
def cloud_run_reward_scorer
  @cloud_run_reward_scorer
end
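To make the Cloud Run request and response shapes concrete, here is a minimal sketch of the body-handling logic using only Ruby's standard `json` library. It is an illustration under assumptions, not a reference implementation: `handle_reward_request` and `content_text` are hypothetical names, the exact-match rule is an arbitrary choice, and the `answer` key inside `references` simply follows the example request in the description. Wiring the function into an HTTP server that accepts POST is left out.

```ruby
require "json"

# Joins the text parts of a Content hash (ProtoJSON, camel-case keys).
def content_text(content)
  Array(content["parts"]).map { |part| part["text"].to_s }.join
end

# Hypothetical reward logic: 1.0 when the response text equals
# references["answer"], otherwise -1.0. Returns the HTTP response body.
def handle_reward_request(request_body)
  request = JSON.parse(request_body)
  expected = request.dig("example", "references", "answer")
  actual = content_text(request.fetch("response")).strip
  reward = actual == expected ? 1.0 : -1.0

  # Optional debugging string; note the snake-case key.
  aux_info = "expected=#{expected.inspect} actual=#{actual.inspect}"
  JSON.generate("reward" => reward, "user_requested_aux_info" => aux_info)
end

request_body = JSON.generate(
  "example" => {
    "contents" => [
      { "role" => "user", "parts" => [{ "text" => "What is the capital of France?" }] }
    ],
    "references" => { "answer" => "Paris" }
  },
  "response" => { "parts" => [{ "text" => "London" }] },
  "metadata" => { "step" => 1, "tuning_job_id" => 123456789 }
)

puts handle_reward_request(request_body)
```

For the example request, the sketch yields a reward of -1.0, matching the example response body in the description.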
#code_execution_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCodeExecutionRewardScorer
ReinforcementTuningCodeExecutionRewardScorer allows users to implement a
function to evaluate rewards for the sample response. The function signature
is as follows:
  def evaluate(example: dict[str, Any], response: dict[str, Any]) -> float: ...
* example is a ReinforcementTuningExample in ProtoJSON format (i.e., the
format is the same as one line in the training/validation dataset, except that
the keys must be in camel case). System instructions (i.e.,
example.get("systemInstruction")) and references (i.e.,
example.get("references")) are also included in the example, provided that
they are set in the training/validation dataset.
* response is a Content in ProtoJSON format (i.e., keys must be in camel
case), which is the same as the Online Prediction response for Gemini models.
Note: The reward output by the evaluate function is clipped to [-1, 1], i.e.,
reward = max(min(reward, 1.0), -1.0).
Corresponds to the JSON property codeExecutionRewardScorer
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58674
def code_execution_reward_scorer
  @code_execution_reward_scorer
end
#parse_response_config ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningParseResponseConfig
Defines how to parse the sample response for reinforcement tuning. The
parsed response (i.e., substring) will be passed to the reward functions. For
example, the input prompt might be: "Perform step-by-step thoughts first to
problem A, finally output answer in the …"
  { "parseType": "REGEX_EXTRACT", "regexExtractExpression": ".*(.*?)" }
The resulting parsed response would be "Yes" and will be passed to the reward
functions for evaluating rewards.
Corresponds to the JSON property parseResponseConfig
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58687
def parse_response_config
  @parse_response_config
end
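The regex-extract parsing described for parse_response_config can be sketched in plain Ruby. This is an illustration under stated assumptions: the `<answer>` tags, the `apply_parse_config` helper, and the sample response are all hypothetical and are not taken from the API documentation; the service's regex dialect and its behavior when the pattern does not match may differ from Ruby's.

```ruby
# Hypothetical parse configuration. The keys mirror the JSON field names in the
# description; the <answer> tags in the expression are an illustrative assumption.
PARSE_CONFIG = {
  "parseType" => "REGEX_EXTRACT",
  "regexExtractExpression" => ".*<answer>(.*?)</answer>"
}.freeze

# Returns the first capture group for REGEX_EXTRACT, or nil when the pattern
# does not match (an assumption). Any other parse type passes the text through.
def apply_parse_config(config, sample_response)
  return sample_response unless config["parseType"] == "REGEX_EXTRACT"

  # Regexp::MULTILINE lets "." match newlines in Ruby.
  pattern = Regexp.new(config["regexExtractExpression"], Regexp::MULTILINE)
  match = pattern.match(sample_response)
  match && match[1]
end

sample_response = "Step 1: ...\nStep 2: ...\n<answer>Yes</answer>"
puts apply_parse_config(PARSE_CONFIG, sample_response)
```

With this assumed expression, only the text between the tags ("Yes") reaches the reward function, so the scorer never sees the step-by-step reasoning.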
#reward_name ⇒ String
A unique reward name for identifying each single reinforcement tuning reward.
Corresponds to the JSON property rewardName
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58692
def reward_name
  @reward_name
end
#string_match_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningStringMatchRewardScorer
ReinforcementTuningStringMatchRewardScorer is used to score parsed responses
for string matching use cases. For example, for math problems, users can use
the string match scorer to check whether the exact correct answer is
generated. Note: The reward returned by the string match reward function is
clipped to [-1, 1] if wrongAnswerReward or correctAnswerReward is outside that
range, i.e., reward = max(min(reward, 1.0), -1.0).
Corresponds to the JSON property stringMatchRewardScorer
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58702
def string_match_reward_scorer
  @string_match_reward_scorer
end
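The clipping rule reward = max(min(reward, 1.0), -1.0) is easy to check by hand. The sketch below is a hypothetical stand-in, not the service's scorer: `string_match_reward` and its keyword arguments are illustrative names modeled on correctAnswerReward and wrongAnswerReward, and plain `==` comparison is an assumption about how matching works.

```ruby
# Hypothetical stand-in for a string-match scorer. The keyword arguments mirror
# the correctAnswerReward / wrongAnswerReward names from the description.
def string_match_reward(parsed_response, reference,
                        correct_answer_reward: 1.0, wrong_answer_reward: -1.0)
  raw = parsed_response == reference ? correct_answer_reward : wrong_answer_reward
  # Equivalent to max(min(raw, 1.0), -1.0).
  raw.clamp(-1.0, 1.0)
end

puts string_match_reward("42", "42")                              # in range, unchanged
puts string_match_reward("41", "42", wrong_answer_reward: -5.0)   # clipped up to -1.0
puts string_match_reward("42", "42", correct_answer_reward: 0.5)  # in range, unchanged
```

Configured rewards inside [-1, 1] pass through untouched; only out-of-range values are pulled back to the nearest bound.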
Instance Method Details
#update!(**args) ⇒ Object
Update properties of this object
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 58709
def update!(**args)
  @autorater_scorer = args[:autorater_scorer] if args.key?(:autorater_scorer)
  @cloud_run_reward_scorer = args[:cloud_run_reward_scorer] if args.key?(:cloud_run_reward_scorer)
  @code_execution_reward_scorer = args[:code_execution_reward_scorer] if args.key?(:code_execution_reward_scorer)
  @parse_response_config = args[:parse_response_config] if args.key?(:parse_response_config)
  @reward_name = args[:reward_name] if args.key?(:reward_name)
  @string_match_reward_scorer = args[:string_match_reward_scorer] if args.key?(:string_match_reward_scorer)
end
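Because update! guards each assignment with args.key?, a key that is absent leaves the attribute alone, while a key passed with nil overwrites it. The sketch below reproduces that pattern in a hypothetical stand-in class (`RewardConfigSketch`, covering only two of the six attributes) so it runs without the gem; the reward names and the parse configuration value are placeholders.

```ruby
# Hypothetical stand-in that copies the update! pattern from the listing above.
# It is not the generated class and covers only two attributes.
class RewardConfigSketch
  attr_accessor :reward_name, :parse_response_config

  def initialize(**args)
    update!(**args)
  end

  # Assigns only the attributes whose keys are present in args.
  def update!(**args)
    @reward_name = args[:reward_name] if args.key?(:reward_name)
    @parse_response_config = args[:parse_response_config] if args.key?(:parse_response_config)
  end
end

config = RewardConfigSketch.new(
  reward_name: "exact_match",
  parse_response_config: { "parseType" => "REGEX_EXTRACT" }
)

config.update!(reward_name: "exact_match_v2") # parse_response_config is untouched
kept = config.parse_response_config

config.update!(parse_response_config: nil)    # an explicit nil clears the attribute
puts config.reward_name
puts config.parse_response_config.inspect
```

This is why partial updates are safe with update!, and why passing nil explicitly is different from omitting the key.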