Class: Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig
- Inherits:
- Object
- Object
- Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig
- Includes:
- Core::Hashable, Core::JsonObjectSupport
- Defined in:
- lib/google/apis/aiplatform_v1beta1/classes.rb,
lib/google/apis/aiplatform_v1beta1/representations.rb
Overview
SingleReinforcementTuningRewardConfig defines a single reward function configuration for RL tuning. Each reward calculation/evaluation consists of two stages:
Stage 1: Parse the relevant part of the sample response via regex extraction, or take the sample response unmodified.
Stage 2: Call the specified reward scorer to compute the reward and to output whether the sample answer is correct.
Wrong and correct answers should be assigned different rewards, and different correct answers may also be assigned different rewards.
Instance Attribute Summary
-
#autorater_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningAutoraterScorer
ReinforcementTuningAutoraterScorer is used to score parsed responses for classification-based autorater use cases.
-
#cloud_run_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCloudRunRewardScorer
The Cloud Run service should implement an HTTP POST API whose request body has the form { "example": ReinforcementTuningExample, "response": Content, "metadata": { "step": int, "tuning_job_id": int64 } }, where example is a ReinforcementTuningExample in ProtoJSON format and response is a Content in ProtoJSON format.
-
#code_execution_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCodeExecutionRewardScorer
Expects the user to implement an evaluate(example, response) function that returns the reward. example is a dict in exactly the same format as the training and validation datasets, and also includes the system instructions and the references (e.g., the user can store the ground truth for the example in references).
-
#parse_response_config ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningParseResponseConfig
Defines how to parse sample response config for reinforcement tuning.
-
#reward_name ⇒ String
A unique reward name used to identify each single reinforcement tuning reward.
-
#string_match_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningStringMatchRewardScorer
ReinforcementTuningStringMatchRewardScorer is used to score parsed responses for string matching use cases.
Instance Method Summary
-
#initialize(**args) ⇒ GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig
constructor
A new instance of GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig.
-
#update!(**args) ⇒ Object
Update properties of this object.
Constructor Details
#initialize(**args) ⇒ GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig
Returns a new instance of GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig.
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57102
def initialize(**args)
  update!(**args)
end
Instance Attribute Details
#autorater_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningAutoraterScorer
ReinforcementTuningAutoraterScorer is used to score parsed responses for
classification-based autorater use cases. For example, for math problems, a
classification-based autorater can calculate the reward by checking the
parsed response against the reference answer.
Corresponds to the JSON property autoraterScorer
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57048
def autorater_scorer
  @autorater_scorer
end
#cloud_run_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCloudRunRewardScorer
The Cloud Run service should implement the following HTTP API:
HTTP Method: POST
HTTP Request Body:
  {
    "example": ReinforcementTuningExample,
    "response": Content,
    "metadata": {
      "step": int,
      "tuning_job_id": int64
    }
  }
where example is a ReinforcementTuningExample in ProtoJSON format and response is
a Content in ProtoJSON format.
HTTP Response Body:
  { "reward": float }
Example HTTP Request Body:
  {
    "example": {
      "contents": [
        { "role": "user", "parts": [ { "text": "What is the capital of France?" } ] }
      ],
      "references": { "answer": "Paris" }
    },
    "response": { "parts": [ { "text": "London" } ] },
    "metadata": { "step": 1, "tuning_job_id": 123456789 }
  }
Example HTTP Response Body:
  { "reward": -1.0 }
Important: the reward output is clipped to be within [-1, 1], i.e.,
reward = max(min(reward, 1), -1).
Corresponds to the JSON property cloudRunRewardScorer
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57063
def cloud_run_reward_scorer
  @cloud_run_reward_scorer
end
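The request/response contract described for the Cloud Run scorer can be illustrated with a minimal sketch of the scoring logic alone, without any HTTP server. The helper name score_request and the exact-match rule are assumptions made for illustration; only the JSON field names come from the description above.

```ruby
require 'json'

# Hypothetical scorer (not part of the API): returns +1.0 when the response
# text equals the reference answer in example.references.answer, else -1.0.
def score_request(body)
  request = JSON.parse(body)
  expected = request.dig('example', 'references', 'answer').to_s
  parts = request.dig('response', 'parts') || []
  actual = parts.map { |part| part['text'].to_s }.join
  reward = actual.strip == expected.strip ? 1.0 : -1.0
  JSON.generate('reward' => reward)
end

# Request body modeled on the documented example.
body = JSON.generate(
  'example' => {
    'contents' => [
      { 'role' => 'user', 'parts' => [{ 'text' => 'What is the capital of France?' }] }
    ],
    'references' => { 'answer' => 'Paris' }
  },
  'response' => { 'parts' => [{ 'text' => 'London' }] },
  'metadata' => { 'step' => 1, 'tuning_job_id' => 123_456_789 }
)

puts score_request(body) # prints {"reward":-1.0}
```

A real service would wrap a function like this in an HTTP handler that accepts POST requests; that part is omitted here because the framework choice is not specified by the API.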
#code_execution_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCodeExecutionRewardScorer
Expects the user to implement the following function:
  def evaluate(example: Dict[str, ...], response: Dict[str, Content]) -> float:
where the returned value is the reward.
example is a dict in exactly the same format as the training and validation
datasets, and also includes the system instructions and the references (e.g.,
the user can store the ground truth for the example in references). response
is a dict of Content type, the same as in all other first-party (1P) tuning
methods and in Online Prediction.
References and system instructions will be empty if not provided by the user.
Different correct answers can get different rewards. Different wrong answers
can also get different rewards.
Important: the reward output by the function is clipped to be within [-1, 1],
i.e., reward = max(min(reward, 1), -1).
Corresponds to the JSON property codeExecutionRewardScorer
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57078
def code_execution_reward_scorer
  @code_execution_reward_scorer
end
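The clipping rule reward = max(min(reward, 1), -1) can be checked locally with a short sketch. The helper name clip_reward is hypothetical; the actual clipping is applied by the tuning service, not by client code.

```ruby
# Hypothetical helper mirroring the documented formula
# reward = max(min(reward, 1), -1).
def clip_reward(reward)
  [[reward, 1.0].min, -1.0].max
end

# Ruby's built-in Comparable#clamp expresses the same rule.
[2.5, -3.0, 0.25].each do |raw|
  puts "#{raw} -> #{clip_reward(raw)} (clamp: #{raw.clamp(-1.0, 1.0)})"
end
```

Scores already inside [-1, 1] pass through unchanged, so a scorer that wants to distinguish several correct or several wrong answers needs to keep its values within that range.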
#parse_response_config ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningParseResponseConfig
Defines how to parse sample response config for reinforcement tuning. For
example, the input prompt might be: "Perform step by step thoughts first to
problem A, finally output answer in block." And the sample response might
look like: "blahblah". Here, user can define the following parse config:
parse_type: REGEX_EXTRACT regex_extract_expression: ".*(.*?)" And we would
have returned "blahblah" to reward scoring function.
Corresponds to the JSON property parseResponseConfig
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57088
def parse_response_config
  @parse_response_config
end
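As a minimal sketch of the REGEX_EXTRACT idea, assume the sample response wraps its final answer in hypothetical <answer>...</answer> tags. The tag name, the pattern, and the fallback to the unmodified response are illustrative assumptions, not values or behavior defined by the API.

```ruby
# Hypothetical pattern: capture whatever sits between <answer> and </answer>.
ANSWER_PATTERN = %r{<answer>(.*?)</answer>}m

# Returns the first captured group, or the unmodified response when the
# pattern does not match (the fallback is an assumption of this sketch).
def extract_answer(response_text)
  match = ANSWER_PATTERN.match(response_text)
  match ? match[1].strip : response_text
end

sample = 'First I add 2 and 2. <answer>4</answer>'
puts extract_answer(sample) # prints 4
```

The extracted string is what a reward scorer would then receive in the second stage.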
#reward_name ⇒ String
A unique reward name used to identify each single reinforcement tuning reward.
Corresponds to the JSON property rewardName
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57093
def reward_name
  @reward_name
end
#string_match_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningStringMatchRewardScorer
ReinforcementTuningStringMatchRewardScorer is used to score parsed responses
for string matching use cases. For example, for math problems, we can use
string match scorer to check if the correct exact answer is generated.
Corresponds to the JSON property stringMatchRewardScorer
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57100
def string_match_reward_scorer
  @string_match_reward_scorer
end
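The idea of exact-match scoring can be sketched in a few lines. This is not the service's implementation: the function name, the whitespace trimming, and the default reward values of 1.0 and -1.0 are all assumptions chosen for illustration.

```ruby
# Hypothetical exact-match scorer: compares the parsed response with the
# reference answer after trimming surrounding whitespace.
def string_match_reward(parsed_response, reference, correct: 1.0, wrong: -1.0)
  parsed_response.strip == reference.strip ? correct : wrong
end

puts string_match_reward('42', '42')
puts string_match_reward('41', '42')
```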
Instance Method Details
#update!(**args) ⇒ Object
Update properties of this object.
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57107
def update!(**args)
  @autorater_scorer = args[:autorater_scorer] if args.key?(:autorater_scorer)
  @cloud_run_reward_scorer = args[:cloud_run_reward_scorer] if args.key?(:cloud_run_reward_scorer)
  @code_execution_reward_scorer = args[:code_execution_reward_scorer] if args.key?(:code_execution_reward_scorer)
  @parse_response_config = args[:parse_response_config] if args.key?(:parse_response_config)
  @reward_name = args[:reward_name] if args.key?(:reward_name)
  @string_match_reward_scorer = args[:string_match_reward_scorer] if args.key?(:string_match_reward_scorer)
end
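Because update! guards each assignment with args.key?, a key that is absent leaves the attribute untouched, while a key passed explicitly as nil clears it. The sketch below shows that behavior with a hypothetical stand-in class, RewardConfigSketch, that copies the pattern for two attributes so it runs without the gem installed.

```ruby
# Stand-in class (hypothetical) reproducing the update! pattern shown above.
class RewardConfigSketch
  attr_accessor :reward_name, :parse_response_config

  def initialize(**args)
    update!(**args)
  end

  def update!(**args)
    @reward_name = args[:reward_name] if args.key?(:reward_name)
    @parse_response_config = args[:parse_response_config] if args.key?(:parse_response_config)
  end
end

config = RewardConfigSketch.new(reward_name: 'exact_match', parse_response_config: :regex)

config.update!(reward_name: 'format_check') # absent key: parse_response_config is kept
kept = config.parse_response_config

config.update!(parse_response_config: nil)  # explicit nil: the attribute is cleared
puts config.reward_name
puts kept.inspect
puts config.parse_response_config.inspect
```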