Class: Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig

Inherits: Object

Includes: Core::Hashable, Core::JsonObjectSupport

Defined in: lib/google/apis/aiplatform_v1beta1/classes.rb,
lib/google/apis/aiplatform_v1beta1/representations.rb

Overview

SingleReinforcementTuningRewardConfig defines a single reward function configuration for RL tuning. Each reward calculation/evaluation consists of two stages. Stage 1: parse the relevant part of the sample response, either by regex extraction or by taking the sample response unmodified. Stage 2: call the specific reward scorer to compute the reward and to output whether the sample answer is correct. Wrong and correct answers should be assigned different rewards, and different correct answers may also be assigned different rewards.
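The two stages described in the overview can be sketched in plain Ruby. This is only an illustration of the parse-then-score flow: `RewardResult`, `IDENTITY_PARSER`, `EXACT_MATCH_SCORER`, and `evaluate_sample` are hypothetical names invented for this sketch and are not part of the client library or the service.

```ruby
# Hypothetical two-stage reward evaluation; none of these names come from the API.
RewardResult = Struct.new(:reward, :correct)

# Stage 1 option: take the sample response unmodified.
IDENTITY_PARSER = ->(response_text) { response_text }

# Stage 2: an illustrative scorer that compares the parsed text to a reference
# and reports both a reward and whether the answer is correct.
EXACT_MATCH_SCORER = lambda do |parsed, reference|
  correct = parsed.strip == reference.strip
  RewardResult.new(correct ? 1.0 : -1.0, correct)
end

def evaluate_sample(response_text, reference,
                    parser: IDENTITY_PARSER, scorer: EXACT_MATCH_SCORER)
  parsed = parser.call(response_text) # stage 1: parse
  scorer.call(parsed, reference)      # stage 2: score
end

result = evaluate_sample("Paris\n", "Paris")
puts "reward=#{result.reward} correct=#{result.correct}"
```

Passing the parser and scorer as callables keeps the stages independent, which loosely mirrors how this class pairs one parse_response_config with one scorer attribute.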

Instance Attribute Summary

#autorater_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningAutoraterScorer
ReinforcementTuningAutoraterScorer is used to score parsed responses for classification-based autorater use cases.
#cloud_run_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCloudRunRewardScorer
The Cloud Run service should implement the following HTTP API: HTTP Method: POST HTTP Request Body: { "example": ReinforcementTuningExample, "response": Content, "metadata": { "step": int, "tuning_job_id": int64 } } where example is a ReinforcementTuningExample in ProtoJSON format and response is a Content in ProtoJSON format.
#code_execution_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCodeExecutionRewardScorer
Expects the user to implement the following function: `example` is a dict in exactly the same format as the training and validation datasets; it also includes the system instructions and the references (e.g., the user can use references to store the ground truth for the example).
#parse_response_config ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningParseResponseConfig
Defines how to parse sample response config for reinforcement tuning.
#reward_name ⇒ String
A unique reward name used to identify each single reinforcement tuning reward.
#string_match_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningStringMatchRewardScorer
ReinforcementTuningStringMatchRewardScorer is used to score parsed responses for string matching use cases.

Instance Method Summary

#initialize(**args) ⇒ GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig constructor
A new instance of GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig.
#update!(**args) ⇒ Object
Update properties of this object.

Constructor Details

#initialize(**args) ⇒ `GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig`

Returns a new instance of GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig.



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57102

def initialize(**args)
   update!(**args)
end

Instance Attribute Details

#autorater_scorer ⇒ `Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningAutoraterScorer`

ReinforcementTuningAutoraterScorer is used to score parsed responses for classification-based autorater use cases. For example, for math problems, a classification-based autorater can calculate the reward by comparing the parsed response against the reference answer. Corresponds to the JSON property autoraterScorer

Returns:

(Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningAutoraterScorer)



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57048

def autorater_scorer
  @autorater_scorer
end

#cloud_run_reward_scorer ⇒ `Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCloudRunRewardScorer`

The Cloud Run service should implement the following HTTP API:

HTTP Method: POST

HTTP Request Body:

    {
      "example": ReinforcementTuningExample,
      "response": Content,
      "metadata": {
        "step": int,
        "tuning_job_id": int64
      }
    }

where example is a ReinforcementTuningExample in ProtoJSON format and response is a Content in ProtoJSON format.

HTTP Response Body:

    { "reward": float }

Example HTTP Request Body:

    {
      "example": {
        "contents": [
          {
            "role": "user",
            "parts": [
              { "text": "What is the capital of France?" }
            ]
          }
        ],
        "references": {
          "answer": "Paris"
        }
      },
      "response": {
        "parts": [
          { "text": "London" }
        ]
      },
      "metadata": {
        "step": 1,
        "tuning_job_id": 123456789
      }
    }

Example HTTP Response Body:

    { "reward": -1.0 }

Important: the returned reward is clipped to be within [-1, 1], i.e., reward = max(min(reward, 1), -1).

Corresponds to the JSON property cloudRunRewardScorer

Returns:

(Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCloudRunRewardScorer)



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57063

def cloud_run_reward_scorer
  @cloud_run_reward_scorer
end
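As a rough illustration of the request and response bodies described for cloud_run_reward_scorer, the following sketch shows scoring logic a reward service might run on a POST body. It is a minimal sketch under stated assumptions: `score_request` is a hypothetical helper, the HTTP server wiring is omitted, and reading the ground truth from `references["answer"]` simply follows the example request rather than any fixed schema.

```ruby
require "json"

# Hypothetical scoring logic for a reward service. Only the request and
# response field names are taken from the attribute description.
def score_request(body)
  request = JSON.parse(body)
  # Assumption: the ground truth lives under references["answer"].
  expected = request.dig("example", "references", "answer").to_s
  # Concatenate the text of every part in the sampled response.
  parts = Array(request.dig("response", "parts"))
  actual = parts.map { |part| part["text"].to_s }.join
  reward = actual.strip.casecmp?(expected.strip) ? 1.0 : -1.0
  JSON.generate("reward" => reward)
end

body = JSON.generate(
  "example" => {
    "contents" => [
      { "role" => "user", "parts" => [{ "text" => "What is the capital of France?" }] }
    ],
    "references" => { "answer" => "Paris" }
  },
  "response" => { "parts" => [{ "text" => "London" }] },
  "metadata" => { "step" => 1, "tuning_job_id" => 123456789 }
)
puts score_request(body)
```

For the example request, where the sampled response is "London" and the reference is "Paris", this prints `{"reward":-1.0}`, matching the example response body.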

#code_execution_reward_scorer ⇒ `Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCodeExecutionRewardScorer`

Expects the user to implement the following function:

    def evaluate(example: Dict[str, ...], response: Dict[str, Content]) -> float:

where the first returned value is the reward. `example` is a dict in exactly the same format as the training and validation datasets; it also includes the system instructions and the references (e.g., the user can use references to store the ground truth for the example). `response` is a dict of Content type, the same type used by the other 1P tuning methods and by Online Prediction. References and system instructions will be empty if not provided by the user. Different correct answers can get different rewards. Different wrong answers can also get different rewards.

Important: the reward output by the function is clipped to be within [-1, 1], i.e., reward = max(min(reward, 1), -1).

Corresponds to the JSON property codeExecutionRewardScorer

Returns:

(Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCodeExecutionRewardScorer)



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57078

def code_execution_reward_scorer
  @code_execution_reward_scorer
end
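The clipping rule stated for code_execution_reward_scorer, reward = max(min(reward, 1), -1), can be checked with a few lines of Ruby. This only illustrates the arithmetic; `clip_reward` is a hypothetical helper, and the user-supplied evaluate function itself is described with a Python-style signature, not Ruby.

```ruby
# Mirrors reward = max(min(reward, 1), -1) from the attribute description.
def clip_reward(reward)
  [[reward, 1.0].min, -1.0].max
end

[2.5, 0.3, -4.0].each do |raw|
  puts "#{raw} -> #{clip_reward(raw)}"
end
```

Values already inside [-1, 1] pass through unchanged, so only out-of-range rewards are affected. Ruby's built-in `Comparable#clamp` gives the same result, e.g. `2.5.clamp(-1.0, 1.0)`.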

#parse_response_config ⇒ `Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningParseResponseConfig`

Defines how to parse the sample response for reinforcement tuning. For example, the input prompt might be: "Perform step by step thoughts first to problem A, finally output answer in block." And the sample response might look like: "blahblah". Here, the user can define the following parse config:

    parse_type: REGEX_EXTRACT
    regex_extract_expression: ".*(.*?)"

"blahblah" would then be returned to the reward scoring function.

Corresponds to the JSON property parseResponseConfig

Returns:

(Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningParseResponseConfig)



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57088

def parse_response_config
  @parse_response_config
end
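To make the regex-extract idea behind parse_response_config concrete, here is a small stand-alone sketch of extracting an answer with a capture group. It does not call the service: `EXTRACT_EXPRESSION` and `extract_answer` are hypothetical, the "Final answer:" marker is an assumption chosen for this sketch, and returning nil when nothing matches is also an assumption, since the source does not say how a failed match is handled.

```ruby
# Hypothetical stand-in for a REGEX_EXTRACT parse step. The "Final answer:"
# marker is an assumption made for this sketch, not part of the API.
EXTRACT_EXPRESSION = /Final answer:\s*(.+)\z/m

# Returns the first capture group, or nil when the expression does not match.
def extract_answer(sample_response, expression = EXTRACT_EXPRESSION)
  match = expression.match(sample_response)
  match ? match[1].strip : nil
end

sample = "Step 1: add the numbers.\nStep 2: check the sum.\nFinal answer: 42"
p extract_answer(sample)
```

Here the step-by-step reasoning is discarded and only the captured "42" would be handed to the scoring stage.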

#reward_name ⇒ `String`

A unique reward name used to identify each single reinforcement tuning reward. Corresponds to the JSON property rewardName

Returns:

(String)



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57093

def reward_name
  @reward_name
end

#string_match_reward_scorer ⇒ `Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningStringMatchRewardScorer`

ReinforcementTuningStringMatchRewardScorer is used to score parsed responses for string matching use cases. For example, for math problems, the string match scorer can check whether the exact correct answer is generated. Corresponds to the JSON property stringMatchRewardScorer

Returns:

(Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningStringMatchRewardScorer)



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57100

def string_match_reward_scorer
  @string_match_reward_scorer
end

Instance Method Details

#update!(**args) ⇒ `Object`

Update properties of this object.

# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57107

def update!(**args)
  @autorater_scorer = args[:autorater_scorer] if args.key?(:autorater_scorer)
  @cloud_run_reward_scorer = args[:cloud_run_reward_scorer] if args.key?(:cloud_run_reward_scorer)
  @code_execution_reward_scorer = args[:code_execution_reward_scorer] if args.key?(:code_execution_reward_scorer)
  @parse_response_config = args[:parse_response_config] if args.key?(:parse_response_config)
  @reward_name = args[:reward_name] if args.key?(:reward_name)
  @string_match_reward_scorer = args[:string_match_reward_scorer] if args.key?(:string_match_reward_scorer)
end
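The `args.key?` guards in update! mean that only keys actually passed are assigned, and that an explicitly passed nil still overwrites the stored value. The sketch below reproduces that pattern in a stand-in class so it runs without the client library; `RewardConfigSketch` is hypothetical and copies only two of the attributes.

```ruby
# Stand-in class (hypothetical) that copies the update! pattern shown above
# for two attributes, so the sketch runs without the client library.
class RewardConfigSketch
  attr_accessor :reward_name, :parse_response_config

  def initialize(**args)
    update!(**args)
  end

  def update!(**args)
    @reward_name = args[:reward_name] if args.key?(:reward_name)
    @parse_response_config = args[:parse_response_config] if args.key?(:parse_response_config)
  end
end

config = RewardConfigSketch.new(reward_name: "exact_match",
                                parse_response_config: :placeholder)
config.update!(reward_name: "exact_match_v2") # parse_response_config is left untouched
before_reset = config.parse_response_config
config.update!(parse_response_config: nil)    # an explicit nil does overwrite
p config.reward_name
p before_reset
p config.parse_response_config
```

Checking `args.key?` rather than the truthiness of the value is what lets callers clear an attribute by passing nil.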
