Class: Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig

Inherits:
Object
Includes:
Core::Hashable, Core::JsonObjectSupport
Defined in:
lib/google/apis/aiplatform_v1beta1/classes.rb,
lib/google/apis/aiplatform_v1beta1/representations.rb

Overview

SingleReinforcementTuningRewardConfig defines a single reward function configuration for RL tuning. Each reward calculation (evaluation) consists of two stages:

  • Stage 1: Parse the important part of the sample response, either by regex extraction or by taking the sample response unmodified.
  • Stage 2: Call the specified reward scorer to compute the reward and to report whether the sample answer is correct.

Wrong answers and correct answers should be assigned different rewards, and different correct answers may also be assigned different rewards.
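The two stages described above can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the service's implementation: the method names `parse_response` and `score`, the `answer:` pattern, the empty-string fallback, and the reward values 1.0 and -1.0 are all hypothetical choices made for the example.

```ruby
# Stage 1: extract the part of the response that matters.
# With no pattern, the response is passed through unmodified.
def parse_response(response, pattern = nil)
  return response if pattern.nil?

  match = pattern.match(response)
  # Assumption: fall back to an empty string when the pattern does not match.
  match ? match[1] : ""
end

# Stage 2: compute a reward and report whether the answer is correct.
# Hypothetical exact-match scorer with assumed reward values.
def score(parsed, reference)
  correct = parsed.strip == reference.strip
  { reward: correct ? 1.0 : -1.0, correct: correct }
end

response = "Let me think step by step. answer: Paris"
parsed = parse_response(response, /answer:\s*(\w+)/)
result = score(parsed, "Paris")
puts result.inspect
```

A real configuration would express stage 1 through parse_response_config and stage 2 through one of the scorer attributes documented below.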

Constructor Details

#initialize(**args) ⇒ GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig

Returns a new instance of GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig.



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57102

def initialize(**args)
  update!(**args)
end

Instance Attribute Details

#autorater_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningAutoraterScorer

ReinforcementTuningAutoraterScorer is used to score parsed responses for classification-based autorater use cases. For example, for math problems, a classification-based autorater can calculate the reward by comparing the autorater's parsed response against the reference answer. Corresponds to the JSON property autoraterScorer



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57048

def autorater_scorer
  @autorater_scorer
end

#cloud_run_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCloudRunRewardScorer

The Cloud Run service should implement the following HTTP API.

HTTP Method: POST

HTTP Request Body:

  {
    "example": ReinforcementTuningExample,
    "response": Content,
    "metadata": {
      "step": int,
      "tuning_job_id": int64
    }
  }

where example is a ReinforcementTuningExample in ProtoJSON format and response is a Content in ProtoJSON format.

HTTP Response Body:

  { "reward": float }

Example HTTP Request Body:

  {
    "example": {
      "contents": [
        {
          "role": "user",
          "parts": [ { "text": "What is the capital of France?" } ]
        }
      ],
      "references": { "answer": "Paris" }
    },
    "response": {
      "parts": [ { "text": "London" } ]
    },
    "metadata": {
      "step": 1,
      "tuning_job_id": 123456789
    }
  }

Example HTTP Response Body:

  { "reward": -1.0 }

Important: The reward output by the function is clipped to be within [-1, 1], i.e., reward = max(min(reward, 1), -1).

Corresponds to the JSON property cloudRunRewardScorer
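The request and response bodies above can be exercised with a small handler. This is only a sketch, assuming an exact-match scoring rule: the method name `handle_reward_request` and the rule itself are hypothetical, and wiring the method into an HTTP server that accepts POST requests is left out.

```ruby
require "json"

# Hypothetical handler: takes the raw HTTP request body and returns the
# raw HTTP response body as a JSON string.
def handle_reward_request(request_body)
  payload = JSON.parse(request_body)

  # Ground truth stored under "references", as in the example request.
  expected = payload.dig("example", "references", "answer").to_s

  # Concatenate the text parts of the model response.
  parts = payload.dig("response", "parts") || []
  actual = parts.map { |part| part["text"].to_s }.join

  # Assumed scoring rule: exact match, ignoring case and surrounding whitespace.
  reward = actual.strip.casecmp?(expected.strip) ? 1.0 : -1.0

  JSON.generate("reward" => reward)
end

request_body = JSON.generate(
  "example" => {
    "contents" => [
      { "role" => "user", "parts" => [{ "text" => "What is the capital of France?" }] }
    ],
    "references" => { "answer" => "Paris" }
  },
  "response" => { "parts" => [{ "text" => "London" }] },
  "metadata" => { "step" => 1, "tuning_job_id" => 123456789 }
)

puts handle_reward_request(request_body) # prints {"reward":-1.0}
```

The sketch ignores the metadata field; a real scorer might use step or tuning_job_id for logging.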



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57063

def cloud_run_reward_scorer
  @cloud_run_reward_scorer
end

#code_execution_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningCodeExecutionRewardScorer

Expects the user to implement the following function:

  def evaluate(example: Dict[str, ...], response: Dict[str, Content]) -> float:

where the first returned argument is the reward.

  • example is a dict in exactly the same format as the training and validation datasets. It also includes the system instructions and the references (e.g., the user can use references to store the ground truth for this example). References and system instructions will be empty if not provided by the user.
  • response is a dict of Content type, the same as in all the other 1P tuning methods and in Online Prediction.

Different correct answers can get different rewards. Different wrong answers can also get different rewards.

Important: The reward output by the function is clipped to be within [-1, 1], i.e., reward = max(min(reward, 1), -1).

Corresponds to the JSON property codeExecutionRewardScorer
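The clipping rule reward = max(min(reward, 1), -1) can be checked against a few values. The helper name `clip_reward` is hypothetical. The text above suggests the clipping is applied to the function's output, so a user-supplied scorer would not normally need to do this itself.

```ruby
# Clip a raw reward into [-1, 1], written to mirror max(min(reward, 1), -1).
def clip_reward(reward)
  [[reward, 1.0].min, -1.0].max
end

[2.5, 0.4, -3.0].each do |raw|
  puts "#{raw} -> #{clip_reward(raw)}"
end
# prints:
# 2.5 -> 1.0
# 0.4 -> 0.4
# -3.0 -> -1.0
```

In Ruby the same result can be obtained with `reward.clamp(-1.0, 1.0)`.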



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57078

def code_execution_reward_scorer
  @code_execution_reward_scorer
end

#parse_response_config ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningParseResponseConfig

Defines the config for parsing the sample response in reinforcement tuning. For example, the input prompt might be: "Perform step by step thoughts first to problem A, finally output answer in block." And the sample response might look like: "blahblah". Here, the user can define the following parse config:

  parse_type: REGEX_EXTRACT
  regex_extract_expression: ".*(.*?)"

and "blahblah" would be returned to the reward scoring function. Corresponds to the JSON property parseResponseConfig



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57088

def parse_response_config
  @parse_response_config
end

#reward_name ⇒ String

A unique reward name used to identify each single reinforcement tuning reward. Corresponds to the JSON property rewardName

Returns:

  • (String)


# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57093

def reward_name
  @reward_name
end

#string_match_reward_scorer ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1ReinforcementTuningStringMatchRewardScorer

ReinforcementTuningStringMatchRewardScorer is used to score parsed responses for string matching use cases. For example, for math problems, a string match scorer can check whether the exact correct answer was generated. Corresponds to the JSON property stringMatchRewardScorer



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57100

def string_match_reward_scorer
  @string_match_reward_scorer
end
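
Each attribute above is paired with a lowerCamelCase JSON property. The naming pattern can be illustrated with a small helper. This is a hypothetical sketch of the convention only: `json_property_name` is not part of the library, which presumably declares the mapping explicitly in representations.rb.

```ruby
# Convert a snake_case attribute name to its lowerCamelCase JSON property name.
def json_property_name(attribute)
  head, *tail = attribute.to_s.split("_")
  head + tail.map(&:capitalize).join
end

%i[
  autorater_scorer
  cloud_run_reward_scorer
  code_execution_reward_scorer
  parse_response_config
  reward_name
  string_match_reward_scorer
].each do |attribute|
  puts "#{attribute} -> #{json_property_name(attribute)}"
end
```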

Instance Method Details

#update!(**args) ⇒ Object

Updates the properties of this object.



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 57107

def update!(**args)
  @autorater_scorer = args[:autorater_scorer] if args.key?(:autorater_scorer)
  @cloud_run_reward_scorer = args[:cloud_run_reward_scorer] if args.key?(:cloud_run_reward_scorer)
  @code_execution_reward_scorer = args[:code_execution_reward_scorer] if args.key?(:code_execution_reward_scorer)
  @parse_response_config = args[:parse_response_config] if args.key?(:parse_response_config)
  @reward_name = args[:reward_name] if args.key?(:reward_name)
  @string_match_reward_scorer = args[:string_match_reward_scorer] if args.key?(:string_match_reward_scorer)
end
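
Because each assignment in update! is guarded by `args.key?`, keys that are omitted leave the attribute untouched, while a key passed explicitly as nil clears it. The sketch below shows this with a stand-in class. `RewardConfigSketch` is hypothetical and copies the pattern for only two attributes, with plain strings standing in for the real value types, so it runs without the gem installed.

```ruby
# Hypothetical stand-in that copies the update! pattern for two attributes.
class RewardConfigSketch
  attr_reader :reward_name, :parse_response_config

  def initialize(**args)
    update!(**args)
  end

  def update!(**args)
    @reward_name = args[:reward_name] if args.key?(:reward_name)
    @parse_response_config = args[:parse_response_config] if args.key?(:parse_response_config)
  end
end

config = RewardConfigSketch.new(reward_name: "exact_match", parse_response_config: "placeholder")

# Omitted keys are left alone.
config.update!(reward_name: "renamed")
puts config.parse_response_config.inspect # prints "placeholder"

# A key passed explicitly as nil clears the attribute.
config.update!(parse_response_config: nil)
puts config.parse_response_config.inspect # prints nil
```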