Class: Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig

Inherits:

Object

Includes: Core::Hashable, Core::JsonObjectSupport

Defined in: lib/google/apis/aiplatform_v1beta1/classes.rb,
lib/google/apis/aiplatform_v1beta1/representations.rb

Overview

Reward function configuration with a weight. The weight determines how this reward is combined with the other rewards. It can be overridden at the data source level by dataset-specific weights, given as a map from reward_name to reward_weight.

Instance Attribute Summary

#reward_config ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig
SingleReinforcementTuningRewardConfig defines a single reward function configuration for RL tuning.
#weight ⇒ Float
How much this single reward contributes to the total overall reward.

Instance Method Summary

#initialize(**args) ⇒ GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig constructor
A new instance of GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig.
#update!(**args) ⇒ Object
Update properties of this object.

Constructor Details

#initialize(**args) ⇒ `GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig`

Returns a new instance of GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig.

# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 10341

def initialize(**args)
  update!(**args)
end

Instance Attribute Details

#reward_config ⇒ `Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig`

SingleReinforcementTuningRewardConfig defines a single reward function configuration for RL tuning. Each reward evaluation has two stages. Stage 1: extract the relevant information from the sample response with a regex, or take the sample response unmodified. Stage 2: call the specific reward scorer to compute the reward and to report whether the sample answer is correct. Wrong and correct answers should be assigned different rewards, but correct answers may also be assigned different rewards from one another. Corresponds to the JSON property `rewardConfig`.

Returns:

(Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig)

# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 10327

def reward_config
  @reward_config
end
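The two-stage evaluation described for reward_config can be illustrated with a small, self-contained sketch. It is not part of the Aiplatform API: `extract_answer` and `score_answer` are hypothetical helpers, and the scoring rule (1.0 for an exact match, 0.5 for a case-insensitive match) is an assumption chosen only to show that two correct answers can still receive different rewards.

```ruby
# Hypothetical sketch of the two-stage reward evaluation.
# None of these names come from the Aiplatform API; they are illustrative.

# Stage 1: extract the relevant part of the response with a regex,
# or return the response unmodified when no pattern is given.
def extract_answer(response, pattern = nil)
  return response if pattern.nil?

  match = response.match(pattern)
  match ? match[1] : nil
end

# Stage 2: score the extracted answer and return [reward, correct?].
# Assumed toy rule: exact match => 1.0, case-insensitive match => 0.5
# (still counted as correct), anything else => 0.0 and incorrect.
def score_answer(answer, expected)
  return [0.0, false] if answer.nil?
  return [1.0, true] if answer == expected
  return [0.5, true] if answer.casecmp?(expected)

  [0.0, false]
end

response = "Reasoning... Final answer: Paris"
answer = extract_answer(response, /Final answer:\s*(\S+)/)
reward, correct = score_answer(answer, "Paris")
puts "answer=#{answer} reward=#{reward} correct=#{correct}"
```

Keeping extraction and scoring as separate steps mirrors the description above: the same scorer can be reused whether the answer is pulled out by a regex or the raw response is scored directly.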

#weight ⇒ `Float`

How much this single reward contributes to the total overall reward. The total reward is a linear combination of the single rewards with their corresponding weights, normalized by the sum of the weights:

Total reward = (weight_a * reward_a + weight_b * reward_b + ...) / (weight_a + weight_b + ...)

This weight is the default used when combining the different rewards. It can be overridden at the data source level by dataset-specific weights, given as a map from reward_name to reward_weight. Consider setting it to 1. Corresponds to the JSON property `weight`.

Returns:

(Float)

# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 10339

def weight
  @weight
end
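The weighting rule for weight can be checked with a short sketch. It assumes rewards are keyed by reward name. `default_weights` stands in for each config's weight, and `dataset_weights` stands in for the per-data-source override map. The `total_reward` helper is hypothetical and only illustrates the formula; it is not how the service computes rewards internally.

```ruby
# Hypothetical sketch of:
#   total = sum(weight_i * reward_i) / sum(weight_i)
# default_weights: reward_name => default weight (like WeightedRewardConfig#weight)
# dataset_weights: reward_name => override weight (the dataset-level map)
def total_reward(rewards, default_weights, dataset_weights = {})
  # A dataset-specific weight, when present, replaces the default weight.
  weights = rewards.keys.map do |name|
    [name, dataset_weights.fetch(name, default_weights.fetch(name))]
  end.to_h

  weight_sum = weights.values.sum
  raise ArgumentError, "sum of weights must be non-zero" if weight_sum.zero?

  weighted = rewards.sum { |name, reward| weights[name] * reward }
  weighted / weight_sum.to_f
end

rewards = { "format" => 1.0, "accuracy" => 0.0 }
defaults = { "format" => 1.0, "accuracy" => 1.0 }
puts total_reward(rewards, defaults)                         # equal weights
puts total_reward(rewards, defaults, { "accuracy" => 3.0 })  # accuracy overridden
```

With every weight set to 1, as the description suggests, the total reduces to a plain average of the individual rewards. A dataset override then shifts emphasis without editing the reward configs themselves.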

Instance Method Details

#update!(**args) ⇒ `Object`

Update properties of this object. Only attributes whose keys are present in args are assigned; the others are left unchanged.

# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 10346

def update!(**args)
  @reward_config = args[:reward_config] if args.key?(:reward_config)
  @weight = args[:weight] if args.key?(:weight)
end
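The partial-update behavior of update! follows from the `args.key?` checks. The sketch below shows it using a stand-in class, `WeightedRewardConfigSketch`, which is hypothetical and not the real class. It copies the initialize/update! bodies shown above so it runs without the google-apis-aiplatform_v1beta1 gem, and it uses a plain Hash as a placeholder for the real reward_config object.

```ruby
# Stand-in that mirrors the initialize/update! code shown above.
# NOT the real generated class; reward_config is a placeholder Hash here.
class WeightedRewardConfigSketch
  attr_accessor :reward_config, :weight

  def initialize(**args)
    update!(**args)
  end

  # Assign only the attributes whose keys are present in args.
  def update!(**args)
    @reward_config = args[:reward_config] if args.key?(:reward_config)
    @weight = args[:weight] if args.key?(:weight)
  end
end

cfg = WeightedRewardConfigSketch.new(reward_config: { name: "format" }, weight: 1.0)
cfg.update!(weight: 0.5)        # only weight changes; reward_config is kept
cfg.update!(reward_config: nil) # an explicit nil key does clear the field
p cfg.weight
p cfg.reward_config
```

With the real gem, the equivalent construction is `GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig.new(reward_config: ..., weight: 1.0)`, where reward_config would be a GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig instead of a Hash.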