Class: Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig

Inherits:
Object
Includes:
Core::Hashable, Core::JsonObjectSupport
Defined in:
lib/google/apis/aiplatform_v1beta1/classes.rb,
lib/google/apis/aiplatform_v1beta1/representations.rb

Overview

A reward function configuration paired with a weight. The weight determines how this reward is combined with other rewards. It can be overridden at the data source level with dataset-specific weights, given as a map from reward_name to reward_weight.

Constructor Details

#initialize(**args) ⇒ GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig

Returns a new instance of GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig.



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 10341

def initialize(**args)
  update!(**args)
end

Instance Attribute Details

#reward_config ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig

SingleReinforcementTuningRewardConfig defines a single reward function configuration for RL tuning. Each reward calculation consists of two stages:

  • Stage 1: extract the relevant part of the sample response with a regex, or take the sample response unmodified.
  • Stage 2: call the configured reward scorer to compute the reward and report whether the sample answer is correct.

Wrong and correct answers should receive different rewards, and correct answers may also receive different rewards from one another. Corresponds to the JSON property rewardConfig.



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 10327

def reward_config
  @reward_config
end

#weight ⇒ Float

How much this single reward contributes to the total reward. The total reward is a linear combination of the single rewards, normalized by the sum of their weights:

  total_reward = (weight_a * reward_a + weight_b * reward_b + ...) / (sum of weights)

This weight is the default used when combining rewards. It can be overridden at the data source level with dataset-specific weights, given as a map from reward_name to reward_weight. Consider setting this to 1. Corresponds to the JSON property weight.

Returns:

  • (Float)


# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 10339

def weight
  @weight
end
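
The weighting described for #weight can be sketched in plain Ruby. This is only an illustration of the documented formula, not code from the gem: the helper total_reward, its parameters, and the reward names are hypothetical, and how the service actually applies dataset-specific overrides is assumed here to be a simple per-name replacement of the default weight.

```ruby
# Hypothetical helper (not part of google-apis-aiplatform_v1beta1).
# rewards:   { reward_name => reward value }
# weights:   { reward_name => default weight }
# overrides: { reward_name => dataset-specific weight } (assumed to replace defaults)
def total_reward(rewards, weights, overrides = {})
  effective = weights.merge(overrides)
  weight_sum = rewards.keys.sum { |name| effective.fetch(name) }
  raise ArgumentError, "weights must not sum to zero" if weight_sum.zero?

  # Linear combination of rewards, normalized by the sum of the weights.
  weighted = rewards.sum { |name, value| effective.fetch(name) * value }
  weighted / weight_sum.to_f
end

rewards = { "reward_a" => 1.0, "reward_b" => 0.5 }
weights = { "reward_a" => 1.0, "reward_b" => 1.0 }

# Default weights: (1.0 * 1.0 + 1.0 * 0.5) / 2.0
default_total = total_reward(rewards, weights)

# Dataset-specific override for reward_b: (1.0 * 1.0 + 3.0 * 0.5) / 4.0
override_total = total_reward(rewards, weights, { "reward_b" => 3.0 })
```

Because the sum is normalized, only the ratio between weights matters; setting every weight to 1 gives a plain average of the rewards.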

Instance Method Details

#update!(**args) ⇒ Object

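Updates the properties of this object. Only keys present in args are assigned; attributes whose keys are omitted keep their current values.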



# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 10346

def update!(**args)
  @reward_config = args[:reward_config] if args.key?(:reward_config)
  @weight = args[:weight] if args.key?(:weight)
end
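
The key-presence check in update! means an omitted key leaves the attribute alone, while a key passed explicitly as nil clears it. The sketch below shows this with a stand-in class, WeightedRewardConfigSketch, which is hypothetical and only copies the initialize and update! bodies listed above so that it runs without the gem; the symbol :placeholder_config stands in for a real SingleReinforcementTuningRewardConfig object.

```ruby
# Hypothetical stand-in mirroring the initialize/update! bodies shown above.
class WeightedRewardConfigSketch
  attr_accessor :reward_config, :weight

  def initialize(**args)
    update!(**args)
  end

  def update!(**args)
    @reward_config = args[:reward_config] if args.key?(:reward_config)
    @weight = args[:weight] if args.key?(:weight)
  end
end

config = WeightedRewardConfigSketch.new(reward_config: :placeholder_config, weight: 1.0)

# Only :weight is passed, so reward_config is left untouched.
config.update!(weight: 2.5)
weight_after_update = config.weight
reward_config_after_update = config.reward_config

# Passing a key with a nil value assigns nil, because the key is present.
config.update!(reward_config: nil)
reward_config_after_clear = config.reward_config
```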