Class: Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig
- Inherits:
- Object
- Includes:
- Core::Hashable, Core::JsonObjectSupport
- Defined in:
- lib/google/apis/aiplatform_v1beta1/classes.rb,
lib/google/apis/aiplatform_v1beta1/representations.rb
Overview
Reward function configuration with a weight. The weight is used to combine this reward with other rewards. It can be overridden at the data source level with dataset-specific weights, given as a map from reward_name to reward_weight.
Instance Attribute Summary
- #reward_config ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig
SingleReinforcementTuningRewardConfig defines a single reward function configuration for RL tuning.
- #weight ⇒ Float
How much this single reward contributes to the total overall reward.
Instance Method Summary
- #initialize(**args) ⇒ GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig
constructor
A new instance of GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig.
- #update!(**args) ⇒ Object
Update properties of this object.
Constructor Details
#initialize(**args) ⇒ GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig
Returns a new instance of GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig.
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 10341
def initialize(**args)
  update!(**args)
end
Instance Attribute Details
#reward_config ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig
SingleReinforcementTuningRewardConfig defines a single reward function
configuration for RL tuning. Each reward calculation/evaluation consists of
two stages:
Stage 1: Extract the relevant part of the sample response with a regex, or
take the sample response unmodified.
Stage 2: Call the specific reward scorer to compute the reward and to report
whether the sample answer is correct.
Wrong and correct answers should be assigned different rewards; correct
answers may also be assigned rewards that differ from one another.
Corresponds to the JSON property rewardConfig
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 10327
def reward_config
  @reward_config
end
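The two stages described above can be pictured with a small, self-contained sketch. It does not use the client library and is not how the service itself computes rewards: the helper names extract_answer and score_exact_match, the regex, and the exact-match scoring rule are all hypothetical and chosen only for illustration.

```ruby
# Stage 1 (illustrative): pull the relevant part out of the sample response.
# With no pattern, the response is used unmodified.
def extract_answer(response, pattern = nil)
  return response if pattern.nil?

  match = response.match(pattern)
  return nil if match.nil?

  # Prefer the first capture group; fall back to the whole match.
  match[1] || match[0]
end

# Stage 2 (illustrative): a hypothetical exact-match scorer that returns
# both a reward and a correctness flag.
def score_exact_match(extracted, reference)
  correct = !extracted.nil? && extracted.strip == reference.strip
  { reward: correct ? 1.0 : 0.0, correct: correct }
end

response = "Let me think. The answer is 42."
extracted = extract_answer(response, /answer is (\d+)/)
result = score_exact_match(extracted, "42")
```

A real scorer could assign graded rewards to correct answers as well, which is why the reward and the correctness flag are kept as separate fields here.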
#weight ⇒ Float
How much this single reward contributes to the total overall reward. The total
reward is a linear combination of the single rewards with their corresponding
weights, normalized by the sum of the weights:
Total reward = (reward_weight_of_reward_a * reward of reward_a + reward_weight_of_reward_b * reward of reward_b + ...) / (sum of reward_weights)
This reward weight is the default weighting used to combine the different
rewards. It can be overridden at the data source level with dataset-specific
weights, given as a map from reward_name to reward_weight. Consider setting
this to 1.
Corresponds to the JSON property weight
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 10339
def weight
  @weight
end
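As a worked illustration of the weighted-average formula, the following standalone sketch combines per-reward scores by name. The function total_reward, the reward names, and the way overrides are merged over the defaults are assumptions made for this example; the source only states that dataset-specific weights are a map from reward_name to reward_weight.

```ruby
# Combine per-reward scores into one total using the documented formula:
#   total = sum(weight_i * reward_i) / sum(weight_i)
# default_weights and overrides both map reward_name => reward_weight.
def total_reward(rewards, default_weights, overrides = {})
  # Assumption: dataset-specific weights replace the defaults by name.
  weights = default_weights.merge(overrides)

  weight_sum = rewards.keys.sum { |name| weights.fetch(name) }
  raise ArgumentError, "weights must not sum to zero" if weight_sum.zero?

  weighted = rewards.sum { |name, reward| weights.fetch(name) * reward }
  weighted / weight_sum.to_f
end

rewards  = { "accuracy" => 1.0, "format" => 0.5 }
defaults = { "accuracy" => 1.0, "format" => 1.0 }

equal_total    = total_reward(rewards, defaults)                        # (1.0 + 0.5) / 2.0
override_total = total_reward(rewards, defaults, { "accuracy" => 3.0 }) # (3.0 + 0.5) / 4.0
```

Because the sum is divided by the total weight, only the ratios between weights matter, so leaving every weight at 1 yields a plain average.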
Instance Method Details
#update!(**args) ⇒ Object
Update properties of this object.
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 10346
def update!(**args)
  @reward_config = args[:reward_config] if args.key?(:reward_config)
  @weight = args[:weight] if args.key?(:weight)
end
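Since update! checks args.key? rather than the value, it only touches attributes whose keys are passed, and passing nil explicitly clears an attribute. The sketch below shows this with a stand-in class, WeightedRewardSketch, which is a hypothetical name that copies the method bodies listed above so the behavior can be tried without installing the generated client; the :placeholder value stands in for a real reward configuration object.

```ruby
# Stand-in (hypothetical) that mirrors the documented initialize/update!
# bodies; it is not the generated client class.
class WeightedRewardSketch
  attr_accessor :reward_config, :weight

  def initialize(**args)
    update!(**args)
  end

  def update!(**args)
    @reward_config = args[:reward_config] if args.key?(:reward_config)
    @weight = args[:weight] if args.key?(:weight)
  end
end

config = WeightedRewardSketch.new(reward_config: :placeholder, weight: 1.0)

config.update!(weight: 2.0)  # reward_config is left as it was
after_partial = [config.reward_config, config.weight]

config.update!(weight: nil)  # the key is present, so weight is cleared
config.update!(unknown: 1)   # unrecognized keys are ignored
```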