Class: Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig
- Inherits:
- Object
- Includes:
- Core::Hashable, Core::JsonObjectSupport
- Defined in:
- lib/google/apis/aiplatform_v1beta1/classes.rb,
lib/google/apis/aiplatform_v1beta1/representations.rb
Overview
Reward function configuration with a weight. The weight is used to combine the reward with other rewards.
Instance Attribute Summary
- #reward_config ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig
SingleReinforcementTuningRewardConfig defines a single reward function configuration for RL tuning.
- #weight ⇒ Float
How much this single reward contributes to the total overall reward.
Instance Method Summary
- #initialize(**args) ⇒ GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig (constructor)
A new instance of GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig.
- #update!(**args) ⇒ Object
Update properties of this object.
Constructor Details
#initialize(**args) ⇒ GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig
Returns a new instance of GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig.
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 11281
def initialize(**args)
  update!(**args)
end
Instance Attribute Details
#reward_config ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig
SingleReinforcementTuningRewardConfig defines a single reward function
configuration for RL tuning. Each reward calculation/evaluation consists of
two stages:
1. Stage 1: Extracts the relevant part of the sample response with a regex,
or takes the sample response unmodified.
2. Stage 2: Calls the configured reward scorer to compute the reward.
Corresponds to the JSON property rewardConfig
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 11271
def reward_config
  @reward_config
end
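The two-stage evaluation described for reward_config can be illustrated with a minimal sketch. Everything here is hypothetical and not part of the google-apis client: extract_answer, evaluate_reward, the answer-tag regex, and the exact_match scorer are illustrative names, and returning 0.0 when the regex does not match is an assumption, not documented service behavior.

```ruby
# Hypothetical sketch of the two-stage reward evaluation; none of these
# helpers exist in the google-apis client library.

# Stage 1: extract the relevant part of the response with a regex,
# or return the response unmodified when no pattern is given.
def extract_answer(response, pattern = nil)
  return response if pattern.nil?

  match = pattern.match(response)
  return nil if match.nil?

  # Prefer the first capture group; fall back to the whole match.
  match[1] || match[0]
end

# Stage 2: pass the extracted text to a scorer (any callable returning a Float).
def evaluate_reward(response, scorer, pattern: nil)
  extracted = extract_answer(response, pattern)
  # Assumption: a failed extraction scores 0.0.
  return 0.0 if extracted.nil?

  scorer.call(extracted)
end

# Illustrative scorer: 1.0 for an exact match on "42", otherwise 0.0.
exact_match = ->(text) { text.strip == "42" ? 1.0 : 0.0 }

tagged_reward = evaluate_reward(
  "The answer is <answer>42</answer>",
  exact_match,
  pattern: %r{<answer>(.*?)</answer>}
)
raw_reward = evaluate_reward("The answer is 42", exact_match)
```

With the pattern, only the text inside the answer tags reaches the scorer; without it, the scorer sees the whole response.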
#weight ⇒ Float
How much this single reward contributes to the total overall reward. The total
reward is a weighted average of the single rewards, i.e. a linear combination
normalized by the sum of the weights:
total_reward = (weight_a * reward_a + weight_b * reward_b + ...) / (weight_a + weight_b + ...)
Corresponds to the JSON property weight
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 11279
def weight
  @weight
end
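The total_reward formula for weight can be checked with a short sketch. The helper name total_reward and the [weight, reward] pair layout are assumptions made for illustration; the actual combination is performed by the service, not by this client class.

```ruby
# Hypothetical helper mirroring the documented formula:
# total_reward = (w_a * r_a + w_b * r_b + ...) / (w_a + w_b + ...)
# weighted_rewards is an array of [weight, reward] pairs.
def total_reward(weighted_rewards)
  weight_sum = weighted_rewards.sum { |weight, _reward| weight.to_f }
  raise ArgumentError, "weights must not sum to zero" if weight_sum.zero?

  weighted_rewards.sum { |weight, reward| weight * reward } / weight_sum
end

# A reward of 1.0 with weight 3 and a reward of 0.0 with weight 1:
# (3 * 1.0 + 1 * 0.0) / (3 + 1) = 0.75
total = total_reward([[3.0, 1.0], [1.0, 0.0]])
```

Because the result is divided by the sum of the weights, only their ratios matter: weights of 3 and 1 give the same total as 0.75 and 0.25.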
Instance Method Details
#update!(**args) ⇒ Object
Update properties of this object.
# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 11286
def update!(**args)
  @reward_config = args[:reward_config] if args.key?(:reward_config)
  @weight = args[:weight] if args.key?(:weight)
end
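The args.key? guards mean update! only touches attributes whose keys are actually passed, and an explicit nil overwrites the stored value. The sketch below uses a stand-in class, WeightedRewardConfigSketch, that copies the constructor and update! logic shown on this page so it runs without the gem; the stand-in name and the :placeholder_config value are hypothetical.

```ruby
# Stand-in mirroring the initialize/update! logic shown above, so the
# example runs without the google-apis-aiplatform_v1beta1 gem installed.
class WeightedRewardConfigSketch
  attr_accessor :reward_config, :weight

  def initialize(**args)
    update!(**args)
  end

  def update!(**args)
    @reward_config = args[:reward_config] if args.key?(:reward_config)
    @weight = args[:weight] if args.key?(:weight)
  end
end

config = WeightedRewardConfigSketch.new(weight: 0.25)

# Only :reward_config is passed, so weight keeps its previous value.
config.update!(reward_config: :placeholder_config)
weight_after_partial_update = config.weight

# The :weight key is present, so the explicit nil replaces the old value.
config.update!(weight: nil)
weight_after_nil_update = config.weight
```

Omitting a key and passing nil for it are therefore not equivalent: the first leaves the attribute alone, the second clears it.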