Class: Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig

Inherits:

Object

Object
Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig

show all

Includes:: Core::Hashable, Core::JsonObjectSupport

Defined in:: lib/google/apis/aiplatform_v1beta1/classes.rb,
lib/google/apis/aiplatform_v1beta1/representations.rb,
lib/google/apis/aiplatform_v1beta1/representations.rb

Overview

Reward function configuration with a weight. The weight is used to combine the reward with other rewards.

Instance Attribute Summary collapse

#reward_config ⇒ Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig
SingleReinforcementTuningRewardConfig defines a single reward function configuration for RL tuning.
#weight ⇒ Float
How much this single reward contributes to the total overall reward.

Instance Method Summary collapse

#initialize(**args) ⇒ GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig constructor
A new instance of GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig.
#update!(**args) ⇒ Object
Update properties of this object.

Constructor Details

#initialize(**args) ⇒ `GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig`

Returns a new instance of GoogleCloudAiplatformV1beta1CompositeReinforcementTuningRewardConfigWeightedRewardConfig.



11281
11282
11283

# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 11281

def initialize(**args)
   update!(**args)
end

Instance Attribute Details

#reward_config ⇒ `Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig`

SingleReinforcementTuningRewardConfig defines a single reward function configuration for RL tuning. Each reward calculation/evaluation consists of two stages: 1. Stage 1: Parses the part of information important from sample response via regex extract, or simply takes the sample response unmodified. 2. Stage 2: Calls the configured reward scorer to compute the reward. Corresponds to the JSON property rewardConfig

Returns:

(Google::Apis::AiplatformV1beta1::GoogleCloudAiplatformV1beta1SingleReinforcementTuningRewardConfig)



11271
11272
11273

# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 11271

def reward_config
  @reward_config
end

#weight ⇒ `Float`

How much this single reward contributes to the total overall reward. Total reward is a linear combination of single rewards with their corresponding weights, i.e., total_reward = ( weight_a * reward_a + weight_b * reward_b + ... ) / (weight_a + weight_b + ...) Corresponds to the JSON property weight

Returns:

(Float)



11279
11280
11281

# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 11279

def weight
  @weight
end

Instance Method Details

#update!(**args) ⇒ `Object`

Update properties of this object

# File 'lib/google/apis/aiplatform_v1beta1/classes.rb', line 11286

def update!(**args)
  @reward_config = args[:reward_config] if args.key?(:reward_config)
  @weight = args[:weight] if args.key?(:weight)
end