AWS DeepRacer reward function examples
The following are some examples of AWS DeepRacer reward functions.
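Topics
- Example 1: Follow the center line in time trials
- Example 2: Stay inside the two borders in time trials
- Example 3: Prevent zig-zag in time trials
- Example 4: Stay in one lane without crashing into stationary obstacles or moving vehicles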
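Example 1: Follow the center line in time trials
This example determines how far the agent is from the center line and gives a higher reward the closer the agent is to the center of the track, which encourages the agent to closely follow the center line.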
def reward_function(params):
    '''
    Example of rewarding the agent to follow center line
    '''
    # Read input parameters
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    # Calculate 3 markers that are increasingly further away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width
    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed/ close to off track
    # Always return a float value
    return float(reward)
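The three markers produce a step-shaped reward: every position within 10% of the track width from the center line earns the same top reward, and the reward drops abruptly at each marker. If a smoother signal is preferred, one option (an assumption for illustration, not part of the AWS example) is to let the reward fall off linearly from the center line to the track border while keeping the same small 1e-3 floor:

```python
def reward_function(params):
    '''
    Smooth variant (an assumption, not the original example): the reward
    decays linearly from 1.0 on the center line to a small floor at the border.
    '''
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    half_width = 0.5 * track_width
    # 1.0 on the center line, 0.0 at the border, negative beyond it
    reward = 1.0 - distance_from_center / half_width
    # Clamp to the same small positive floor the original example uses off track
    return float(max(reward, 1e-3))
```

For example, on a track of width 1.0, a car 0.25 from the center line gets 1.0 - 0.25 / 0.5 = 0.5, and anything at or past the border gets 1e-3.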
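Example 2: Stay inside the two borders in time trials
This example simply gives a high reward if the agent stays inside the borders and leaves it to the agent to figure out the best path for finishing a lap. It is easy to program and understand, but it will likely take longer to converge.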
def reward_function(params):
    '''
    Example of rewarding the agent to stay inside the two borders of the track
    '''
    # Read input parameters
    all_wheels_on_track = params['all_wheels_on_track']
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    # Give a very low reward by default
    reward = 1e-3
    # Give a high reward if no wheels go off the track and
    # the car is somewhere in between the track borders
    if all_wheels_on_track and (0.5*track_width - distance_from_center) >= 0.05:
        reward = 1.0
    # Always return a float value
    return reward
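The condition (0.5*track_width - distance_from_center) >= 0.05 requires the car's center to stay at least 0.05 (in the same units as track_width) inside the border. The sketch below rearranges it into an explicit maximum offset from the center line; max_rewarded_offset and stays_inside are hypothetical helper names used only for illustration:

```python
def max_rewarded_offset(track_width, margin=0.05):
    # Largest distance_from_center that still satisfies Example 2's test:
    # (0.5 * track_width - distance_from_center) >= margin
    return 0.5 * track_width - margin


def stays_inside(params, margin=0.05):
    # Hypothetical helper: the same condition as Example 2, rearranged as
    # "distance from center is at most half the width minus the margin"
    return bool(params['all_wheels_on_track']) and (
        params['distance_from_center']
        <= max_rewarded_offset(params['track_width'], margin)
    )


# On a hypothetical track of width 0.6, the car may drift roughly 0.25
# from the center line and still earn the high reward
print(max_rewarded_offset(0.6))
```

Raising the margin forces the agent to keep a wider safety buffer from the borders; lowering it lets the agent use more of the track.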
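Example 3: Prevent zig-zag in time trials
This example rewards the agent for following the center line but penalizes it (reduces the reward) if it steers too much, which helps prevent zig-zag behavior. The agent learns to drive smoothly in the simulator and is likely to keep the same behavior when deployed to the physical vehicle.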
def reward_function(params):
    '''
    Example of penalizing steering, which helps mitigate zig-zag behaviors
    '''
    # Read input parameters
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    abs_steering = abs(params['steering_angle'])  # Only need the absolute steering angle
    # Calculate 3 markers that are farther and farther away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width
    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed/ close to off track
    # Steering penalty threshold, change the number based on your action space setting
    ABS_STEERING_THRESHOLD = 15
    # Penalize reward if the car is steering too much
    if abs_steering > ABS_STEERING_THRESHOLD:
        reward *= 0.8
    return float(reward)
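The fixed multiplier applies the same 0.8 penalty to every angle above the threshold, so 16 degrees and 30 degrees are penalized equally. A graded alternative (an assumption, not part of the AWS example) scales the penalty with how far the angle exceeds the threshold. The sketch assumes a maximum steering angle of 30 degrees; steering_penalty is a hypothetical helper name:

```python
def steering_penalty(abs_steering, threshold=15.0, max_angle=30.0, min_factor=0.8):
    '''
    Graded alternative (an assumption) to the fixed "reward *= 0.8":
    the multiplier falls linearly from 1.0 at the threshold to min_factor
    at max_angle, and stays at min_factor beyond it.
    '''
    if abs_steering <= threshold:
        return 1.0
    # How far past the threshold the car is steering, as a fraction in (0, 1]
    excess = min((abs_steering - threshold) / (max_angle - threshold), 1.0)
    return 1.0 - (1.0 - min_factor) * excess
```

Inside the reward function, the if block that multiplies by 0.8 would be replaced by reward *= steering_penalty(abs_steering). With these defaults, 22.5 degrees gives a multiplier of 0.9 and 30 degrees gives 0.8.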
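Example 4: Stay in one lane without crashing into stationary obstacles or moving vehicles
This reward function rewards the agent for staying inside the track borders and penalizes it for getting too close to objects in front of it. The agent can change lanes to avoid crashes. The total reward is a weighted sum of the lane reward and the avoidance penalty, and the example puts more weight on the penalty to help avoid crashes. Experiment with different weights to train for different behavioral outcomes.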
import math
def reward_function(params):
    '''
    Example of rewarding the agent to stay inside two borders
    and penalizing getting too close to the objects in front
    '''
    all_wheels_on_track = params['all_wheels_on_track']
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    objects_location = params['objects_location']
    agent_x = params['x']
    agent_y = params['y']
    # Index of the next object (the second entry of closest_objects)
    _, next_object_index = params['closest_objects']
    objects_left_of_center = params['objects_left_of_center']
    is_left_of_center = params['is_left_of_center']
    # Initialize reward with a small number but not zero
    # because zero means off-track or crashed
    reward = 1e-3
    # Reward if the agent stays inside the two borders of the track
    if all_wheels_on_track and (0.5 * track_width - distance_from_center) >= 0.05:
        reward_lane = 1.0
    else:
        reward_lane = 1e-3
    # Penalize if the agent is too close to the next object
    reward_avoid = 1.0
    # Distance to the next object
    next_object_loc = objects_location[next_object_index]
    distance_closest_object = math.sqrt((agent_x - next_object_loc[0])**2 + (agent_y - next_object_loc[1])**2)
    # Decide if the agent and the next object are in the same lane
    is_same_lane = objects_left_of_center[next_object_index] == is_left_of_center
    if is_same_lane:
        if 0.5 <= distance_closest_object < 0.8:
            reward_avoid *= 0.5
        elif 0.3 <= distance_closest_object < 0.5:
            reward_avoid *= 0.2
        elif distance_closest_object < 0.3:
            reward_avoid = 1e-3  # Likely crashed
    # Calculate reward by putting different weights on
    # the two aspects above
    reward += 1.0 * reward_lane + 4.0 * reward_avoid
    return reward
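To see how the 1.0 and 4.0 weights interact, the sketch below factors the avoidance tiers and the weighted sum into two hypothetical helpers (avoid_factor and total_reward, names chosen for illustration) and evaluates them for a made-up position. It uses math.hypot, which computes the same Euclidean distance as the math.sqrt expression above:

```python
import math


def avoid_factor(distance, same_lane):
    # Same distance tiers as Example 4; objects in the other lane are not penalized
    if not same_lane:
        return 1.0
    if distance < 0.3:
        return 1e-3  # likely crashed
    if distance < 0.5:
        return 0.2
    if distance < 0.8:
        return 0.5
    return 1.0


def total_reward(reward_lane, reward_avoid, w_lane=1.0, w_avoid=4.0):
    # Small non-zero base plus the weighted sum used in Example 4
    return 1e-3 + w_lane * reward_lane + w_avoid * reward_avoid


# Hypothetical positions: agent at (0, 0), next object at (0.4, 0)
distance = math.hypot(0.0 - 0.4, 0.0 - 0.0)
print(total_reward(1.0, avoid_factor(distance, same_lane=True)))
print(total_reward(1.0, avoid_factor(distance, same_lane=False)))
```

With the car inside the borders and the object 0.4 away in the same lane, the total is 0.001 + 1.0 × 1.0 + 4.0 × 0.2 = 1.801; if the object is in the other lane, it rises to 0.001 + 1.0 + 4.0 = 5.001. Because the avoidance term carries four times the weight, changing lanes earns far more than staying centered behind the object.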