Generalization in Deep Reinforcement Learning for Robotic Navigation by Reward Shaping

Abstract

This paper addresses the application of Deep Reinforcement Learning (DRL) methods to local navigation, i.e., a robot moving towards a goal location in unknown and cluttered workspaces while equipped only with limited-range exteroceptive sensors. Collision avoidance policies based on DRL offer advantages, but they are quite susceptible to local minima, since their capacity to learn suitable actions is limited to the sensor range. We address this issue by means of reward shaping in actor-critic networks. A dense reward function that incorporates map information gained in the training stage is proposed to increase the agent's capacity to select the best action. We also offer a comparison between the Twin Delayed Deep Deterministic Policy Gradient (TD3) and Soft Actor-Critic (SAC) algorithms for training our policy. A set of sim-to-sim and sim-to-real trials illustrates that our proposed reward shaping outperforms the compared methods in terms of generalization, reaching the target at higher rates in maps that are prone to local minima and collisions.
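The abstract does not give the reward function itself, so the following is only a minimal sketch of the general idea, under stated assumptions: the training map is available as an occupancy grid, and the dense reward is the per-step progress in map-aware (geodesic) distance to the goal rather than straight-line distance. The function names, the grid, and the gain and penalty values are all hypothetical and are not taken from the paper.

```python
from collections import deque


def geodesic_distances(grid, goal):
    """Breadth-first search over free cells (0 = free, 1 = obstacle).

    Returns a grid of step counts to the goal; unreachable or occupied
    cells stay None. This stands in for "map information" that is only
    assumed to be available during training.
    """
    rows, cols = len(grid), len(grid[0])
    dist = [[None] * cols for _ in range(rows)]
    dist[goal[0]][goal[1]] = 0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def shaped_reward(dist, prev_cell, cell, goal, collided,
                  progress_gain=1.0, goal_bonus=10.0, collision_penalty=10.0):
    """Hypothetical dense reward: progress in geodesic distance to the goal."""
    if collided:
        return -collision_penalty
    if cell == goal:
        return goal_bonus
    d_prev = dist[prev_cell[0]][prev_cell[1]]
    d_now = dist[cell[0]][cell[1]]
    return progress_gain * (d_prev - d_now)


# A U-shaped obstacle whose pocket opens to the left; the goal sits
# just behind the closed end of the pocket.
grid = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
goal = (2, 4)
dist = geodesic_distances(grid, goal)

# Stepping deeper into the pocket reduces straight-line distance to the
# goal but is penalized here; stepping out of the pocket is rewarded.
into_pocket = shaped_reward(dist, (2, 1), (2, 2), goal, collided=False)
out_of_pocket = shaped_reward(dist, (2, 1), (2, 0), goal, collided=False)
```

In this toy map, a reward based on straight-line distance would encourage the agent to enter the dead-end pocket, which is the kind of local minimum the abstract describes. The map-aware signal is used only to compute the training reward, so the policy itself can still be driven by limited-range sensor readings.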
