Publication | Closed Access
Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates
Citations: 1.4K
References: 32
Year: 2017
Venue: unknown
Topics: Artificial Intelligence, Robot Control, Engineering, Machine Learning, Deep Reinforcement Learning, Manipulation Skills, Minimal Human Intervention, Action Model Learning, Object Manipulation, Computer Science, Robot Learning, Learning Control, Deep Learning, Robotics, World Model
Reinforcement learning promises to let autonomous robots acquire diverse skills with minimal human input; in practice, however, robotic applications often rely on hand‑engineered policies and demonstrations to keep training times manageable, and deep RL has been largely confined to simulation or simple tasks because of its high sample complexity. The study shows that an off‑policy deep Q‑function algorithm can scale to complex 3D manipulation tasks and train deep neural network policies efficiently enough to run on real robots. The algorithm trains deep Q‑functions off‑policy and further shortens training time by asynchronously pooling policy updates across multiple robots. Experiments demonstrate that the method learns diverse 3D manipulation skills in simulation and a complex door‑opening task on real robots without prior demonstrations or hand‑crafted representations.
Reinforcement learning holds the promise of enabling autonomous robots to learn large repertoires of behavioral skills with minimal human intervention. However, robotic applications of reinforcement learning often compromise the autonomy of the learning process in favor of achieving training times that are practical for real physical systems. This typically involves introducing hand-engineered policy representations and human-supplied demonstrations. Deep reinforcement learning alleviates this limitation by training general-purpose neural network policies, but applications of direct deep reinforcement learning algorithms have so far been restricted to simulated settings and relatively simple tasks, due to their apparent high sample complexity. In this paper, we demonstrate that a recent deep reinforcement learning algorithm based on off-policy training of deep Q-functions can scale to complex 3D manipulation tasks and can learn deep neural network policies efficiently enough to train on real physical robots. We demonstrate that the training times can be further reduced by parallelizing the algorithm across multiple robots which pool their policy updates asynchronously. Our experimental evaluation shows that our method can learn a variety of 3D manipulation skills in simulation and a complex door opening skill on real robots without any prior demonstrations or manually designed representations.
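The paper's core structure — several robots collecting experience in parallel into a shared pool while a central learner applies off‑policy Q‑function updates — can be sketched in miniature. The sketch below is illustrative only: it uses a tabular Q‑function on a hypothetical 1‑D chain task rather than the paper's deep Q‑networks and real manipulation tasks, and all names (the environment, the buffer sizes, the hyperparameters) are invented for the example.

```python
import random
import threading
from collections import deque

# Hypothetical toy stand-in for a manipulation task: a 1-D chain in which
# the agent must walk from state 0 to the rightmost (goal) state.
N_STATES = 6
ACTIONS = (-1, +1)        # move left / move right
GAMMA, ALPHA = 0.95, 0.1  # discount factor and learning rate

def step(state, action):
    """Transition: move along the chain; reward 1 on reaching the goal."""
    nxt = max(0, min(N_STATES - 1, state + action))
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0), nxt == N_STATES - 1

lock = threading.Lock()
replay = deque(maxlen=5000)                 # shared experience pool
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # tabular Q stands in for the deep Q-function

def apply_update(transition):
    """One off-policy Q-learning update from a pooled transition."""
    s, a, r, s2, done = transition
    target = r if done else r + GAMMA * max(Q[s2])
    Q[s][a] += ALPHA * (target - Q[s][a])

def collector(episodes, seed):
    """One 'robot': explores with a random behaviour policy and pushes
    its transitions into the shared pool (off-policy data collection)."""
    rng = random.Random(seed)
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.randrange(2)
            s2, r, done = step(s, ACTIONS[a])
            with lock:
                replay.append((s, a, r, s2, done))
            s = s2

def trainer(stop, seed):
    """Central learner: samples pooled transitions asynchronously while
    the collectors are still running."""
    rng = random.Random(seed)
    while not stop.is_set():
        with lock:
            if replay:
                apply_update(rng.choice(replay))

stop = threading.Event()
workers = [threading.Thread(target=collector, args=(100, k)) for k in range(3)]
learner = threading.Thread(target=trainer, args=(stop, 99))
for t in workers + [learner]:
    t.start()
for t in workers:
    t.join()
stop.set()
learner.join()

# A few synchronous sweeps so this toy example converges deterministically.
for _ in range(60):
    for transition in list(replay):
        apply_update(transition)

# Greedy action index per non-goal state (1 = move right).
greedy = [max((0, 1), key=lambda i: Q[s][i]) for s in range(N_STATES - 1)]
print(greedy)
```

The point of the sketch is the division of labour, not the scale: data collection is cheap to parallelize across robots, while a single learner can consume all of the pooled experience off‑policy, which is what makes the asynchronous design reduce wall‑clock training time.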