Policy gradient reinforcement learning for fast quadrupedal locomotion

Abstract

This paper presents a machine learning approach to optimizing a quadrupedal trot gait for forward speed. Given a parameterized walk designed for a specific robot, we propose using a form of policy gradient reinforcement learning to automatically search the set of possible parameters with the goal of finding the fastest possible walk. We implement and test our approach on a commercially available quadrupedal robot platform, namely the Sony Aibo robot. After about three hours of learning, all on the physical robots and with no human intervention other than to change the batteries, the robots achieved a gait faster than any previously known gait for the Aibo, significantly outperforming a variety of existing hand-coded and learned solutions.
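The abstract does not spell out the update rule, so the following is only a minimal sketch of one generic way to run a policy gradient search over a parameter vector: evaluate a batch of randomly perturbed parameter sets, estimate a per-parameter gradient from the average scores, and take a fixed-size step. All names (`estimate_gradient`, `optimize`, `toy_speed`) and all numeric settings are hypothetical and are not taken from the paper. On a robot, the score would be a measured walking speed; here a simple analytic function stands in for it.

```python
import math
import random


def estimate_gradient(score, params, epsilons, num_trials, rng):
    """Estimate a gradient by scoring randomly perturbed copies of params.

    Each parameter of each trial is shifted by -epsilon, 0, or +epsilon.
    """
    n = len(params)
    # sums[i][k] and counts[i][k] track scores for parameter i,
    # where k = 0, 1, 2 means a shift of -epsilon, 0, +epsilon.
    sums = [[0.0, 0.0, 0.0] for _ in range(n)]
    counts = [[0, 0, 0] for _ in range(n)]
    for _ in range(num_trials):
        signs = [rng.choice((-1, 0, 1)) for _ in range(n)]
        candidate = [p + s * e for p, s, e in zip(params, signs, epsilons)]
        value = score(candidate)
        for i, s in enumerate(signs):
            sums[i][s + 1] += value
            counts[i][s + 1] += 1

    gradient = []
    for i in range(n):
        # Without samples on both sides there is no usable estimate.
        if counts[i][0] == 0 or counts[i][2] == 0:
            gradient.append(0.0)
            continue
        avg_minus = sums[i][0] / counts[i][0]
        avg_plus = sums[i][2] / counts[i][2]
        if counts[i][1] > 0:
            avg_zero = sums[i][1] / counts[i][1]
            # Leaving the parameter unchanged scored best: do not move it.
            if avg_zero > avg_minus and avg_zero > avg_plus:
                gradient.append(0.0)
                continue
        gradient.append(avg_plus - avg_minus)
    return gradient


def optimize(score, params, epsilons, step_size, num_trials, iterations, seed=0):
    """Repeatedly step params along the normalized estimated gradient."""
    rng = random.Random(seed)
    params = list(params)
    for _ in range(iterations):
        gradient = estimate_gradient(score, params, epsilons, num_trials, rng)
        norm = math.sqrt(sum(g * g for g in gradient))
        if norm == 0.0:
            continue  # no preferred direction found in this batch
        params = [p + step_size * g / norm for p, g in zip(params, gradient)]
    return params


def toy_speed(params):
    """Stand-in for a measured walk speed (assumption): peak at (3, -1)."""
    return -((params[0] - 3.0) ** 2 + (params[1] + 1.0) ** 2)


if __name__ == "__main__":
    start = [0.0, 0.0]
    best = optimize(toy_speed, start, [0.1, 0.1], 0.1, 15, 200)
    print("start score:", toy_speed(start), "final score:", toy_speed(best))
```

Normalizing the step keeps its size independent of the score scale, which is convenient when scores are noisy measurements. On hardware, each call to `score` would correspond to a timed walk, so `num_trials` trades estimate quality against learning time.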
