Learning agile and dynamic motor skills for legged robots

TLDR

Legged robots pose major challenges: dynamic, animal-like maneuvers are difficult to replicate with hand-crafted methods. Reinforcement learning offers a promising alternative that requires little manual design, but it has largely remained confined to simulation because training on real robots is costly. This study introduces a method to train a neural-network policy in simulation and transfer it to a state-of-the-art legged robot, leveraging fast, automated, and cost-effective data generation. The approach is applied to the ANYmal quadruped. Policies trained in simulation enable ANYmal to follow high-level velocity commands precisely and energy-efficiently, run faster than before, and recover from falls in complex configurations, surpassing prior methods.

Abstract

Legged robots pose one of the greatest challenges in robotics. The dynamic and agile maneuvers of animals cannot be imitated by existing methods that are crafted by humans. A compelling alternative is reinforcement learning, which requires minimal craftsmanship and promotes the natural evolution of a control policy. However, reinforcement learning research for legged robots has so far been mainly limited to simulation, and only a few comparatively simple examples have been deployed on real systems. The primary reason is that training with real robots, particularly with dynamically balancing systems, is complicated and expensive. In the present work, we introduce a method for training a neural network policy in simulation and transferring it to a state-of-the-art legged system, thereby leveraging fast, automated, and cost-effective data generation schemes. The approach is applied to the ANYmal robot, a sophisticated medium-dog-sized quadrupedal system. Using policies trained in simulation, the quadrupedal machine achieves locomotion skills that go beyond what has been achieved with prior methods: ANYmal is capable of precisely and energy-efficiently following high-level body velocity commands, running faster than before, and recovering from falling even in complex configurations.