Dota 2 with Large Scale Deep Reinforcement Learning

TLDR

Dota 2 poses challenges that are expected to become increasingly central to more capable AI systems: long time horizons, imperfect information, and complex, continuous state-action spaces. OpenAI Five was trained with self-play reinforcement learning on a distributed system that learned from batches of approximately 2 million frames every 2 seconds, with training running for 10 months. By defeating the Dota 2 world champions (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance in a complex esports game.

Abstract

On April 13th, 2019, OpenAI Five became the first AI system to defeat the world champions at an esports game. The game of Dota 2 presents novel challenges for AI systems such as long time horizons, imperfect information, and complex, continuous state-action spaces, all challenges which will become increasingly central to more capable AI systems. OpenAI Five leveraged existing reinforcement learning techniques, scaled to learn from batches of approximately 2 million frames every 2 seconds. We developed a distributed training system and tools for continual training which allowed us to train OpenAI Five for 10 months. By defeating the Dota 2 world champion (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.
