Reinforcement Learning with Unsupervised Auxiliary Tasks

TLDR

Deep reinforcement learning has achieved state-of-the-art performance by maximizing cumulative reward, yet environments offer many other training signals. The paper proposes an agent that simultaneously maximizes many pseudo-reward functions via reinforcement learning. All of these tasks share a representation that continues to develop even in the absence of extrinsic rewards, and a novel mechanism focuses that representation on extrinsic rewards so that learning adapts rapidly to the most relevant aspects of the task. The agent outperforms the prior state-of-the-art on Atari (averaging 880% of expert human performance) and on Labyrinth (averaging 87% of expert human performance, with a mean 10× speed-up in learning).

Abstract

Deep reinforcement learning agents have achieved state-of-the-art results by directly maximising cumulative reward. However, environments contain a much wider variety of possible training signals. In this paper, we introduce an agent that also maximises many other pseudo-reward functions simultaneously by reinforcement learning. All of these tasks share a common representation that, like unsupervised learning, continues to develop in the absence of extrinsic rewards. We also introduce a novel mechanism for focusing this representation upon extrinsic rewards, so that learning can rapidly adapt to the most relevant aspects of the actual task. Our agent significantly outperforms the previous state-of-the-art on Atari, averaging 880% of expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks, where it achieves a mean 10× speedup in learning and averages 87% of expert human performance.
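
As a rough illustration of the idea of several reward signals sharing one representation, the following minimal sketch trains one value head per reward on top of a common feature layer, so that gradients from every pseudo-reward also shape the shared features. Everything in it is an assumption made for illustration and is not taken from the paper: the class name `SharedMultiRewardQ`, the linear feature layer, the use of one-step Q-learning, the equal weighting of the losses, and the toy transition are all hypothetical. It does not implement the focusing mechanism mentioned in the abstract.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


class SharedMultiRewardQ:
    """Hypothetical toy agent: shared linear features phi = W x,
    plus one linear Q head per reward signal (index 0 = extrinsic,
    the rest = pseudo-rewards)."""

    def __init__(self, n_inputs, n_features, n_actions, n_tasks, seed=0):
        rng = random.Random(seed)

        def init(n):
            return [rng.uniform(-0.1, 0.1) for _ in range(n)]

        self.W = [init(n_inputs) for _ in range(n_features)]
        self.heads = [[init(n_features) for _ in range(n_actions)]
                      for _ in range(n_tasks)]

    def features(self, x):
        return [dot(row, x) for row in self.W]

    def q_values(self, task, x):
        phi = self.features(x)
        return [dot(head, phi) for head in self.heads[task]]

    def update(self, x, action, rewards, x_next, gamma=0.9, lr=0.05):
        """One semi-gradient Q-learning step on every reward signal,
        all computed from the same transition. Returns the summed
        squared TD loss measured before the step."""
        phi = self.features(x)
        grad_phi = [0.0] * len(phi)
        total_loss = 0.0
        for task, reward in enumerate(rewards):
            # Bootstrapped target, treated as a constant (semi-gradient).
            target = reward + gamma * max(self.q_values(task, x_next))
            head = self.heads[task][action]
            td = target - dot(head, phi)
            total_loss += 0.5 * td * td
            for j in range(len(phi)):
                # Accumulate d(loss)/d(phi_j) using the head before its update.
                grad_phi[j] -= td * head[j]
                head[j] += lr * td * phi[j]
        # Every task's gradient flows into the shared feature weights.
        for j, row in enumerate(self.W):
            for i in range(len(row)):
                row[i] -= lr * grad_phi[j] * x[i]
        return total_loss


# Toy usage: the extrinsic reward is zero, but one pseudo-reward is not,
# so the shared features still receive a learning signal.
agent = SharedMultiRewardQ(n_inputs=2, n_features=4, n_actions=2, n_tasks=2)
x, x_next = [1.0, 0.0], [0.0, 1.0]
losses = [agent.update(x, action=0, rewards=[0.0, 1.0],
                       x_next=x_next, gamma=0.0)
          for _ in range(500)]
```

Summing the per-task losses with equal weight is the simplest possible choice here; in practice the relative weighting of auxiliary objectives would be a tunable design decision.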