Publication | Open Access
Distributional Reinforcement Learning With Quantile Regression
Citations: 483 | References: 28 | Year: 2018
Artificial Intelligence, Engineering, Machine Learning, Game Theory, Value Function Approximation, Multi-agent Learning, Learning Control, Approximate Distribution, Data Science, Stochastic Game, Uncertainty Quantification, Robot Learning, Statistics, Distributional Reinforcement Learning, Sequential Decision Making, Computer Science, Games, New Algorithm, Exploration V Exploitation, Statistical Inference
Reinforcement learning agents interact with an environment whose state transitions, rewards, and actions may all be sampled probabilistically, which makes the long-term return random. Traditional algorithms average over this randomness to estimate the value function. This work builds on the distributional approach to reinforcement learning, which models the distribution over returns explicitly instead of estimating only its mean. The authors extend existing theoretical results to the approximate distribution setting and introduce a new distributional RL algorithm consistent with that formulation. These results close several gaps between theory and practice left by Bellemare, Dabney, and Munos (2017), and on the Atari 2600 games the new algorithm significantly outperforms many recent improvements on DQN, including the related distributional algorithm C51.
In reinforcement learning (RL), an agent interacts with the environment by taking actions and observing the next state and reward. When sampled probabilistically, these state transitions, rewards, and actions can all induce randomness in the observed long-term return. Traditionally, reinforcement learning algorithms average over this randomness to estimate the value function. In this paper, we build on recent work advocating a distributional approach to reinforcement learning in which the distribution over returns is modeled explicitly instead of only estimating the mean. That is, we examine methods of learning the value distribution instead of the value function. We give results that close a number of gaps between the theoretical and algorithmic results given by Bellemare, Dabney, and Munos (2017). First, we extend existing results to the approximate distribution setting. Second, we present a novel distributional reinforcement learning algorithm consistent with our theoretical formulation. Finally, we evaluate this new algorithm on the Atari 2600 games, observing that it significantly outperforms many of the recent improvements on DQN, including the related distributional algorithm C51.
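To make the difference between a value function and a value distribution concrete, the following is a minimal sketch of plain quantile regression on a toy return distribution. It is not the paper's algorithm or its Atari setup: the bimodal returns, the function names `sample_returns` and `fit_quantiles`, the choice of four evenly spaced quantile levels, and the step size are all assumptions made for illustration.

```python
import numpy as np


def sample_returns(n=2000, seed=0):
    """Hypothetical returns: about half the episodes end near -1, half near +1."""
    rng = np.random.default_rng(seed)
    centers = np.where(rng.random(n) < 0.5, -1.0, 1.0)
    return centers + 0.1 * rng.standard_normal(n)


def fit_quantiles(samples, num_quantiles=4, lr=0.01, epochs=5, seed=1):
    """Estimate quantiles of `samples` by stochastic quantile regression."""
    rng = np.random.default_rng(seed)
    # Evenly spaced quantile levels tau_i = (2i + 1) / (2N) (an assumption here).
    taus = (2 * np.arange(num_quantiles) + 1) / (2 * num_quantiles)
    theta = np.zeros(num_quantiles)  # one estimate per quantile level
    for _ in range(epochs):
        for z in rng.permutation(samples):
            # Subgradient step on the quantile (pinball) loss:
            # move up by lr * tau if z >= theta, down by lr * (1 - tau) otherwise.
            theta += lr * (taus - (z < theta).astype(float))
    return taus, theta


if __name__ == "__main__":
    returns = sample_returns()
    taus, theta = fit_quantiles(returns)
    print("mean return:", returns.mean())
    for tau, q in zip(taus, theta):
        print(f"quantile level {tau:.3f}: estimate {q:.3f}")
```

In this toy setting the mean return sits near zero, a value that almost never occurs, while the quantile estimates settle in the two clusters around -1 and +1. That is the kind of information an average over returns discards and a model of the return distribution retains.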