The predictron: end-to-end learning and planning

Abstract

One of the key challenges of artificial intelligence is to learn models that are effective in the context of planning. In this document we introduce the predictron architecture. The predictron consists of a fully abstract model, represented by a Markov reward process, that can be rolled forward multiple imagined planning steps. Each forward pass of the predictron accumulates internal rewards and values over multiple planning depths. The predictron is trained end-to-end so as to make these accumulated values accurately approximate the true value function. We applied the predictron to procedurally generated random mazes and a simulator for the game of pool. The predictron yielded significantly more accurate predictions than conventional deep neural network architectures.
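
The accumulation described above can be illustrated with a minimal sketch. The abstract does not give the model equations, so everything concrete here is an assumption made for illustration: the helper names (`make_core`, `step`, `preturns`, `preturn_loss`), the toy linear parameterisation, and the mean squared error loss are all hypothetical. The sketch rolls an abstract state forward for a fixed number of imagined steps, and at each depth k forms the accumulated value r1 + γ1·r2 + ... + (γ1···γk)·v(s_k) from the internal rewards, internal discounts, and the value of the state reached.

```python
import math
import random


def make_core(dim, seed=0):
    """Random parameters for a toy abstract model (hypothetical parameterisation)."""
    rng = random.Random(seed)

    def rand_vec():
        return [rng.uniform(-0.5, 0.5) for _ in range(dim)]

    return {
        "W": [rand_vec() for _ in range(dim)],  # abstract state transition
        "w_r": rand_vec(),  # internal reward head
        "w_g": rand_vec(),  # internal discount head
        "w_v": rand_vec(),  # value head
    }


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def step(core, s):
    """One imagined planning step: returns (reward, discount, next_state)."""
    reward = dot(core["w_r"], s)
    discount = 1.0 / (1.0 + math.exp(-dot(core["w_g"], s)))  # squashed into (0, 1)
    next_state = [math.tanh(dot(row, s)) for row in core["W"]]
    return reward, discount, next_state


def preturns(core, s0, depth):
    """Accumulated value estimates for planning depths 0..depth."""
    estimates = [dot(core["w_v"], s0)]  # depth 0: value of the initial state
    s = s0
    acc_reward = 0.0    # discounted sum of internal rewards so far
    acc_discount = 1.0  # product of internal discounts so far
    for _ in range(depth):
        reward, discount, s = step(core, s)
        acc_reward += acc_discount * reward
        acc_discount *= discount
        # Bootstrap from the value of the imagined state reached at this depth.
        estimates.append(acc_reward + acc_discount * dot(core["w_v"], s))
    return estimates


def preturn_loss(estimates, target):
    """Mean squared error of every depth's estimate against one target value
    (an assumed training objective, not taken from the abstract)."""
    return sum((target - g) ** 2 for g in estimates) / (2 * len(estimates))


if __name__ == "__main__":
    core = make_core(dim=3, seed=1)
    values = preturns(core, [0.2, -0.1, 0.4], depth=4)
    print(values, preturn_loss(values, target=1.0))
```

In an end-to-end setting, a loss of this kind would be minimised with respect to all parameters of the model at once, so that the imagined rewards, discounts, and values are shaped only by how well the accumulated estimates match the target, not by how faithfully they reproduce the environment.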
