Generalization in Reinforcement Learning: Safely Approximating the Value Function

Abstract

A straightforward approach to the curse of dimensionality in reinforcement learning and dynamic programming is to replace the lookup table with a generalizing function approximator such as a neural net. Although this has been successful in the domain of backgammon, there is no guarantee of convergence. In this paper, we show that the combination of dynamic programming and function approximation is not robust, and in even very benign cases, may produce an entirely wrong policy. We then introduce Grow-Support, a new algorithm which is safe from divergence yet can still reap the benefits of successful generalization.
