Publication | Closed Access
On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
794
Citations
14
References
1994
Year
Mathematical ProgrammingArtificial IntelligenceEngineeringMachine LearningValue Function ApproximationQ-learning AlgorithmEducationReinforcement Learning (Educational Psychology)Learning ControlLifelong Reinforcement LearningReinforcement Learning (Computer Engineering)Approximation TheoryStochastic DynamicNew Convergence TheoremComputer ScienceMarkovian EnvironmentsMarkov Decision ProcessDeep Reinforcement LearningStochastic OptimizationDynamic Optimization
Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(λ) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(λ) and Q-learning belong.
| Year | Citations | |
|---|---|---|
Page 1
Page 1