Publication | Closed Access
Reinforcement Learning Methods for Continuous-Time Markov Decision Problems
Citations: 261 · References: 10 · Year: 1994 · Venue: Unknown
Semi-Markov Decision Problems are continuous-time generalizations of discrete-time Markov Decision Problems. A number of reinforcement learning algorithms have been developed recently for the solution of Markov Decision Problems, based on the ideas of asynchronous dynamic programming and stochastic approximation. Among these are TD(λ), Q-learning, and Real-time Dynamic Programming. After reviewing semi-Markov Decision Problems and Bellman's optimality equation in that context, we propose algorithms similar to those named above, adapted to the solution of semi-Markov Decision Problems. We demonstrate these algorithms by applying them to the problem of determining the optimal control for a simple queueing system. We conclude with a discussion of circumstances under which these algorithms may be usefully applied.
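The abstract does not spell out the update rule, but the standard way to adapt Q-learning to semi-Markov transitions is to discount by e^(−βτ) over a sojourn of random duration τ and to integrate the continuously accruing reward over that sojourn. Below is a minimal sketch in that spirit, applied to a toy single-server queue with an admit/reject decision at each arrival. The queue parameters, reward structure (admission reward plus linear holding cost), and ε-greedy exploration are illustrative assumptions, not the paper's exact experiment.

```python
import math
import random

BETA = 0.1          # continuous-time discount rate (assumed)
ALPHA = 0.05        # learning rate (assumed)
EPSILON = 0.1       # exploration probability (assumed)
CAPACITY = 10       # maximum queue length
ARRIVAL_RATE = 1.0  # Poisson arrival rate (lambda)
SERVICE_RATE = 1.2  # exponential service rate (mu)
ADMIT_REWARD = 5.0  # lump reward for admitting a customer
HOLD_COST = 1.0     # holding cost per customer per unit time

ACTIONS = (0, 1)    # 0 = reject arriving customer, 1 = admit
Q = {(n, a): 0.0 for n in range(CAPACITY + 1) for a in ACTIONS}

def sojourn(n):
    """Simulate from just after a decision until the next arrival.

    Returns (next_state, tau, disc_reward), where disc_reward is the
    beta-discounted holding cost accrued over the sojourn of length tau:
    each interval [t, t+dt) with n customers contributes
    -HOLD_COST * n * exp(-beta*t) * (1 - exp(-beta*dt)) / beta.
    """
    t, disc_reward = 0.0, 0.0
    while True:
        rates = ARRIVAL_RATE + (SERVICE_RATE if n > 0 else 0.0)
        dt = random.expovariate(rates)
        disc_reward += (-HOLD_COST * n * math.exp(-BETA * t)
                        * (1.0 - math.exp(-BETA * dt)) / BETA)
        t += dt
        if random.random() < ARRIVAL_RATE / rates:
            return n, t, disc_reward  # next arrival: a new decision epoch
        n -= 1                        # otherwise a service completion

n = 0
for _ in range(100_000):
    # epsilon-greedy action at the arrival epoch
    if random.random() < EPSILON:
        a = random.choice(ACTIONS)
    else:
        a = max(ACTIONS, key=lambda x: Q[(n, x)])
    admitted = (a == 1 and n < CAPACITY)
    lump = ADMIT_REWARD if admitted else 0.0
    n_next, tau, disc_cost = sojourn(n + 1 if admitted else n)
    # SMDP Q-learning target: lump reward + discounted sojourn reward
    # + e^(-beta*tau)-discounted value of the next decision state
    target = lump + disc_cost + math.exp(-BETA * tau) * max(
        Q[(n_next, b)] for b in ACTIONS)
    Q[(n, a)] += ALPHA * (target - Q[(n, a)])
    n = n_next

policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(CAPACITY + 1)]
print("greedy policy (1 = admit) by queue length:", policy)
```

Under this discounted semi-Markov criterion, the learned greedy policy typically admits customers while the queue is short and rejects once the discounted holding cost outweighs the admission reward, which mirrors the kind of threshold control one expects from the paper's queueing demonstration.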