Concepedia

TLDR

Dynamic programming over Markov chains has been studied by many researchers, and this paper extends those results to Markov‑renewal processes where transition times are state‑dependent random intervals. The authors formulate a decision problem in which actions are chosen at each transition to maximize the total expected reward over a finite or infinite planning horizon. They present two parts: the first details the properties of Markov‑renewal processes, reward structure, and decision rules, while the second develops finite‑ and infinite‑horizon algorithms with discounting and explores infinite‑return models.

Abstract

A special structure in dynamic programming which has been studied by Bellman, Blackwell, D'Épenoux, Derman, Howard, Manne, Oliver, Wolfe and Dantzig, and others is the problem of programming over a Markov chain This paper extends their results and solution algorithms to programming over a Markov-renewal process—in which the intervals between transitions of the system from state i to state j are independent samples from a distribution that may depend on both i and j. For these processes, a general reward structure and a decision mechanism are postulated, the problem is to make decisions at each transition to maximize the total expected reward at the end of the planning horizon. The paper is divided into two parts. This part describes the properties of Markov-renewal processes, the reward structure, and the decision process. Algorithms for finite-horizon, and infinite-horizon models with discounting are presented. The second part will investigate the models that have infinite return.