Probabilistic inference for solving discrete and continuous state Markov Decision Processes

TLDR

Inference in Markov Decision Processes is increasingly used for goal inference, policy recognition, and policy computation, and any existing inference technique in dynamic Bayesian networks can be applied to behavioral questions across continuous, factorial, or hierarchical state representations. We present an Expectation Maximization algorithm for computing optimal policies in Markov Decision Processes. The algorithm is generic, allowing any inference technique to be employed in its E-step. The method optimizes the discounted expected future return for arbitrary reward functions without assuming a finite horizon, as demonstrated with exact inference on a discrete maze and with Gaussian belief state propagation in continuous stochastic optimal control problems.

Abstract

Inference in Markov Decision Processes has recently received interest as a means to infer the goals of an observed action, to recognize policies, and to compute policies. A particularly interesting aspect of the approach is that any existing inference technique in dynamic Bayesian networks (DBNs) becomes available for answering behavioral questions, including those on continuous, factorial, or hierarchical state representations. Here we present an Expectation Maximization algorithm for computing optimal policies. Unlike previous approaches, we can show that this actually optimizes the discounted expected future return for arbitrary reward functions and without assuming an ad hoc finite total time. The algorithm is generic in that any inference technique can be utilized in the E-step. We demonstrate this for exact inference on a discrete maze and for Gaussian belief state propagation in continuous stochastic optimal control problems.
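
As a rough illustration of how a discounted objective can be handled without fixing a horizon, the sketch below checks a standard identity: the discounted expected return of a fixed policy equals, up to the factor 1/(1 - gamma), the expected reward at a random time T drawn from the geometric prior (1 - gamma) * gamma^T. This is only a numerical check on a made-up chain, not the algorithm described in the abstract; the transition matrix, reward vector, function names, and truncation length are all assumptions chosen for the example.

```python
import numpy as np

def discounted_return(P, r, start, gamma):
    """Exact discounted return of a fixed policy: start^T (I - gamma P)^-1 r.

    P[i, j] is the probability of moving from state i to state j
    under the (already fixed) policy; r[i] is the reward in state i.
    """
    n = P.shape[0]
    return start @ np.linalg.solve(np.eye(n) - gamma * P, r)

def time_mixture_return(P, r, start, gamma, horizon=2000):
    """Same quantity, written as an expectation over a random time T.

    T has prior (1 - gamma) * gamma**T; the sum is truncated at `horizon`,
    which is an assumption that is harmless once gamma**horizon is tiny.
    """
    state_dist = start.copy()
    total = 0.0
    for T in range(horizon):
        prior_T = (1.0 - gamma) * gamma**T
        total += prior_T * (state_dist @ r)   # expected reward at time T
        state_dist = state_dist @ P           # propagate the state marginal
    return total / (1.0 - gamma)

# Hypothetical 3-state chain: the last state is absorbing and rewarding.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.9, 0.1],
              [0.0, 0.0, 1.0]])
r = np.array([0.0, 0.0, 1.0])
start = np.array([1.0, 0.0, 0.0])
gamma = 0.95

v_exact = discounted_return(P, r, start, gamma)
v_mixture = time_mixture_return(P, r, start, gamma)
print(v_exact, v_mixture)
```

The two printed values should agree to numerical precision, which shows that a quantity defined over a distribution of horizons can stand in for the infinite-horizon discounted return in this toy setting.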
