Exploiting Structure in Policy Construction

TLDR

Markov decision processes have been applied to decision‑theoretic planning, yet traditional solution methods are often impractical for large AI planning problems. This work introduces structured policy iteration (SPI), an algorithm that constructs optimal policies without explicit state‑space enumeration. SPI retains the core steps of modified policy iteration while exploiting the variable and propositional independencies encoded in a temporal Bayesian network. Its principles apply to any structured representation of stochastic actions, policies, and value functions, and the algorithm can be combined with recent approximation methods.

Abstract

Markov decision processes (MDPs) have recently been applied to the problem of modeling decision-theoretic planning. While traditional methods for solving MDPs are often practical for small state spaces, their effectiveness for large AI planning problems is questionable. We present an algorithm, called structured policy iteration (SPI), that constructs optimal policies without explicit enumeration of the state space. The algorithm retains the fundamental computational steps of the commonly used modified policy iteration algorithm, but exploits the variable and propositional independencies reflected in a temporal Bayesian network representation of MDPs. The principles behind SPI can be applied to any structured representation of stochastic actions, policies, and value functions, and the algorithm itself can be used in conjunction with recent approximation methods.
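For orientation, the modified policy iteration steps that SPI is said to retain can be sketched on a flat, explicitly enumerated state space. This is not SPI itself: it is the unstructured baseline, and it enumerates every state, which is exactly what SPI avoids. The function names, the toy two-state MDP, the discount factor, the number of evaluation sweeps, and the policy-stability stopping rule are all assumptions made for illustration and do not come from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical flat encoding: P[a][s] is a list of (next_state, probability)
# pairs for action a in state s; R[s] is the reward received in state s.
Transitions = Dict[str, List[List[Tuple[int, float]]]]


def backup(P: Transitions, R: List[float], V: List[float],
           gamma: float, s: int, a: str) -> float:
    """One-step expected value of taking action a in state s, then following V."""
    return R[s] + gamma * sum(p * V[t] for t, p in P[a][s])


def modified_policy_iteration(P: Transitions, R: List[float], gamma: float = 0.9,
                              eval_sweeps: int = 5, max_iters: int = 1000):
    n = len(R)
    actions = sorted(P)
    policy = [actions[0]] * n
    V = [0.0] * n
    for _ in range(max_iters):
        # Partial policy evaluation: a fixed number of successive-approximation
        # sweeps under the current policy instead of an exact linear solve.
        for _ in range(eval_sweeps):
            V = [backup(P, R, V, gamma, s, policy[s]) for s in range(n)]
        # Policy improvement: act greedily with respect to the current estimate.
        new_policy = [max(actions, key=lambda a: backup(P, R, V, gamma, s, a))
                      for s in range(n)]
        # Simplified stopping rule (an assumption): stop once the policy is stable.
        if new_policy == policy:
            break
        policy = new_policy
    return policy, V


# Toy MDP (illustrative only): state 1 is rewarding, state 0 is not.
P_toy: Transitions = {
    "stay": [[(0, 1.0)], [(1, 1.0)]],
    "move": [[(1, 0.9), (0, 0.1)], [(0, 0.9), (1, 0.1)]],
}
R_toy = [0.0, 1.0]
policy_toy, V_toy = modified_policy_iteration(P_toy, R_toy)
```

Both loops above range over every state. The abstract's claim is that a structured representation lets the same evaluation and improvement steps operate on groups of states that share relevant variable values, rather than on individual states.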
