Exact and approximate algorithms for partially observable markov decision processes

TLDR

Automated sequential decision making is essential, but uncertainty makes computing optimal policies more complex, especially as the number of uncertain sources increases. The study aims to develop and evaluate exact and approximate algorithms for partially observable Markov decision processes (POMDPs) to compute optimal and sub‑optimal policies. The authors employ POMDP models, refine exact algorithms, and investigate approximate policy methods, empirically comparing their performance on diverse problems. The results show that the proposed exact algorithm improvements outperform previous methods, and that the examined approximate techniques achieve competitive sub‑optimal performance across benchmark problems.

Abstract

Automated sequential decision making is crucial in many contexts. In the face of uncertainty, this task becomes even more important, though at the same time, computing optimal decision policies becomes more complex. The more sources of uncertainty there are, the harder the problem becomes to solve. In this work, we look at sequential decision making in environments where the actions have probabilistic outcomes and in which the system state is only partially observable. We focus on using a model called a partially observable Markov decision process (POMDP) and explore algorithms which address computing both optimal and approximate policies for use in controlling processes that are modeled using POMDPs.\r\nAlthough solving for the optimal policy is PSPACE-complete (or worse), the study and improvements of exact algorithms lends insight into the optimal solution structure as well as providing a basis for approximate solutions. We present some improvements, analysis and empirical comparisons for some existing and some novel approaches for computing the optimal POMDP policy exactly. Since it is also hard (NP-complete or worse) to derive close approximations to the optimal solution for POMDPs, we consider a number of approaches for deriving policies that yield sub-optimal control and empirically explore their performance on a range of problems. These approaches borrow and extend ideas from a number of areas; from the more mathematically motivated techniques in reinforcement learning and control theory to entirely heuristic control rules.