Point-Based Value Iteration for Continuous POMDPs

TLDR

Most POMDP algorithms are limited to discrete spaces, yet many real‑world problems such as robot navigation require continuous state, action, and observation spaces. The paper proposes a novel method to optimize continuous‑space POMDPs. The method extends discrete POMDP algorithms to continuous states by modeling dynamics and rewards with Gaussian mixtures, representing beliefs with Gaussian mixtures or particle sets, and introduces sampling strategies for continuous actions and observations. The authors show that continuous POMDP value functions are convex (piecewise‑linear convex under discrete observations/actions), that continuous Bellman backups are contracting and isotonic guaranteeing convergence, and that using Gaussian mixtures allows closed‑form integrals, making the algorithm computationally feasible.

Abstract

We propose a novel approach to optimize Partially Observable Markov Decisions Processes (POMDPs) defined on continuous spaces. To date, most algorithms for model-based POMDPs are restricted to discrete states, actions, and observations, but many real-world problems such as, for instance, robot navigation, are naturally defined on continuous spaces. In this work, we demonstrate that the value function for continuous POMDPs is convex in the beliefs over continuous state spaces, and piecewise-linear convex for the particular case of discrete observations and actions but still continuous states. We also demonstrate that continuous Bellman backups are contracting and isotonic ensuring the monotonic convergence of value-iteration algorithms. Relying on those properties, we extend the algorithm, originally developed for discrete POMDPs, to work in continuous state spaces by representing the observation, transition, and reward models using Gaussian mixtures, and the beliefs using Gaussian mixtures or particle sets. With these representations, the integrals that appear in the Bellman backup can be computed in closed form and, therefore, the algorithm is computationally feasible. Finally, we further extend to deal with continuous action and observation sets by designing effective sampling approaches.

References

Page 1

	Year	Citations

Page 1