Learning Safe Policies via Primal-Dual Methods

Abstract

In this paper, we study the learning of safe policies in reinforcement learning problems. That is, we aim to control a Markov Decision Process (MDP) whose transition probabilities are unknown, but for which we have access to sample trajectories through experiments. We define safety as the agent remaining in a desired safe set with high probability at every time instant. We therefore consider a constrained MDP in which the constraints are probabilistic. Because these constraints are difficult to address in a reinforcement learning framework, we propose an ergodic relaxation of the problem. Nonetheless, this relaxation still allows us to provide safety guarantees on the resulting policies. To compute these policies, we resort to a stochastic primal-dual method. We test the proposed approach on a navigation task in a grid world. The numerical results show that our algorithm is capable of dynamically adapting the policy to the environment and the required safety levels.
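
To illustrate the general idea of a stochastic primal-dual update, the following is a minimal sketch on a hypothetical one-state problem with two actions. It is not the algorithm or the grid-world experiment of the paper: the names `REWARD`, `P_SAFE`, `DELTA`, `policy`, and `primal_dual`, the logistic policy, the step sizes, and all numerical values are assumptions chosen for illustration only. The policy parameter takes a sampled gradient ascent step on a Lagrangian, and the multiplier takes a projected descent step driven by the sampled constraint slack.

```python
import math
import random

# Hypothetical one-state problem: action 0 is risky, action 1 is safe.
REWARD = [1.0, 0.2]   # assumed reward of each action
P_SAFE = [0.5, 1.0]   # assumed probability of staying in the safe set
DELTA = 0.1           # assumed tolerated probability of leaving the safe set


def policy(theta):
    """Action probabilities of a logistic policy with scalar parameter theta."""
    p_safe_action = 1.0 / (1.0 + math.exp(-theta))
    return [1.0 - p_safe_action, p_safe_action]


def primal_dual(steps=5000, eta_theta=0.05, eta_lam=0.05, seed=0):
    """Run sampled primal ascent on theta and projected dual descent on lam."""
    rng = random.Random(seed)
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        probs = policy(theta)
        # Sample an action and whether the agent stayed in the safe set.
        action = 0 if rng.random() < probs[0] else 1
        safe = 1.0 if rng.random() < P_SAFE[action] else 0.0
        # Sampled constraint slack: positive when the safety sample exceeds
        # the required level 1 - DELTA.
        slack = safe - (1.0 - DELTA)
        # Sampled Lagrangian: reward plus multiplier times constraint slack.
        lagrangian = REWARD[action] + lam * slack
        # Score function d/dtheta log pi(action) for the logistic policy.
        score = (1.0 - probs[1]) if action == 1 else -probs[1]
        # Primal step: stochastic gradient ascent on the Lagrangian.
        theta += eta_theta * lagrangian * score
        # Dual step: gradient descent on lam, projected onto lam >= 0.
        lam = max(0.0, lam - eta_lam * slack)
    return theta, lam


if __name__ == "__main__":
    theta, lam = primal_dual()
    print("policy:", policy(theta), "multiplier:", lam)
```

In this toy setting the multiplier increases whenever a sample leaves the safe set, which raises the weight of the safety term in the primal step and pushes probability toward the safe action; the projection keeps the multiplier non-negative.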
