A comprehensive survey on safe reinforcement learning

TLDR

Safe Reinforcement Learning is the process of learning policies that maximize expected return while ensuring reasonable system performance and respecting safety constraints during learning and deployment. This survey categorizes and analyzes two Safe Reinforcement Learning approaches and uses the classification to review existing literature and propose future research directions. The first approach modifies the optimality criterion by incorporating a safety factor into the classic discounted finite- or infinite-horizon criterion, while the second alters the exploration process by integrating external knowledge or following the guidance of a risk metric.

Abstract

Safe Reinforcement Learning can be defined as the process of learning policies that maximize the expectation of the return in problems in which it is important to ensure reasonable system performance and/or respect safety constraints during the learning and/or deployment processes. We categorize and analyze two approaches to Safe Reinforcement Learning. The first is based on modifying the optimality criterion, the classic discounted finite/infinite horizon, with a safety factor. The second is based on modifying the exploration process through the incorporation of external knowledge or the guidance of a risk metric. We use the proposed classification to survey the existing literature and to suggest future directions for Safe Reinforcement Learning.
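The two approaches named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the survey: the function names, the per-step cost signal, the `risk_weight` penalty, and the `risk_threshold` filter are all hypothetical choices made for illustration. A weighted cost penalty is only one possible way to add a safety factor to the discounted criterion, and filtering actions by a risk estimate is only one possible way to let a risk metric guide exploration.

```python
import random


def penalized_return(rewards, costs, gamma=0.99, risk_weight=1.0):
    """Approach 1 (illustrative): discounted return with a safety factor.

    Each step's reward is reduced by risk_weight times a hypothetical
    per-step safety cost before discounting.
    """
    total = 0.0
    for t, (r, c) in enumerate(zip(rewards, costs)):
        total += (gamma ** t) * (r - risk_weight * c)
    return total


def risk_guided_action(q_values, risk, risk_threshold=0.5, epsilon=0.1, rng=random):
    """Approach 2 (illustrative): epsilon-greedy exploration guided by a risk metric.

    Only actions whose assumed risk estimate is at or below risk_threshold
    are eligible for exploration or exploitation.
    """
    safe = [a for a in range(len(q_values)) if risk[a] <= risk_threshold]
    if not safe:
        # No action passes the filter: fall back to the least risky one.
        return min(range(len(risk)), key=lambda a: risk[a])
    if rng.random() < epsilon:
        # Explore, but only among the actions considered safe.
        return rng.choice(safe)
    # Exploit: best estimated value among the safe actions.
    return max(safe, key=lambda a: q_values[a])
```

In this sketch, `risk_guided_action([1.0, 5.0, 2.0], [0.1, 0.9, 0.2], epsilon=0.0)` skips the highest-valued action because its assumed risk estimate exceeds the threshold, and picks the best remaining one instead.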
