Publication | Open Access
Safe Exploration and Optimization of Constrained MDPs Using Gaussian Processes
Citations: 93 | References: 19 | Year: 2018
Mathematical Programming, Artificial Intelligence, Engineering, Machine Learning, Gaussian Process Representations, Constrained Optimization, Intelligent Systems, Operations Research, Data Science, Uncertainty Quantification, Management, Systems Engineering, Robot Learning, Combinatorial Optimization, Decision Theory, Robust Optimization, Action Model Learning, Sequential Decision Making, Computer Science, Markov Decision Process, Exploration V Exploitation, Safe Exploration, Stochastic Optimization, Optimization Problem, Cumulative Reward, Decision Science, Safety Function
We present a reinforcement learning approach to explore and optimize a safety-constrained Markov Decision Process (MDP). In this setting, the agent must maximize discounted cumulative reward while constraining the probability of entering unsafe states, where safety is defined by a safety function staying within some tolerance. The safety values of the states are not known a priori, and we model them probabilistically via a Gaussian Process (GP) prior. Behaving properly in such an environment therefore requires balancing a three-way trade-off between exploring the safety function, exploring the reward function, and exploiting acquired knowledge to maximize reward. We propose a novel approach to balance this trade-off. Specifically, our approach explores unvisited states selectively; that is, it prioritizes the exploration of a state if visiting that state significantly improves knowledge of the achievable cumulative reward. Our approach relies on a novel information gain criterion based on Gaussian Process representations of the reward and safety functions. We demonstrate the effectiveness of our approach on a range of experiments, including a simulation using real Martian terrain data.
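The idea of modeling unknown safety values with a GP prior can be illustrated with a minimal sketch. This is not the paper's algorithm: it shows only a generic zero-mean GP posterior over a safety function and a conservative lower-confidence-bound test for calling a state safe. The function names, the squared-exponential kernel, the 1-D state layout, the threshold of 0, and the multiplier beta = 2 are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two sets of points (one per row)."""
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_posterior(x_obs, y_obs, x_query, noise_std=0.05):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise_std**2 * np.eye(len(x_obs))
    k_qo = rbf_kernel(x_query, x_obs)
    chol = np.linalg.cholesky(k_oo)
    # Solve (K + noise^2 I) alpha = y using the Cholesky factor.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_obs))
    mean = k_qo @ alpha
    v = np.linalg.solve(chol, k_qo.T)
    var = rbf_kernel(x_query, x_query).diagonal() - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

def confidently_safe(mean, std, threshold=0.0, beta=2.0):
    """Hypothetical rule: safe only if the lower confidence bound clears the threshold."""
    return mean - beta * std >= threshold

# Illustrative data: safety observed at three visited states on a line.
x_obs = np.array([[0.0], [1.0], [2.0]])
y_obs = np.array([1.0, 0.8, 0.9])
# One query state that was already visited, one far from every observation.
x_query = np.array([[1.0], [10.0]])

mean, std = gp_posterior(x_obs, y_obs, x_query)
safe_mask = confidently_safe(mean, std)
print(safe_mask)
```

Under these assumptions, the state near the observations has low posterior uncertainty and passes the test, while the distant state reverts to the prior and is treated as not yet known to be safe. That gap is what makes exploring the safety function necessary before reward can be exploited there.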