Publication | Closed Access
Policy Shaping: Integrating Human Feedback with Reinforcement Learning
Citations: 302
References: 19
Year: 2013
Venue: Unknown
Interactive Reinforcement Learning aims to use nonexpert human feedback to solve complex tasks. Some current methods map that feedback to rewards and values and iterate over them to improve policies. The paper proposes Policy Shaping as a more effective way to characterize human feedback and introduces Advise, a Bayesian method that uses human feedback directly as policy labels in an attempt to maximize the information gained from it. In the paper's comparisons, Advise can outperform state-of-the-art methods and is robust to infrequent and inconsistent human feedback.
A long-term goal of Interactive Reinforcement Learning is to incorporate nonexpert human feedback to solve complex tasks. Some state-of-the-art methods have approached this problem by mapping human information to rewards and values and iterating over them to compute better control policies. In this paper we argue for an alternative, more effective characterization of human feedback: Policy Shaping. We introduce Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by utilizing it as direct policy labels. We compare Advise to state-of-the-art approaches and show that it can outperform them and is robust to infrequent and inconsistent human feedback.
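The abstract does not state the update rule, so the following is only a minimal sketch of what "feedback as direct policy labels" could look like, not the authors' implementation. It assumes that feedback for each state-action pair is summarized by `delta` (positive labels minus negative labels), that the human is correct with a fixed probability `consistency` strictly between 0 and 1, and that the feedback-derived policy is combined with the agent's own policy by multiplying the two distributions and renormalizing. All function and parameter names are hypothetical.

```python
from typing import List


def prob_action_optimal(delta: int, consistency: float) -> float:
    """Estimate the probability that an action is optimal.

    delta: (# positive labels) - (# negative labels) for this action (assumed summary).
    consistency: assumed probability that a single human label is correct, in (0, 1).
    """
    if not 0.0 < consistency < 1.0:
        raise ValueError("consistency must lie strictly between 0 and 1")
    agree = consistency ** delta
    disagree = (1.0 - consistency) ** delta
    return agree / (agree + disagree)


def feedback_policy(deltas: List[int], consistency: float) -> List[float]:
    """Turn per-action label counts for one state into a normalized distribution."""
    probs = [prob_action_optimal(d, consistency) for d in deltas]
    total = sum(probs)
    return [p / total for p in probs]


def combine_policies(pi_agent: List[float], pi_feedback: List[float]) -> List[float]:
    """Combine two action distributions by elementwise product, then renormalize."""
    product = [a * f for a, f in zip(pi_agent, pi_feedback)]
    total = sum(product)
    return [p / total for p in product]


# Usage: two actions, the agent is undecided, the human favored action 0 twice.
pi_agent = [0.5, 0.5]
pi_human = feedback_policy(deltas=[2, 0], consistency=0.8)
pi_combined = combine_policies(pi_agent, pi_human)
```

Under these assumptions, an action with no net feedback (`delta = 0`) gets probability 0.5 regardless of `consistency`, so sparse feedback leaves the agent's own policy unchanged for unlabeled actions, while contradictory labels partly cancel through `delta`.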