Publication | Open Access
Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods
156 Citations | 7 References | Year: 2012
Keywords: Artificial Intelligence, Inverse Reinforcement Learning, Engineering, Machine Learning, Unknown Reward Function, Novel Gradient Algorithm, Autonomous Learning, Exploration vs. Exploitation, Algorithmic Learning, Sequential Decision Making, Computer Science, Intelligent Systems, Robot Learning, Learning Control, Decision Theory, Markov Decision Process, Natural Gradients
The goal is to learn a reward function whose optimal policy reproduces the expert's observed behavior. The main difficulty is that the mapping from reward parameters to policies is nonsmooth and highly redundant. The algorithm handles the nonsmoothness with subdifferentials and the redundancy with natural gradients. In two artificial domains, the method proved more reliable and efficient than some prior approaches.
In this paper we propose a novel gradient algorithm to learn a policy from an expert's observed behavior, assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem. The algorithm's aim is to find a reward function such that the resulting optimal policy closely matches the expert's observed behavior. The main difficulty is that the mapping from the parameters to policies is both nonsmooth and highly redundant. Resorting to subdifferentials solves the first difficulty, while the second one is overcome by computing natural gradients. We tested the proposed method in two artificial domains and found it to be more reliable and efficient than some previous methods.
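The abstract does not give the update rule, so the following is only a minimal NumPy sketch in its spirit. It assumes a linear reward r_θ(s, a) = θ·φ(s, a), a Boltzmann (softmax) policy over Q*_θ, a squared policy-matching loss weighted by a state distribution μ, and the policy's Fisher information as the natural-gradient metric. These modelling choices and all function names (`greedy_q_and_grad`, `boltzmann`, `loss_grad_fisher`, `natural_gradient_irl`) are hypothetical illustrations, not the paper's exact formulation. Because Q*_θ is a pointwise maximum of functions that are linear in θ, it is nonsmooth; holding the greedy policy fixed yields one element of its subdifferential. Preconditioning with a damped inverse Fisher matrix is one way to counter the redundancy of the parameter-to-policy map.

```python
import numpy as np


def greedy_q_and_grad(theta, phi, P, gamma, tol=1e-10, max_iter=10_000):
    """Return Q* for the (assumed) linear reward r(s, a) = phi[s, a] @ theta and a subgradient of Q*.

    phi: (S, A, d) features, P: (S, A, S) transition probabilities.
    Q* is a pointwise max of functions linear in theta, hence nonsmooth; holding the
    greedy policy fixed gives one element of its subdifferential.
    """
    S, A, _ = phi.shape
    r = phi @ theta                                   # (S, A) rewards
    Q = np.zeros((S, A))
    for _ in range(max_iter):                         # value iteration
        Q_new = r + gamma * (P @ Q.max(axis=1))
        done = np.max(np.abs(Q_new - Q)) < tol
        Q = Q_new
        if done:
            break
    greedy = Q.argmax(axis=1)
    idx = np.arange(S)
    # dV(s) = dQ(s, greedy(s)) solves dV = phi_pi + gamma * P_pi @ dV (greedy policy frozen).
    dV = np.linalg.solve(np.eye(S) - gamma * P[idx, greedy], phi[idx, greedy])  # (S, d)
    dQ = phi + gamma * np.einsum("sat,td->sad", P, dV)                          # (S, A, d)
    return Q, dQ


def boltzmann(Q, beta):
    """Softmax (Boltzmann) policy: pi(a|s) proportional to exp(beta * Q(s, a))."""
    e = np.exp(beta * (Q - Q.max(axis=1, keepdims=True)))
    return e / e.sum(axis=1, keepdims=True)


def loss_grad_fisher(theta, phi, P, gamma, beta, mu, pi_E):
    """Squared policy-matching loss, its (sub)gradient, and the policy Fisher matrix."""
    Q, dQ = greedy_q_and_grad(theta, phi, P, gamma)
    pi = boltzmann(Q, beta)
    mean_dQ = np.einsum("sa,sad->sd", pi, dQ)
    glog = beta * (dQ - mean_dQ[:, None, :])          # d log pi(a|s) / d theta, (S, A, d)
    diff = pi - pi_E
    loss = 0.5 * np.sum(mu[:, None] * diff**2)
    grad = np.einsum("s,sa,sa,sad->d", mu, diff, pi, glog)
    fisher = np.einsum("s,sa,sad,sae->de", mu, pi, glog, glog)
    return loss, grad, fisher


def natural_gradient_irl(theta0, phi, P, gamma, beta, mu, pi_E,
                         steps=50, lr=1.0, damping=1e-3):
    """Damped natural-gradient descent with backtracking; only loss-decreasing steps are taken."""
    args = (phi, P, gamma, beta, mu, pi_E)
    theta = np.array(theta0, dtype=float)
    loss, grad, F = loss_grad_fisher(theta, *args)
    for _ in range(steps):
        # Natural-gradient direction: (F + damping * I)^{-1} grad.
        direction = np.linalg.solve(F + damping * np.eye(theta.size), grad)
        step = lr
        while step > 1e-8:
            cand = theta - step * direction
            c_loss, c_grad, c_F = loss_grad_fisher(cand, *args)
            if c_loss < loss:
                theta, loss, grad, F = cand, c_loss, c_grad, c_F
                break
            step *= 0.5
        else:
            break                                     # no decrease found: stop
    return theta, loss


# Toy demo: the "expert" is the Boltzmann policy of a hidden reward vector.
rng = np.random.default_rng(0)
S, A, d = 5, 3, 4
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
phi = rng.normal(size=(S, A, d))
mu = np.full(S, 1.0 / S)
gamma, beta = 0.9, 2.0
theta_true = rng.normal(size=d)
pi_E = boltzmann(greedy_q_and_grad(theta_true, phi, P, gamma)[0], beta)

initial_loss = loss_grad_fisher(np.zeros(d), phi, P, gamma, beta, mu, pi_E)[0]
theta_hat, final_loss = natural_gradient_irl(np.zeros(d), phi, P, gamma, beta, mu, pi_E)
print(f"loss: {initial_loss:.5f} -> {final_loss:.5f}")
```

The backtracking line search only accepts steps that lower the loss, so the sketch is monotone but carries no convergence guarantee. The damping term keeps the Fisher matrix invertible when several reward parameters induce the same policy, which is exactly the redundancy the abstract points to.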