Publication | Closed Access
Model-based imitation learning by probabilistic trajectory matching
Citations: 48
References: 29
Year: 2013
Venue: Unknown
Keywords: Artificial Intelligence, Imitation Learning, Inverse Reinforcement Learning, Engineering, Machine Learning, Autonomous Learning, Model-based Imitation Learning, Probability Distribution, Automation, Intelligent Robotics, Behavioral Cloning, Cognitive Robotics, Action Model Learning, Computer Science, Intelligent Systems, Robot Learning, Learning Control, Robotics
Summary: Teaching robots by imitating demonstrations is elegant but challenging, owing to differences between robot and teacher anatomy and to reduced robustness against changes in the control task. The paper proposes an imitation-learning method that learns probabilistic forward models to efficiently acquire tasks from expert demonstrations. The method learns policies directly by modeling teacher and predicted robot trajectories as probability distributions and minimizing their Kullback–Leibler divergence; it is compared against model-based reinforcement learning with hand-crafted cost functions and evaluated on a real compliant robot.
Abstract: One of the most elegant ways of teaching new skills to robots is to provide demonstrations of a task and let the robot imitate this behavior. Such imitation learning is a non-trivial task: Different anatomies of robot and teacher, and reduced robustness towards changes in the control task are two major difficulties in imitation learning. We present an imitation-learning approach to efficiently learn a task from expert demonstrations. Instead of finding policies indirectly, either via state-action mappings (behavioral cloning), or cost function learning (inverse reinforcement learning), our goal is to find policies directly such that predicted trajectories match observed ones. To achieve this aim, we model the trajectory of the teacher and the predicted robot trajectory by means of probability distributions. We match these distributions by minimizing their Kullback-Leibler divergence. In this paper, we propose to learn probabilistic forward models to compute a probability distribution over trajectories. We compare our approach to model-based reinforcement learning methods with hand-crafted cost functions. Finally, we evaluate our method with experiments on a real compliant robot.
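The central quantity in the abstract is the Kullback–Leibler divergence between the teacher's trajectory distribution and the robot's predicted trajectory distribution. The following is a minimal illustrative sketch, not the paper's implementation: it assumes both trajectory distributions are approximated as multivariate Gaussians (e.g. over a stacked state vector), for which the KL divergence has a closed form; all variable names are invented for illustration.

```python
import numpy as np

def gaussian_kl(mu_p, cov_p, mu_q, cov_q):
    """Closed-form KL(p || q) between two multivariate Gaussians.

    A trajectory of T steps with D-dimensional states can be flattened
    into a single (T*D)-dimensional Gaussian, so this same formula
    applies to whole trajectory distributions under that assumption.
    """
    k = mu_p.shape[0]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    return 0.5 * (
        np.trace(cov_q_inv @ cov_p)        # spread of p seen through q
        + diff @ cov_q_inv @ diff          # squared mean mismatch
        - k                                # dimensionality offset
        + np.log(np.linalg.det(cov_q) / np.linalg.det(cov_p))
    )

# Hypothetical teacher distribution, e.g. estimated from demonstrations.
mu_teacher = np.array([0.0, 1.0])
cov_teacher = np.diag([0.10, 0.20])

# Hypothetical robot trajectory predicted by a learned forward model.
mu_robot = np.array([0.1, 0.9])
cov_robot = np.diag([0.15, 0.20])

# In the paper's setting, a policy search would minimize this quantity.
loss = gaussian_kl(mu_teacher, cov_teacher, mu_robot, cov_robot)
```

A gradient-based or black-box optimizer would then adjust the policy parameters so that the predicted `mu_robot` and `cov_robot` drive this loss toward zero, which is attained exactly when the two distributions coincide.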