Publication | Open Access
PILCO: A Model-Based and Data-Efficient Approach to Policy Search
Citations: 1.1K | References: 20 | Year: 2011
The paper introduces PILCO, a practical, data‑efficient model‑based policy search method. PILCO learns a probabilistic dynamics model, incorporates model uncertainty into long‑term planning, evaluates policies in closed form, and computes analytic policy gradients. PILCO reduces model bias in a principled way and achieves unprecedented learning efficiency on challenging, high‑dimensional control tasks.
In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.
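The abstract's two computational ideas, closed-form policy evaluation under a propagated state distribution and analytic policy gradients, can be illustrated with a toy example. The sketch below is not PILCO itself: it assumes a known scalar linear-Gaussian system and a linear policy `u = k * x` in place of the learned probabilistic dynamics model and approximate inference that the abstract describes. The names `expected_cost` and `optimize_gain`, the quadratic cost, and all parameter values are hypothetical choices made for illustration.

```python
def expected_cost(k, a=0.9, b=0.5, noise_var=0.01,
                  mu0=1.0, var0=0.1, horizon=20):
    """Return (J, dJ/dk) for J = sum_t E[x_t^2], t = 1..horizon.

    Toy assumptions (not from the paper): dynamics x' = a*x + b*u + w with
    w ~ N(0, noise_var), policy u = k*x, and Gaussian initial state
    N(mu0, var0).
    """
    c = a + b * k            # closed-loop gain
    mu, var = mu0, var0      # moments of the state distribution
    dmu, dvar = 0.0, 0.0     # derivatives of the moments w.r.t. k
    total, grad = 0.0, 0.0
    for _ in range(horizon):
        # Differentiate the moment recursions (uses the previous moments).
        dmu = b * mu + c * dmu
        dvar = 2.0 * c * b * var + c * c * dvar
        # Propagate the Gaussian state distribution one step in closed form.
        mu = c * mu
        var = c * c * var + noise_var
        # For a Gaussian, E[x^2] = mu^2 + var.
        total += mu * mu + var
        grad += 2.0 * mu * dmu + dvar
    return total, grad


def optimize_gain(k=0.0, step=0.05, iterations=200):
    """Plain gradient descent on the policy parameter k."""
    for _ in range(iterations):
        _, grad = expected_cost(k)
        k -= step * grad
    return k


if __name__ == "__main__":
    k_opt = optimize_gain()
    print("learned gain:", k_opt)
    print("expected cost:", expected_cost(k_opt)[0])
```

In this toy setting the expected cost and its gradient are exact, because a Gaussian stays Gaussian under linear dynamics. With a learned nonlinear model the predictive distribution has to be approximated, which is where the approximate inference mentioned in the abstract comes in.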