Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement

Abstract

The ability to transfer skills across tasks has the potential to scale up reinforcement learning (RL) agents to environments currently out of reach. Recently, a framework based on two ideas, successor features (SFs) and generalised policy improvement (GPI), has been introduced as a principled way of transferring skills. In this paper we extend the SFs & GPI framework in two ways. One of the basic assumptions underlying the original formulation of SFs & GPI is that rewards for all tasks of interest can be computed as linear combinations of a fixed set of features. We relax this constraint and show that the theoretical guarantees supporting the framework can be extended to any set of tasks that only differ in the reward function. Our second contribution is to show that one can use the reward functions themselves as features for future tasks, without any loss of expressiveness, thus removing the need to specify a set of features beforehand. This makes it possible to combine SFs & GPI with deep learning in a more stable way. We empirically verify this claim on a complex 3D environment where observations are images from a first-person perspective. We show that the transfer promoted by SFs & GPI leads to very good policies on unseen tasks almost instantaneously. We also describe how to learn policies specialised to the new tasks in a way that allows them to be added to the agent's set of skills, and thus be reused in the future.
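As an illustration of the two ideas named above, the following is a minimal NumPy sketch of GPI over SFs under the linear-reward assumption of the original formulation, where the reward of a task is a dot product between features and a weight vector w. In that setting the value of a known policy on a new task can be computed as the dot product of its successor features with w, and GPI acts greedily with respect to the maximum over known policies. The function name `gpi_action`, the array layout of `psi`, and the toy numbers are assumptions made for this sketch; they are not taken from the paper.

```python
import numpy as np

def gpi_action(psi, w):
    """Pick an action by generalised policy improvement at a single state.

    psi: array of shape (n_policies, n_actions, n_features) holding the
         successor features of each known policy for every action
         (hypothetical layout chosen for this sketch).
    w:   array of shape (n_features,) with the reward weights of the new task.
    """
    # Value of each known policy on the new task: Q_i(s, a) = psi_i(s, a) . w
    q = psi @ w                          # shape (n_policies, n_actions)
    # GPI: take the best known policy per action, then the best action.
    return int(np.argmax(q.max(axis=0))), q

# Toy values (made up): 2 known policies, 3 actions, 2 features.
psi = np.array([
    [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]],
    [[0.2, 0.1], [0.0, 2.0], [0.3, 0.3]],
])
w_new = np.array([1.0, -1.0])            # new task: reward feature 0, penalise feature 1
action, q_values = gpi_action(psi, w_new)
```

No learning happens at transfer time in this sketch: only the weight vector changes between tasks, which is one way to read the "almost instantaneously" claim in the abstract.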
