
TLDR

Value functions are central to reinforcement learning, where a single function approximator V(s;θ) estimates long‑term reward from any state. This work introduces universal value function approximators V(s,g;θ) that generalise over both states and goals, and presents an efficient supervised‑learning technique that factors values into separate state and goal embeddings. The authors show how the learned UVFA can be integrated into a reinforcement‑learning algorithm that updates the function solely from observed rewards. Experiments demonstrate that a UVFA can successfully generalise to previously unseen goals.

Abstract

Value functions are a core component of reinforcement learning systems. The main idea is to construct a single function approximator V(s; θ) that estimates the long-term reward from any state s, using parameters θ. In this paper we introduce universal value function approximators (UVFAs) V(s, g; θ) that generalise not just over states s but also over goals g. We develop an efficient technique for supervised learning of UVFAs, by factoring observed values into separate embedding vectors for state and goal, and then learning a mapping from s and g to these factored embedding vectors. We show how this technique may be incorporated into a reinforcement learning algorithm that updates the UVFA solely from observed rewards. Finally, we demonstrate that a UVFA can successfully generalise to previously unseen goals.
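The two-stage supervised procedure described above (factor a table of observed values into state and goal embeddings, then learn mappings from s and g to those embeddings) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: stage 1 uses alternating least squares as a stand-in for any low-rank factorization, stage 2 uses linear regression on hypothetical feature vectors instead of a learned network, and the two embeddings are combined with a dot product. The names `factorize`, `fit_embedding_map`, `UVFA`, and the toy bilinear value function are hypothetical and exist only for this example.

```python
import random

def solve(A, b):
    """Solve the small square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ridge_fit(inputs, targets, lam=1e-8):
    """Weights w minimising sum_i (w . inputs[i] - targets[i])^2 + lam * |w|^2."""
    d = len(inputs[0])
    xtx = [[sum(x[i] * x[j] for x in inputs) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(x[i] * t for x, t in zip(inputs, targets)) for i in range(d)]
    return solve(xtx, xty)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def factorize(values, rank, iters=50, seed=0):
    """Stage 1 (assumed ALS): values[s][g] ~ phi_hat[s] . psi_hat[g]."""
    rng = random.Random(seed)
    n_s, n_g = len(values), len(values[0])
    psi_hat = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(n_g)]
    for _ in range(iters):
        # With goal embeddings fixed, each state embedding is a small least-squares fit.
        phi_hat = [ridge_fit(psi_hat, values[s]) for s in range(n_s)]
        # With state embeddings fixed, each goal embedding is fitted the same way.
        psi_hat = [ridge_fit(phi_hat, [values[s][g] for s in range(n_s)])
                   for g in range(n_g)]
    return phi_hat, psi_hat

def fit_embedding_map(features, embeddings):
    """Stage 2 (assumed linear): regress each embedding dimension on raw features."""
    rank = len(embeddings[0])
    weights = [ridge_fit(features, [e[k] for e in embeddings]) for k in range(rank)]
    return lambda f: [dot(w, f) for w in weights]

class UVFA:
    """V(s, g) = phi(s) . psi(g): separate state and goal streams joined by a dot product."""
    def __init__(self, state_feats, goal_feats, values, rank):
        phi_hat, psi_hat = factorize(values, rank)
        self.phi = fit_embedding_map(state_feats, phi_hat)
        self.psi = fit_embedding_map(goal_feats, psi_hat)

    def value(self, state_feat, goal_feat):
        return dot(self.phi(state_feat), self.psi(goal_feat))

# Hypothetical toy problem: true values are bilinear in state/goal features (rank 2).
rng = random.Random(1)
A = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(2)]
B = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(2)]

def true_value(x, y):
    return dot([dot(a, x) for a in A], [dot(b, y) for b in B])

states = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(8)]
train_goals = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(6)]
table = [[true_value(x, y) for y in train_goals] for x in states]

uvfa = UVFA(states, train_goals, table, rank=2)
new_goal = [0.5, -1.0, 2.0]  # a goal never seen during training
errors = [abs(uvfa.value(x, new_goal) - true_value(x, new_goal)) for x in states]
print(f"max error on unseen goal: {max(errors):.2e}")
```

Because the toy values are exactly bilinear in the features, linear maps suffice and the prediction for the unseen goal should closely match the true value. With real value tables (noisy, partially observed, nonlinear in raw observations), the rank, the factorization method, and the regressors would all need to be chosen more carefully.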
