Incremental least-squares temporal difference learning

Abstract

Approximate policy evaluation with linear function approx-imation is a commonly arising problem in reinforcement learning, usually solved using temporal difference (TD) algo-rithms. In this paper we introduce a new variant of linear TD learning, called incremental least-squares TD learning, or iLSTD. This method is more data efficient than conventional TD algorithms such as TD(0) and is more computationally ef-ficient than non-incremental least-squares TD methods such as LSTD (Bradtke &amp; Barto 1996; Boyan 1999). In particular, we show that the per-time-step complexities of iLSTD and TD(0) are O(n), where n is the number of features, whereas that of LSTD is O(n2). This difference can be decisive in modern applications of reinforcement learning where the use of a large number features has proven to be an effective so-lution strategy. We present empirical comparisons, using the test problem introduced by Boyan (1999), in which iLSTD converges faster than TD(0) and almost as fast as LSTD.

References

Page 1

	Year	Citations

Page 1