Time-Derivative Models of Pavlovian Reinforcement

Abstract

This chapter presents a model of classical conditioning called the temporal-difference (TD) model. The TD model was originally developed as a neuronlike unit for use in adaptive networks (Sutton and Barto 1987; Sutton 1984; Barto, Sutton and Anderson 1983). In this paper, however, we analyze it from the point of view of animal learning theory. Our intended audience is both animal learning researchers interested in computational theories of behavior and machine learning researchers interested in how their learning algorithms relate to, and may be constrained by, animal learning studies. For an exposition of the TD model from an engineering point of view, see Chapter 13 of this volume. We focus on what we see as the primary theoretical contribution to animal learning theory of the TD and related models: the hypothesis that reinforcement in classical conditioning is the time derivative of a composite association combining innate (US) and acquired (CS) associations. We call models based on some variant of this hypothesis time-derivative models, examples of which are the models by Klopf (1988), Sutton and Barto

