The Curse of Planning

TLDR

Studies suggest that choice behavior is governed by parallel, competing valuation systems: a flexible but computationally costly model-based reinforcement-learning system and a less flexible but more efficient model-free one. What determines which system controls behavior remains unclear. The authors tested the hypothesis that model-based reinforcement learning requires cognitive resources by having participants perform a demanding secondary task while making decisions, which increased reliance on the model-free strategy. Across trials, people adjusted the balance between the two strategies dynamically as a function of concurrent executive-function demands, and their choice latencies reflected the computational cost of the strategy in use. Competition between the two systems can therefore be controlled on a trial-by-trial basis by modulating the availability of cognitive resources.

Abstract

A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. In these accounts, a flexible but computationally expensive model-based reinforcement-learning system has been contrasted with a less flexible but more efficient model-free reinforcement-learning system. The factors governing which system controls behavior—and under what circumstances—are still unclear. Following the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrated that having human decision makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement-learning strategy. Further, we showed that, across trials, people negotiate the trade-off between the two systems dynamically as a function of concurrent executive-function demands, and people’s choice latencies reflect the computational expenses of the strategy they employ. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources.
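
To make the contrast between the two systems concrete, here is a minimal sketch of one common way such a trade-off is formalized: a weighted mixture of model-based and model-free action values. This is an illustration under stated assumptions, not the model fitted in the paper. The mixing weight `w`, the learning rate `alpha`, the two-state transition table, and all function names are hypothetical and chosen only for the example.

```python
def model_free_update(q, action, reward, alpha=0.5):
    """Model-free: nudge the cached value of the chosen action toward the reward."""
    q = list(q)
    q[action] += alpha * (reward - q[action])
    return q


def model_based_values(transition, second_stage_values):
    """Model-based: expected best second-stage value under a known transition model."""
    return [
        sum(p * max(second_stage_values[s]) for s, p in enumerate(row))
        for row in transition
    ]


def hybrid_values(q_mb, q_mf, w):
    """Weighted mixture (assumed form): w=1 is purely model-based, w=0 purely model-free."""
    return [w * mb + (1 - w) * mf for mb, mf in zip(q_mb, q_mf)]


def preferred(values):
    """Index of the highest-valued action."""
    return max(range(len(values)), key=lambda a: values[a])


# Hypothetical task: two first-stage actions, each leading mostly to one of two states.
transition = [[0.7, 0.3], [0.3, 0.7]]            # P(state | action)
second_stage_values = [[0.2, 0.4], [0.8, 0.6]]   # value of each option in each state

q_mb = model_based_values(transition, second_stage_values)
q_mf = model_free_update([0.0, 0.0], action=0, reward=1.0)  # action 0 was just rewarded

# Assumed effect of cognitive load: a lower weight on the model-based system.
choice_no_load = preferred(hybrid_values(q_mb, q_mf, w=0.8))
choice_under_load = preferred(hybrid_values(q_mb, q_mf, w=0.2))
```

With these made-up numbers, the high-weight agent picks the action that the transition model favors, while the low-weight agent repeats the action that was most recently rewarded. The model-based computation also loops over every state and option, whereas the model-free update touches a single cached value, which is the kind of cost difference that choice latencies could reflect.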
