One and Done? Optimal Decisions from Very Few Samples
Year: 2009
Edward Vul (evul@mit.edu)
Brain and Cognitive Science, 43 Vassar St, Cambridge, MA 02139 USA

Noah D. Goodman (ndg@mit.edu)
Brain and Cognitive Science, 43 Vassar St, Cambridge, MA 02139 USA

Thomas L. Griffiths (tom_griffiths@berkeley.edu)
Dept. of Psychology, Tolman Hall, Berkeley, CA 94720 USA

Joshua B. Tenenbaum (jbt@mit.edu)
Brain and Cognitive Science, 43 Vassar St, Cambridge, MA 02139 USA

Abstract

In many situations human behavior approximates that of a Bayesian ideal observer, suggesting that, at some level, cognition can be described as Bayesian inference. However, a number of findings have highlighted an intriguing mismatch between human behavior and that predicted by Bayesian inference: people often appear to make judgments based on a few samples from a probability distribution, rather than the full distribution. Although sample-based approximations are a common implementation of Bayesian inference, the very limited number of samples used by humans seems to be insufficient to approximate the required probability distributions. Here we consider this discrepancy in the broader framework of statistical decision theory, and ask: if people were making decisions based on samples, but samples were costly, how many samples should people use? We find that under reasonable assumptions about how long it takes to produce a sample, locally suboptimal decisions based on few samples are globally optimal. These results reconcile a large body of work showing sampling, or probability-matching, behavior with the hypothesis that human cognition is well described as Bayesian inference, and suggest promising future directions for studies of resource-constrained cognition.

Keywords: Computational modeling; Bayesian models; Process models; Sampling

Across a wide range of tasks, people seem to act in a manner consistent with optimal Bayesian models (in perception: Knill & Richards, 1996; motor action: Maloney, Trommershauser, & Landy, 2007; language: Chater & Manning, 2006; decision making: McKenzie, 1994; causal judgments: Griffiths & Tenenbaum, 2005; and concept learning: Goodman, Tenenbaum, Feldman, & Griffiths, 2008). However, despite this similarity between Bayesian ideal observers and human observers, two crucial problems remain unaddressed across these domains. First, human behavior often appears to be optimal on average, but not within individual people or individual trials: What are people doing on individual trials to produce optimal behavior in the long-run average? Second, Bayesian inference is straightforward when considering small laboratory tasks, but intractable for large-scale problems like those that people face in the real world: How can people be carrying out generally intractable Bayesian calculations in real-world tasks? Here we will argue that both of these problems can be resolved by considering the algorithms that people may be using to approximate Bayesian inference.

The first problem is highlighted by an intriguing observation from Goodman et al. (2008) about performance in categorization tasks in which people see positive and negative exemplars of a category and are then asked to generalize any learned rules to new test items. After exposure to several category exemplars people classify new test items consistently with fully Bayesian inference, on average. This average behavior suggests that people consider many possible classification rules, update their beliefs about each one, and then classify new items by averaging the classification over all the possible rules. However, this perfectly Bayesian behavior is only evident in the average across many observers. In contrast, each individual classifies all test items in a manner consistent with only one or a few rules; which rules are considered varies from observer to observer according to the appropriate posterior probabilities (Goodman et al., 2008). Thus, it seems that an individual observer acts based on just one or a few rules sampled from the posterior distribution, and the fully Bayesian behavior only emerges when averaging many individuals, each with different sampled rules.

This sampling behavior is not limited to concept-learning tasks. In many other high-level cognitive tasks, individuals’ patterns of response – and sometimes even responses on individual trials – appear to reflect just a small number of samples from the posterior predictive distribution. When predicting how long a cake will bake given that it has been in the oven for 45 minutes (Griffiths & Tenenbaum, 2006), the across-subject variance of responses is consistent with individuals guessing based on only two prior observations of cake baking times (Mozer, Pashler, & Homaei, 2008). When making estimates of esoteric quantities in the world, multiple guesses from one individual have independent error, like samples from a probability distribution (Vul & Pashler, 2008). In all of these cases (and others; e.g., Xu & Tenenbaum, 2007; Anderson, 1991; Sanborn & Griffiths, 2008), people seem to sample instead of computing the “fully Bayesian” answer.

Critics of the Bayesian approach (e.g., Mozer et al., 2008) have suggested that although many samples may adequately approximate Bayesian inference, behavior based on only a few samples is fundamentally inconsistent with the hypothesis that human cognition is Bayesian. Others highlight the second problem and argue that cognition cannot be Bayesian inference because exact Bayesian calculations are computationally intractable (e.g., Gigerenzer, 2008).

In this paper we will argue that acting based on a few samples can be easily reconciled with optimal Bayesian inference and may be the method by which people approximate otherwise intractable Bayesian calculations. We argue that (a) sampling behavior can be understood in terms of sensible