Publication | Closed Access
Active inference and epistemic value
Citations: 709 | References: 102 | Year: 2015
Artificial Intelligence, Engineering, Behavioral Decision Making, Game Theory, Epistemic Logic, Social Sciences, Stochastic Game, Uncertainty Quantification, Expected Free Energy, Robot Learning, Decision Theory, Mechanism Design, Cognitive Science, Active Inference, Negative Free Energy, Formal Treatment, Computer Science, Sequential Decision Making, Exploration V Exploitation, Dynamic Epistemic Logic, Automated Reasoning, Epistemology, Statistical Inference, Decision Science
The article presents a formal theory of choice behavior based on minimizing expected free energy and illustrates it with simulations. The negative free energy (quality) of a policy decomposes into extrinsic and epistemic value, so minimizing expected free energy is equivalent to maximizing expected utility while also maximizing information gain. This is consistent with Infomax, active vision, expected utility theory, and risk-sensitive control. The scheme resolves the exploration-exploitation dilemma by maximizing epistemic value until no further information can be gained, then exploiting extrinsic value. Softmax parameters encode the expected precision of beliefs about policies, and the simulations show that precision updates resemble dopaminergic discharges in conditioning paradigms.
We offer a formal treatment of choice behavior based on the premise that agents minimize the expected free energy of future outcomes. Crucially, the negative free energy or quality of a policy can be decomposed into extrinsic and epistemic (or intrinsic) value. Minimizing expected free energy is therefore equivalent to maximizing extrinsic value or expected utility (defined in terms of prior preferences or goals), while maximizing information gain or intrinsic value (or reducing uncertainty about the causes of valuable outcomes). The resulting scheme resolves the exploration-exploitation dilemma: Epistemic value is maximized until there is no further information gain, after which exploitation is assured through maximization of extrinsic value. This is formally consistent with the Infomax principle, generalizing formulations of active vision based upon salience (Bayesian surprise) and optimal decisions based on expected utility and risk-sensitive (Kullback-Leibler) control. Furthermore, as with previous active inference formulations of discrete (Markovian) problems, ad hoc softmax parameters become the expected (Bayes-optimal) precision of beliefs about, or confidence in, policies. This article focuses on the basic theory, illustrating the ideas with simulations. A key aspect of these simulations is the similarity between precision updates and dopaminergic discharges observed in conditioning paradigms.
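The decomposition described in the abstract can be illustrated with a minimal numerical sketch. It is not the authors' simulation code: it assumes a one-step toy problem in which each policy is summarized by its own likelihood matrix `A[o][s]` (the probability of outcome `o` given hidden state `s`), and every function and variable name is hypothetical. Extrinsic value is computed as the expected log prior preference over outcomes, epistemic value as the expected information gain (the mutual information between hidden states and outcomes), and policies are selected with a softmax whose sensitivity `gamma` stands in for the precision of beliefs about policies.

```python
import math


def predicted_outcomes(A, qs):
    """Predictive outcome distribution Q(o) = sum_s P(o|s) Q(s)."""
    return [sum(A[o][s] * qs[s] for s in range(len(qs))) for o in range(len(A))]


def extrinsic_value(A, qs, log_c):
    """Expected log prior preference over predicted outcomes."""
    qo = predicted_outcomes(A, qs)
    return sum(qo[o] * log_c[o] for o in range(len(qo)))


def epistemic_value(A, qs):
    """Expected information gain: mutual information between states and outcomes."""
    qo = predicted_outcomes(A, qs)
    gain = 0.0
    for o in range(len(A)):
        for s in range(len(qs)):
            joint = A[o][s] * qs[s]
            if joint > 0.0:  # zero-probability terms contribute nothing
                gain += joint * math.log(joint / (qo[o] * qs[s]))
    return gain


def policy_quality(A, qs, log_c):
    """Quality (negative expected free energy) = extrinsic + epistemic value."""
    return extrinsic_value(A, qs, log_c) + epistemic_value(A, qs)


def policy_posterior(qualities, gamma):
    """Softmax over policy qualities; gamma plays the role of precision."""
    scaled = [gamma * q for q in qualities]
    top = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(x - top) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Toy setup (assumed, not from the article): two hidden states, two outcomes.
A_look = [[1.0, 0.0], [0.0, 1.0]]   # outcomes fully reveal the hidden state
A_stay = [[0.5, 0.5], [0.5, 0.5]]   # outcomes carry no information
flat_log_c = [math.log(0.5), math.log(0.5)]  # no preference between outcomes

uncertain = [0.5, 0.5]  # maximally uncertain beliefs about the hidden state
certain = [1.0, 0.0]    # fully resolved beliefs

q_uncertain = [policy_quality(A, uncertain, flat_log_c) for A in (A_look, A_stay)]
q_certain = [policy_quality(A, certain, flat_log_c) for A in (A_look, A_stay)]

print(policy_posterior(q_uncertain, gamma=1.0))  # favors the informative policy
print(policy_posterior(q_certain, gamma=1.0))    # no information left to gain
```

Under uncertain beliefs the informative policy has higher quality purely because of its epistemic value. Once beliefs are resolved, the epistemic term vanishes for both policies and selection is governed by extrinsic value alone, which is the exploration-then-exploitation pattern the abstract describes.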