Concepedia

TLDR

Reinforcement learning typically uses fixed reward functions, which can be too simplistic for dynamic environments where resources fluctuate, whereas living beings modulate reward value contextually to enhance adaptivity. The paper reviews the processes underlying context‑dependent reward modulation and proposes a simplified formalization for artificial agents. The formalism is tested by monitoring an agent equipped with a motivated actor–critic model, which incorporates the value formalization and is constrained by physiological stability, as it adapts to environments with varying resource distributions. The main finding is that the way reward is processed as a function of the agent’s motivational state strongly influences both behavioral adaptivity and physiological stability.

Abstract

Reinforcement learning (RL) in the context of artificial agents is typically used to produce behavioral responses as a function of the reward obtained by interaction with the environment. When the problem consists of learning the shortest path to a goal, it is common to use reward functions yielding a fixed value after each decision, for example a positive value if the target location has been attained and a negative value at each intermediate step. However, this fixed strategy may be too simplistic for agents that must adapt to dynamic environments, in which resources may vary over time. By contrast, there is significant evidence that most living beings internally modulate reward value as a function of their context to expand their range of adaptivity. Inspired by the potential of this operation, we present a review of its underlying processes and introduce a simplified formalization for artificial agents. The performance of this formalism is tested by monitoring the adaptation of an agent endowed with a motivated actor–critic model, embedded with our formalization of value and constrained by physiological stability, to environments with different resource distributions. Our main result shows that the manner in which reward is internally processed as a function of the agent’s motivational state strongly influences the adaptivity of the behavioral cycles generated and the agent’s physiological stability.
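
The contrast between a fixed reward and an internally modulated one can be illustrated with a minimal sketch. The abstract does not state the paper's formalization, so the modulated version below swaps in a simple drive-reduction rule as an assumption: the value of a resource is the amount by which it reduces the agent's deficit relative to a set point. The function names, the set point, and the numeric defaults are all hypothetical.

```python
def fixed_reward(reached_goal: bool,
                 goal_value: float = 1.0,
                 step_cost: float = -0.01) -> float:
    """Fixed scheme: positive value at the goal, small penalty otherwise.

    The default magnitudes are illustrative assumptions.
    """
    return goal_value if reached_goal else step_cost


def modulated_reward(amount: float,
                     level: float,
                     setpoint: float = 1.0) -> float:
    """State-dependent scheme (assumed drive-reduction rule, not the paper's).

    `level` is the agent's current internal resource level and `amount`
    is how much of the resource the agent has just obtained. The reward
    is the reduction in deficit relative to `setpoint`, so the same
    resource is worth less to an agent that is already near the set point.
    """
    deficit_before = max(setpoint - level, 0.0)
    deficit_after = max(setpoint - (level + amount), 0.0)
    return deficit_before - deficit_after


if __name__ == "__main__":
    # The fixed scheme ignores the agent's internal state.
    print(fixed_reward(True), fixed_reward(False))
    # The modulated scheme values the same resource differently
    # depending on how depleted the agent is.
    print(modulated_reward(0.5, level=0.2))  # depleted agent
    print(modulated_reward(0.5, level=1.0))  # sated agent
```

Under this assumed rule, a sated agent receives no reward for consuming more of the resource, which is one simple way an internal state can reshape behavior without changing the environment.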
