The dynamics of reinforcement learning in cooperative multiagent systems

TLDR

The study examines factors that influence learning dynamics in cooperative multiagent systems. The authors compare reinforcement learners that are unaware of other agents with joint-action-aware learners, analyze how game structure and exploration strategy affect the convergence of Q-learning to optimal and suboptimal Nash equilibria, and propose optimistic exploration strategies that make convergence to an optimal equilibrium more likely.

Abstract

Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate their action choices in multiagent systems. We examine some of the factors that can influence the dynamics of the learning process in such a setting. We first distinguish reinforcement learners that are unaware of (or ignore) the presence of other agents from those that explicitly attempt to learn the value of joint actions and the strategies of their counterparts. We study (a simple form of) Q-learning in cooperative multiagent systems under these two perspectives, focusing on the influence of game structure and exploration strategies on convergence to (optimal and suboptimal) Nash equilibria. We then propose alternative optimistic exploration strategies that increase the likelihood of convergence to an optimal equilibrium.
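
The distinction between the two kinds of learner can be illustrated with a minimal sketch of stateless Q-learning in a repeated two-agent cooperative game. Everything specific here is an assumption made for illustration and is not taken from the paper: the 2x2 payoff matrix, the learning rate, the Boltzmann temperature schedule, and the function names `run_independent` and `run_joint`. The independent learners keep a value per own action only, while the joint-action learners keep a value per joint action plus empirical counts of the other agent's choices, and act on the resulting expected values.

```python
import math
import random

# Hypothetical cooperative game: both agents receive PAYOFF[a][b].
# (0, 0) is the optimal equilibrium, (1, 1) a suboptimal one.
PAYOFF = [[10.0, 0.0],
          [0.0, 5.0]]


def boltzmann(values, temperature, rng):
    """Sample an action index with probability proportional to exp(v / T)."""
    top = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - top) / temperature) for v in values]
    return rng.choices(range(len(values)), weights=weights)[0]


def temperature_at(step):
    """Assumed schedule: exponential decay with a floor."""
    return max(0.1, 16.0 * 0.99 ** step)


def run_independent(steps=2000, alpha=0.1, seed=0):
    """Each agent ignores the other and learns q[agent][own_action]."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    for step in range(steps):
        temp = temperature_at(step)
        a = boltzmann(q[0], temp, rng)
        b = boltzmann(q[1], temp, rng)
        reward = PAYOFF[a][b]
        q[0][a] += alpha * (reward - q[0][a])
        q[1][b] += alpha * (reward - q[1][b])
    return q


def run_joint(steps=2000, alpha=0.1, seed=0):
    """Each agent learns q[agent][own_action][other_action] and models
    the other agent by the empirical frequency of its past actions."""
    rng = random.Random(seed)
    q = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    counts = [[1, 1], [1, 1]]  # counts[agent][other_action], uniform prior

    def expected_values(agent):
        total = sum(counts[agent])
        return [
            sum(counts[agent][o] / total * q[agent][own][o] for o in range(2))
            for own in range(2)
        ]

    for step in range(steps):
        temp = temperature_at(step)
        a = boltzmann(expected_values(0), temp, rng)
        b = boltzmann(expected_values(1), temp, rng)
        reward = PAYOFF[a][b]
        q[0][a][b] += alpha * (reward - q[0][a][b])
        q[1][b][a] += alpha * (reward - q[1][b][a])
        counts[0][b] += 1
        counts[1][a] += 1
    return q, counts


if __name__ == "__main__":
    print("independent learners:", run_independent())
    print("joint-action learners:", run_joint()[0])
```

Which equilibrium a given run settles on depends on the seed, the payoff structure, and the exploration schedule, which is the kind of sensitivity the abstract refers to. The sketch does not implement the optimistic exploration strategies the paper proposes.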
