Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems

TLDR

In fully cooperative multi-agent systems, independent reinforcement learners must overcome coordination difficulties to function effectively. The paper identifies five obstacles that hinder coordination among independent agents: Pareto-selection, non-stationarity, stochasticity, alter-exploration, and shadowed equilibria. It classifies several multi-agent domains (matrix games, Boutilier's coordination game, predator pursuit domains, and a multi-state game) according to these obstacles and empirically evaluates a set of independent Q-learning variants: decentralized Q-learning, distributed Q-learning, hysteretic Q-learning, recursive frequency maximum Q-value, and win-or-learn-fast policy hill climbing. The resulting overview of each algorithm's strengths and weaknesses against each challenge serves as a guide for choosing a method for a new domain and informs the design of future algorithms for cooperative settings.

Abstract

In the framework of fully cooperative multi-agent systems, independent (non-communicative) agents that learn by reinforcement must overcome several difficulties in order to coordinate. This paper identifies several challenges responsible for the non-coordination of independent agents: Pareto-selection, non-stationarity, stochasticity, alter-exploration and shadowed equilibria. A selection of multi-agent domains is classified according to those challenges: matrix games, Boutilier's coordination game, predator pursuit domains and a special multi-state game. Moreover, the performance of a range of algorithms for independent reinforcement learners is evaluated empirically. Those algorithms are Q-learning variants: decentralized Q-learning, distributed Q-learning, hysteretic Q-learning, recursive frequency maximum Q-value and win-or-learn-fast policy hill climbing. An overview of the learning algorithms' strengths and weaknesses against each challenge concludes the paper and can serve as a basis for choosing the appropriate algorithm for a new domain. Furthermore, the distilled challenges may assist in the design of new learning algorithms that overcome these problems and achieve higher performance in multi-agent applications.
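
As a rough illustration of the setting described in the abstract, the sketch below trains two independent learners on a small cooperative matrix game using a stateless form of the hysteretic Q-learning update, which applies a smaller learning rate when an estimate decreases than when it increases. It is a minimal sketch under stated assumptions, not the paper's experimental setup: the payoff matrix, the function names (`hysteretic_update`, `choose`, `run`), and all parameter values are chosen here for illustration only.

```python
import random

# Assumed illustrative payoff matrix for a two-agent cooperative matrix game.
# Both agents receive PAYOFF[a1][a2]; the best joint action (0, 0) sits next
# to heavy miscoordination penalties.
PAYOFF = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]


def hysteretic_update(q, reward, alpha=0.1, beta=0.01):
    """Stateless hysteretic update: rate alpha when the estimate rises,
    smaller rate beta when it falls. beta == alpha gives plain Q-learning."""
    delta = reward - q
    rate = alpha if delta >= 0 else beta
    return q + rate * delta


def choose(q_values, epsilon, rng):
    """Epsilon-greedy action selection over one agent's own Q-values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def run(episodes=5000, epsilon=0.1, alpha=0.1, beta=0.01, seed=0):
    """Train two independent learners; neither observes the other's action."""
    rng = random.Random(seed)
    q1, q2 = [0.0] * 3, [0.0] * 3
    for _ in range(episodes):
        a1, a2 = choose(q1, epsilon, rng), choose(q2, epsilon, rng)
        reward = PAYOFF[a1][a2]  # shared reward: the game is fully cooperative
        q1[a1] = hysteretic_update(q1[a1], reward, alpha, beta)
        q2[a2] = hysteretic_update(q2[a2], reward, alpha, beta)
    return q1, q2


if __name__ == "__main__":
    print("hysteretic (beta < alpha):", run(beta=0.01))
    print("plain Q    (beta = alpha):", run(beta=0.1))
```

Running the script with several seeds and comparing the two printed Q-tables gives a feel for how each learner treats penalties caused by the other agent's exploratory actions; no particular outcome is claimed here.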
