Publication | Closed Access
Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems
Citations: 426
References: 52
Year: 2012
Artificial Intelligence, Engineering, Game Theory, Multi-agent Domains, Education, Reinforcement Learning (Educational Psychology), Intelligent Systems, Autonomous Systems, Learning Control, Lifelong Reinforcement Learning, Multi-agent Learning, Reinforcement Learning (Computer Engineering), Stochastic Game, Robot Learning, Mechanism Design, Multi-agent Planning, Independent Reinforcement Learners, Computer Science, Games, Markov Decision Process, Cooperative Multi-agent Systems, Cooperative Markov Games, Deep Reinforcement Learning, Coordination Problems, Multi-agent Applications
In fully cooperative multi-agent systems, independent reinforcement learners must overcome coordination difficulties to function effectively. The paper identifies five obstacles to coordination that hinder independent agents: Pareto selection, non-stationarity, stochasticity, alter-exploration, and shadowed equilibria. It classifies several multi-agent domains (matrix games, Boutilier's coordination game, predator pursuit domains, and a special multi-state game) according to those challenges and empirically evaluates a set of independent Q-learning variants: decentralized Q-learning, distributed Q-learning, hysteretic Q-learning, recursive frequency maximum Q-value, and win-or-learn fast policy hill climbing. The resulting overview of each algorithm's strengths and weaknesses against the identified challenges can guide the choice of algorithm for a new domain and inform the design of new algorithms that achieve higher performance in cooperative settings.
Abstract: In the framework of fully cooperative multi-agent systems, independent (non-communicative) agents that learn by reinforcement must overcome several difficulties in order to coordinate. This paper identifies several challenges responsible for the non-coordination of independent agents: Pareto-selection, non-stationarity, stochasticity, alter-exploration and shadowed equilibria. A selection of multi-agent domains is classified according to those challenges: matrix games, Boutilier's coordination game, predator pursuit domains and a special multi-state game. Moreover, the performance of a range of algorithms for independent reinforcement learners is evaluated empirically. Those algorithms are Q-learning variants: decentralized Q-learning, distributed Q-learning, hysteretic Q-learning, recursive frequency maximum Q-value and win-or-learn fast policy hill climbing. An overview of the learning algorithms' strengths and weaknesses against each challenge concludes the paper and can serve as a basis for choosing the appropriate algorithm for a new domain. Furthermore, the distilled challenges may assist in the design of new learning algorithms that overcome these problems and achieve higher performance in multi-agent applications.
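To make one of the listed variants concrete, here is a minimal sketch of the update rule commonly associated with hysteretic Q-learning: each independent agent uses a larger learning rate when its temporal-difference error is non-negative and a smaller one when it is negative, so penalties caused by teammates' exploration lower its estimates more slowly. The function name, parameter names, and default values are assumptions chosen for illustration and are not taken from the paper.

```python
def hysteretic_update(q, reward, next_max_q, alpha=0.1, beta=0.01, gamma=0.9):
    """Return the updated Q-value for one (state, action) pair of one agent.

    alpha: learning rate applied when the TD error is non-negative.
    beta:  smaller learning rate applied when the TD error is negative.
    All names and default values here are illustrative assumptions.
    """
    # Standard Q-learning temporal-difference error.
    delta = reward + gamma * next_max_q - q
    # Hysteresis: raise estimates quickly, lower them slowly.
    rate = alpha if delta >= 0 else beta
    return q + rate * delta


# A good joint outcome moves the estimate up at rate alpha.
raised = hysteretic_update(q=0.0, reward=10.0, next_max_q=0.0)
# A bad outcome moves the estimate down only at rate beta.
lowered = hysteretic_update(q=5.0, reward=-5.0, next_max_q=0.0)
print(raised, lowered)
```

Setting `beta` equal to `alpha` in this sketch reduces it to the ordinary Q-learning update, which is the sense in which the two learning rates add optimism on top of plain independent Q-learning.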