QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning

Abstract

We explore value-based solutions for multi-agent reinforcement learning (MARL) tasks in the recently popularized centralized training with decentralized execution (CTDE) regime. VDN and QMIX are representative examples that factorize the joint action-value function into individual ones for decentralized execution. However, VDN and QMIX address only a fraction of factorizable MARL tasks because of the structural constraints they impose on the factorization, namely additivity and monotonicity. In this paper, we propose a new factorization method for MARL, QTRAN, which is free from such structural constraints and takes a new approach: transforming the original joint action-value function into an easily factorizable one with the same optimal actions. QTRAN guarantees more general factorization than VDN or QMIX, thus covering a much wider class of MARL tasks than previous methods do. Our experiments on multi-domain Gaussian-squeeze and modified predator-prey tasks demonstrate QTRAN's superior performance, with especially large margins in games whose payoffs penalize non-cooperative behavior more aggressively.
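To make the additivity constraint concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical two-agent, three-action one-step game whose payoff values are chosen purely for illustration, and it models a VDN-style factorization as the uniform-weight least-squares fit of the joint values by a sum of per-agent utilities. The helper names `additive_fit` and `argmax` are assumptions introduced here.

```python
# Hypothetical payoff matrix for a two-agent, three-action one-step game.
# Values are illustrative: miscoordination around the best joint action is penalized.
payoff = [
    [8.0, -12.0, -12.0],
    [-12.0, 0.0, 0.0],
    [-12.0, 0.0, 0.0],
]


def additive_fit(q_joint):
    """Least-squares fit of q_joint[a][b] by q1[a] + q2[b] with uniform weights.

    For a full (balanced) table the solution is row mean + column mean - grand mean;
    the grand mean is split evenly between the two agents (an arbitrary choice).
    """
    n_rows, n_cols = len(q_joint), len(q_joint[0])
    grand = sum(sum(row) for row in q_joint) / (n_rows * n_cols)
    q1 = [sum(row) / n_cols - grand / 2 for row in q_joint]
    q2 = [
        sum(q_joint[a][b] for a in range(n_rows)) / n_rows - grand / 2
        for b in range(n_cols)
    ]
    return q1, q2


def argmax(values):
    """Index of the largest value; ties resolve to the first index."""
    return max(range(len(values)), key=lambda i: values[i])


q1, q2 = additive_fit(payoff)

# Decentralized execution: each agent maximizes its own utility independently.
greedy = (argmax(q1), argmax(q2))

# Centralized optimum: maximize over the joint action space directly.
optimal = max(
    ((a, b) for a in range(3) for b in range(3)),
    key=lambda ab: payoff[ab[0]][ab[1]],
)

print("decentralized greedy action:", greedy, "payoff:", payoff[greedy[0]][greedy[1]])
print("joint optimal action:", optimal, "payoff:", payoff[optimal[0]][optimal[1]])
```

Under these assumptions the per-agent greedy choice is (1, 1) with payoff 0, while the joint optimum is (0, 0) with payoff 8. The additive fit therefore fails to preserve the optimal joint action in this game, which is the kind of task the abstract describes as outside the reach of additive factorization.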
