Publication | Closed Access
Solving transition independent decentralized Markov decision processes
225
Citations
31
References
2004
Year
Unknown Venue
Mathematical ProgrammingEngineeringGame TheoryComputational ComplexityMarkov Decision ProcessesMulti-agent LearningOperations ResearchDistributed Decision MakingCombinatorial OptimizationMechanism DesignMulti-agent PlanningDecentralised SystemFormal TreatmentProbability TheoryComputer ScienceOptimal AlgorithmMulti-agent Mechanism DesignMarkov Decision ProcessMulti-agent SystemsMarkov KernelBusiness
Formal treatment of collaborative multi‑agent systems has lagged behind progress in sequential decision making for individual agents, and computational complexity remains a serious obstacle. The study aims to overcome this complexity barrier by identifying a class of decentralized MDPs with independent transitions and presenting a novel algorithm to solve them. The algorithm exploits a structured global reward function that depends on all agents’ histories of states and actions to optimally solve decentralized MDPs with independent transitions. This is the first algorithm to optimally solve a non‑trivial subclass of decentralized MDPs and establishes a foundation for future exact and approximate methods.
Formal treatment of collaborative multi-agent systems has been lagging behind the rapid progress in sequential decision making by individual agents. Recent work in the area of decentralized Markov Decision Processes (MDPs) has contributed to closing this gap, but the computational complexity of these models remains a serious obstacle. To overcome this complexity barrier, we identify a specific class of decentralized MDPs in which the agents' transitions are independent. The class consists of independent collaborating agents that are tied together through a structured global reward function that depends on all of their histories of states and actions. We present a novel algorithm for solving this class of problems and examine its properties, both as an optimal algorithm and as an anytime algorithm. To the best of our knowledge, this is the first algorithm to optimally solve a non-trivial subclass of decentralized MDPs. It lays the foundation for further work in this area on both exact and approximate algorithms.
| Year | Citations | |
|---|---|---|
Page 1
Page 1