On Equivalence Between Decentralized Policy-Profile Mixtures and Behavioral Coordination Policies in Multi-Agent Systems
Nouman Khan, Vijay G. Subramanian
TL;DR
The paper analyzes constrained cooperative multi-agent decision problems (MA-C-POMDPs) under partial observability and compares different randomization schemes for policy-profiles. By formulating and exploiting occupation measures, it proves equivalences between joint mixtures of decentralized policy-profiles and common-information based coordination policies, and extends these results to independent mixtures and finite-observation coordination. These equivalences can be exploited to study strong Lagrangian duality, to reduce the number of randomizations, and to support learning-based approaches for approximate optimization in multi-agent coordination. The findings give a unified view of how decentralized and centralized (coordinator-based) strategies can realize the same long-term trade-offs, with implications for the design and analysis of constrained teamwork in complex environments.
Abstract
Constrained decentralized team problem formulations are good models for many cooperative multi-agent systems. Constraints necessitate randomization when solving for optimal solutions; our past results show that joint randomization amongst the team is necessary for (strong) Lagrangian duality to hold. A better understanding of randomization, however, is still lacking. For a partially observed multi-agent system with Borel hidden state and finite observations and actions, we prove the equivalence between joint mixtures of decentralized policy-profiles (both pure and behavioral) and common-information based behavioral coordination policies (and mixtures thereof). This generalizes past work that shows equivalence between pure decentralized policy-profiles and pure coordination policies. The equivalence can be exploited to develop results on strong duality and the number of randomizations.
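
The claim that constraints call for randomization, and specifically joint randomization, can be illustrated with a toy example. The sketch below is not taken from the paper: the one-shot two-agent problem, the `REWARD` and `COST` tables, the `BUDGET` value, and all function names are assumptions chosen only for illustration. It compares the best feasible pure profile, the best feasible independent mixture (searched over a coarse grid), and a joint mixture of the two coordinated pure profiles.

```python
from itertools import product

# Hypothetical one-shot team problem (illustrative only, not from the paper).
# Each agent picks an action in {0, 1}; the team shares one reward and one cost.
REWARD = {(0, 0): 0.0, (1, 1): 1.0, (0, 1): -1.0, (1, 0): -1.0}
COST = {(0, 0): 0.0, (1, 1): 1.0, (0, 1): 0.5, (1, 0): 0.5}
BUDGET = 0.5  # constraint: expected cost must not exceed this value


def expected(table, dist):
    """Expectation of a per-action table under a joint action distribution."""
    return sum(prob * table[action] for action, prob in dist.items())


def independent(p, q):
    """Joint distribution when agent 1 plays 1 w.p. p and agent 2 plays 1 w.p. q."""
    return {
        (a1, a2): (p if a1 else 1 - p) * (q if a2 else 1 - q)
        for a1, a2 in product((0, 1), repeat=2)
    }


def joint_mixture(w):
    """Shared randomization: play (1, 1) w.p. w and (0, 0) w.p. 1 - w."""
    return {(0, 0): 1 - w, (1, 1): w, (0, 1): 0.0, (1, 0): 0.0}


def best_feasible(dists):
    """Largest expected reward among distributions that satisfy the budget."""
    feasible = [d for d in dists if expected(COST, d) <= BUDGET + 1e-12]
    return max(expected(REWARD, d) for d in feasible)


grid = [i / 20 for i in range(21)]
best_pure = best_feasible(independent(p, q) for p, q in product((0.0, 1.0), repeat=2))
best_independent = best_feasible(independent(p, q) for p, q in product(grid, repeat=2))
best_joint = best_feasible(joint_mixture(w) for w in grid)

print(best_pure, best_independent, best_joint)
```

In this toy instance the independent mixture has expected reward 3pq - p - q and expected cost (p + q)/2, so under the budget it cannot beat the pure profile (0, 0), whereas the joint mixture with w = 0.5 meets the budget with a strictly higher reward. The example only shows the gap between independent and joint randomization in a single step; it does not reproduce the paper's occupation-measure argument or its common-information coordinator.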
