ROMA-iQSS: An Objective Alignment Approach via State-Based Value Learning and ROund-Robin Multi-Agent Scheduling
Chi-Hui Lin, Joewie J. Koh, Alessandro Roncone, Lijun Chen
TL;DR
This work tackles scalable decentralized coordination in multi-agent reinforcement learning by addressing two core challenges: identifying objective functions that maximize collective utility and aligning agents' goals without centralized policy sharing. It introduces ROMA-iQSS, a framework that combines independent state-based value learning (iQSS) with a round-robin multi-agent scheduling (ROMA) protocol so that agents can autonomously discover optimal states and implicitly align their objectives. The authors prove convergence properties for iQSS under a set-equivalence assumption and show that ROMA fosters objective alignment, yielding a joint optimal policy. Empirical results across multi-agent coordination tasks show that ROMA-iQSS outperforms state-of-the-art centralized and decentralized baselines (e.g., I2Q, indQ) in identifying optimal states and achieving high, stable rewards. The approach offers a scalable path toward robust decentralized coordination in dynamic environments such as warehouses and autonomous driving, with potential extensions to human-robot interaction.
Abstract
Effective multi-agent collaboration is imperative for solving complex, distributed problems. In this context, two key challenges must be addressed: first, autonomously identifying optimal objectives for collective outcomes; second, aligning these objectives among agents. Traditional frameworks, often reliant on centralized learning, struggle with scalability and efficiency in large multi-agent systems. To overcome these issues, we introduce a decentralized state-based value learning algorithm that enables agents to independently discover optimal states. Furthermore, we introduce a novel mechanism for multi-agent interaction, wherein less proficient agents follow and adopt policies from more experienced ones, so that the experienced agents indirectly guide the learning of the less proficient ones. Our theoretical analysis shows that this approach leads decentralized agents to an optimal collective policy. Empirical experiments further demonstrate that our method outperforms existing decentralized state-based and action-based value learning strategies by effectively identifying and aligning optimal objectives.
