Communication-Efficient Soft Actor-Critic Policy Collaboration via Regulated Segment Mixture
Xiaoxue Yu, Rongpeng Li, Chengchao Liang, Zhifeng Zhao
TL;DR
The paper addresses the impracticality of centralized MARL training in dynamic environments by proposing a fully distributed, communication-efficient framework that combines Decentralized Federated Learning (DFL) with Maximum Entropy Reinforcement Learning (MERL). It introduces RSM-MASAC, which uses segmented policy aggregation to reconstruct referential policies from neighbors' policy segments, and a theory-guided mixture metric to decide which of them to mix into the local parameters, so that mixing does not sacrifice policy improvement. A new performance bound for mixed policies under MERL, together with a Fisher Information Matrix-based constraint, guides this selective use of neighbor policies and ensures soft policy improvement during the communication-assisted phase. Extensive traffic-control experiments show that RSM-MASAC approaches the performance of centralized counterparts while significantly reducing communication overhead and preserving learning stability. The result is a scalable, theoretically grounded method for policy collaboration under limited communication, aimed at Internet of Vehicles (IoV), Internet of Things (IoT), and Unmanned Aerial Vehicle (UAV) applications.
Abstract
Multi-Agent Reinforcement Learning (MARL) has emerged as a foundational approach for diverse intelligent control tasks in scenarios such as the Internet of Vehicles, the Internet of Things, and Unmanned Aerial Vehicles. However, the widely assumed existence of a central node, on which centralized and federated learning-assisted MARL relies, may be impractical in highly dynamic environments, and depending on it can lead to excessive communication overhead that potentially overwhelms the system. To address these challenges, we design a novel communication-efficient, fully distributed algorithm for collaborative MARL under the frameworks of Soft Actor-Critic (SAC) and Decentralized Federated Learning (DFL), named RSM-MASAC. In particular, RSM-MASAC enhances multi-agent collaboration and prioritizes communication efficiency in dynamic systems by adopting segmented aggregation from DFL: each agent receives policy segments from its neighbors and assembles them into multiple model replicas, which then serve as reconstructed referential policies for mixing. Unlike traditional RL approaches, RSM-MASAC derives new bounds under the framework of Maximum Entropy Reinforcement Learning (MERL). Correspondingly, it adopts a theory-guided mixture metric to regulate the selection of contributive referential policies, thus guaranteeing soft policy improvement during the communication-assisted mixing phase. Finally, extensive simulations in mixed-autonomy traffic control scenarios verify the effectiveness and superiority of our algorithm.
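The segment-then-mix idea described in the abstract can be illustrated with a minimal Python sketch. It is not the RSM-MASAC implementation: policies are reduced to flat lists of floats, and the function names (`split_segments`, `reconstruct_replica`, `regulated_mix`), the fixed mixing weight, and the `score` callback are all assumptions made for illustration. In particular, `score` is only a placeholder for the theory-guided mixture metric, whose actual form is not given in this section.

```python
import random

def split_segments(num_params, num_segments):
    """Return (start, end) index pairs that cut a flat parameter vector into contiguous segments."""
    bounds = [(i * num_params) // num_segments for i in range(num_segments + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(num_segments)]

def reconstruct_replica(neighbor_params, segments, rng):
    """Assemble one referential policy by taking each segment from a randomly chosen neighbor."""
    replica = []
    for start, end in segments:
        source = rng.choice(neighbor_params)  # only this segment would need to be transmitted
        replica.extend(source[start:end])
    return replica

def mix(local, replica, weight):
    """Convex combination of local and referential parameters."""
    return [(1.0 - weight) * l + weight * r for l, r in zip(local, replica)]

def regulated_mix(local, replicas, weight, score):
    """Mix in a replica only if the (hypothetical) score of the result does not decrease."""
    current = list(local)
    for replica in replicas:
        candidate = mix(current, replica, weight)
        if score(candidate) >= score(current):
            current = candidate
    return current

if __name__ == "__main__":
    rng = random.Random(0)
    local = [0.0] * 6
    neighbors = [[1.0] * 6, [2.0] * 6]
    segments = split_segments(len(local), 3)
    replicas = [reconstruct_replica(neighbors, segments, rng) for _ in range(2)]
    # Toy score: closeness to an arbitrary target vector, standing in for the paper's metric.
    toy_score = lambda p: -sum((x - 1.0) ** 2 for x in p)
    print(regulated_mix(local, replicas, 0.5, toy_score))
```

The gate in `regulated_mix` mirrors the role the abstract assigns to the mixture metric: a reconstructed policy contributes only when it is judged helpful, so a poorly matched replica leaves the local parameters unchanged.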
