Scalable Multi-Agent Reinforcement Learning for Residential Load Scheduling under Data Governance
Zhaoming Qin, Nanqing Dong, Di Liu, Zhefan Wang, Junwei Cao
TL;DR
This work addresses privacy and scalability challenges in multi-agent reinforcement learning for cooperative residential load scheduling under data governance. It introduces DADC (decentralized actors with distributed critics), in which each household runs an on-device actor and a local critic that outputs a single scalar value. Only this scalar is sent to the cloud, where a lightweight feed-forward network combines the local values into a global value function. By decoupling value estimation into scalar local components and a central aggregator, DADC preserves household privacy, reduces cloud communication, and scales linearly with the number of households, while remaining competitive with privacy-unconstrained baselines such as DACC. Empirical results on real-world data show that DADC outperforms independent actor-critic (IAC) and approaches DACC in performance, with significant gains in implicit credit assignment and substantial reductions in communication and computation overhead, enabling practical cloud-edge deployment.
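The data flow described above (local scalar critics feeding a cloud-side feed-forward aggregator) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear local critic, the single tanh hidden layer in the aggregator, the sizes, and the names `local_critic`, `CloudAggregator`, and `global_value` are all hypothetical. The sketch shows that each household uploads exactly one number and that the aggregator's parameter count grows linearly with the number of households.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical sizes chosen for illustration; not taken from the paper.
OBS_DIM = 4   # features in one household's local observation
HIDDEN = 8    # hidden width of the cloud-side aggregator


def local_critic(weights: Sequence[float], bias: float, obs: Sequence[float]) -> float:
    """On-device critic (assumed linear): maps one household's local
    observation to a single scalar. Any model with a scalar output
    would play the same role."""
    return sum(w * x for w, x in zip(weights, obs)) + bias


class CloudAggregator:
    """Hypothetical feed-forward network in the cloud: N local scalars -> one global value."""

    def __init__(self, n_households: int, hidden: int, rng: random.Random) -> None:
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_households)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
        self.b2 = 0.0

    def num_params(self) -> int:
        # Only the first layer depends on the number of households, so the
        # total grows linearly with N.
        return len(self.w1) * len(self.w1[0]) + len(self.b1) + len(self.w2) + 1

    def __call__(self, local_values: Sequence[float]) -> float:
        # One tanh hidden layer (an assumed activation), then a linear output.
        h = [math.tanh(sum(w * v for w, v in zip(row, local_values)) + b)
             for row, b in zip(self.w1, self.b1)]
        return sum(w * x for w, x in zip(self.w2, h)) + self.b2


def global_value(
    observations: List[List[float]],
    critic_params: List[Tuple[List[float], float]],
    aggregator: CloudAggregator,
) -> float:
    # Each household evaluates its own critic locally; only the scalar is uploaded.
    uploads = [local_critic(w, b, obs) for obs, (w, b) in zip(observations, critic_params)]
    return aggregator(uploads)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 3
    obs = [[rng.random() for _ in range(OBS_DIM)] for _ in range(n)]
    params = [([rng.gauss(0.0, 0.1) for _ in range(OBS_DIM)], 0.0) for _ in range(n)]
    agg = CloudAggregator(n, HIDDEN, rng)
    print("global value:", global_value(obs, params, agg))
    print("aggregator parameters:", agg.num_params())
```

Training is deliberately omitted; the sketch only shows how a global value can be assembled from per-household scalars without any raw observation reaching the cloud.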
Abstract
As a data-driven approach, multi-agent reinforcement learning (MARL) has made remarkable advances in solving cooperative residential load scheduling problems. However, centralized training, the most common paradigm for MARL, limits large-scale deployment in communication-constrained cloud-edge environments. Distributed training is a natural remedy with significant advantages in real-world applications, but it still faces scalability challenges, such as the high communication overhead of coordinating individual agents, and it must comply with data governance requirements on privacy. In this work, we propose a novel MARL solution that addresses these two practical issues. Our approach is based on actor-critic methods, where the global critic is a learned function of individual critics, each computed solely from the local observations of its household. This scheme fully preserves household privacy and significantly reduces communication cost. Simulation experiments demonstrate that the proposed framework achieves performance comparable to that of the state-of-the-art actor-critic framework without data governance and communication constraints.
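To make the communication argument concrete, the per-step upload to the cloud can be counted under simple assumptions. The sketch below is a back-of-the-envelope comparison, not a measurement from the paper. It assumes that a centralized critic needs every household's full observation vector, that each number is sent as a 4-byte float, and that an observation has 24 features; the function names and that size are hypothetical.

```python
# Hypothetical accounting assumptions (not figures from the paper):
#   - a centralized critic receives every household's full observation vector,
#   - with individual critics, each household uploads one scalar local value,
#   - each number is transmitted as a 4-byte float32.
FLOAT32_BYTES = 4


def centralized_upload_bytes(n_households: int, obs_dim: int) -> int:
    """Per-step upload if raw observations are sent to a central critic."""
    return n_households * obs_dim * FLOAT32_BYTES


def dadc_upload_bytes(n_households: int) -> int:
    """Per-step upload if each household sends only its scalar critic output."""
    return n_households * FLOAT32_BYTES


if __name__ == "__main__":
    obs_dim = 24  # hypothetical observation size per household
    for n in (10, 100, 1000):
        c = centralized_upload_bytes(n, obs_dim)
        d = dadc_upload_bytes(n)
        print(f"N={n:5d}  centralized={c:7d} B  individual critics={d:5d} B  ratio={c // d}")
```

Under these assumptions both payloads grow linearly with the number of households N, but each household sends a single number instead of d features, so the saving is a factor of d, and no raw household observation leaves the device.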
