A Digital Twin-based Multi-Agent Reinforcement Learning Framework for Vehicle-to-Grid Coordination
Zhengchang Hua, Panagiotis Oikonomou, Karim Djemame, Nikos Tziritas, Georgios Theodoropoulos
TL;DR
This paper tackles the challenge of coordinating a large fleet of EVs in a V2G network under privacy constraints. It introduces DT-MADDPG, a hybrid architecture that integrates a collaborative Digital Twin (DT) network with a centralised training core, using a simulation-assisted critic that decomposes the value into a short-term model-based component $R_{sim}$ and a long-term residual $Q_{res,i}$. The approach preserves privacy by sharing only high-level predictions via DTs while achieving coordination performance comparable to fully centralised MADDPG, and it exhibits improved data decentralisation and balanced communication loads. The work has practical significance for deploying learning-based, privacy-preserving coordination in complex cyber-physical systems such as smart grids and EV fleets, with potential extensions to privacy-aware economic models and real-world hardware validation.
Abstract
The coordination of large-scale, decentralised systems, such as a fleet of Electric Vehicles (EVs) in a Vehicle-to-Grid (V2G) network, presents a significant challenge for modern control systems. While collaborative Digital Twins (DTs) have been proposed as a solution to manage such systems without compromising the privacy of individual agents, deriving globally optimal control policies from the high-level information they share remains an open problem. This paper introduces the Digital Twin Assisted Multi-Agent Deep Deterministic Policy Gradient (DT-MADDPG) algorithm, a novel hybrid architecture that integrates a multi-agent reinforcement learning framework with a collaborative DT network. Our core contribution is a simulation-assisted learning algorithm in which the centralised critic is enhanced by a predictive global model that is collaboratively built from the privacy-preserving data shared by individual DTs. This approach removes the need to collect sensitive raw data at a centralised entity, a requirement of traditional multi-agent learning algorithms. Experimental results in a simulated V2G environment demonstrate that DT-MADDPG can achieve coordination performance comparable to the standard MADDPG algorithm while offering significant advantages in terms of data privacy and architectural decentralisation. This work presents a practical and robust framework for deploying intelligent, learning-based coordination in complex, real-world cyber-physical systems.
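As a rough illustration of the simulation-assisted critic, the sketch below shows one way a value estimate could be assembled from a short-term model-based term and a long-term residual. It is a minimal sketch under stated assumptions, not the paper's implementation: the additive form $Q_i = R_{sim} + Q_{res,i}$, the `DTPrediction` schema, the load-tracking reward, and every function name are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DTPrediction:
    """High-level summary shared by one Digital Twin (hypothetical schema).

    Only an aggregate prediction is shared, not the vehicle's raw data.
    """
    agent_id: int
    predicted_power_kw: float  # positive = charging, negative = discharging


def simulated_reward(preds: Sequence[DTPrediction], target_load_kw: float) -> float:
    """Short-term, model-based term R_sim built from the shared predictions.

    Assumption: the global model penalises the deviation of total fleet
    power from a target load. The real reward may differ.
    """
    total_kw = sum(p.predicted_power_kw for p in preds)
    return -abs(total_kw - target_load_kw)


def critic_value(r_sim: float, q_res_i: float) -> float:
    """Assumed additive decomposition: Q_i = R_sim + Q_res,i.

    q_res_i stands in for the output of a learned residual network.
    """
    return r_sim + q_res_i


# Three DTs share predictions; the fleet total is 8 kW against a 10 kW target.
preds = [DTPrediction(0, 7.0), DTPrediction(1, -3.0), DTPrediction(2, 4.0)]
r_sim = simulated_reward(preds, target_load_kw=10.0)
q_0 = critic_value(r_sim, q_res_i=0.5)  # 0.5 is a placeholder residual
print(r_sim, q_0)  # -2.0 -1.5
```

In this toy setup the critic only ever sees the per-agent predictions, which mirrors the idea that raw data stays with each DT; how the residual is actually learned is left out here.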
