A Digital Twin-based Multi-Agent Reinforcement Learning Framework for Vehicle-to-Grid Coordination

Zhengchang Hua, Panagiotis Oikonomou, Karim Djemame, Nikos Tziritas, Georgios Theodoropoulos

TL;DR

This paper tackles the challenge of coordinating a large fleet of EVs in a V2G network under privacy constraints. It introduces DT-MADDPG, a hybrid architecture that integrates a collaborative Digital Twin (DT) network with a centralised training core. Its simulation-assisted critic decomposes the value into a short-term model-based component $R_{sim}$ and a long-term residual $Q_{res,i}$. The approach preserves privacy because DTs share only high-level predictions, yet it achieves coordination performance comparable to fully centralised MADDPG, with improved data decentralisation and more balanced communication loads. The work has practical significance for deploying learning-based, privacy-preserving coordination in complex cyber-physical systems such as smart grids and EV fleets, with potential extensions to privacy-aware economic models and real-world hardware validation.

Abstract

The coordination of large-scale, decentralised systems, such as a fleet of Electric Vehicles (EVs) in a Vehicle-to-Grid (V2G) network, presents a significant challenge for modern control systems. While collaborative Digital Twins (DTs) have been proposed as a solution to manage such systems without compromising the privacy of individual agents, deriving globally optimal control policies from the high-level information they share remains an open problem. This paper introduces the Digital Twin Assisted Multi-Agent Deep Deterministic Policy Gradient (DT-MADDPG) algorithm, a novel hybrid architecture that integrates a multi-agent reinforcement learning framework with a collaborative DT network. Our core contribution is a simulation-assisted learning algorithm in which the centralised critic is enhanced by a predictive global model that is collaboratively built from the privacy-preserving data shared by individual DTs. This approach removes the need to collect sensitive raw data at a centralised entity, a requirement of traditional multi-agent learning algorithms. Experimental results in a simulated V2G environment demonstrate that DT-MADDPG can achieve coordination performance comparable to the standard MADDPG algorithm while offering significant advantages in terms of data privacy and architectural decentralisation. This work presents a practical and robust framework for deploying intelligent, learning-based coordination in complex, real-world cyber-physical systems.
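
To make the simulation-assisted critic more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the two components named in the TL;DR combine additively, $Q_i \approx R_{sim} + Q_{res,i}$; the summary only states that the value is decomposed into them. The `DTPrediction` fields, the shortfall-based reward, and all function names are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DTPrediction:
    """High-level forecast shared by one EV's Digital Twin (hypothetical field)."""
    net_power_kw: float  # predicted charging (+) or discharging (-) power


def simulated_reward(
    predictions: Sequence[DTPrediction],
    renewable_supply_kw: float,
    base_load_kw: float,
) -> float:
    """Assumed short-term model-based term R_sim.

    Built only from the DTs' shared predictions, never from raw EV data.
    The reward here is the negative non-renewable shortfall, an illustrative
    choice rather than the paper's reward definition.
    """
    demand_kw = base_load_kw + sum(p.net_power_kw for p in predictions)
    shortfall_kw = max(0.0, demand_kw - renewable_supply_kw)
    return -shortfall_kw


def critic_value(r_sim: float, q_res_i: float) -> float:
    """Assumed additive decomposition: Q_i = R_sim + Q_res_i."""
    return r_sim + q_res_i


# Two EVs: one plans to charge at 5 kW, one to discharge at 3 kW.
preds = [DTPrediction(5.0), DTPrediction(-3.0)]
r_sim = simulated_reward(preds, renewable_supply_kw=10.0, base_load_kw=12.0)
q_i = critic_value(r_sim, q_res_i=1.5)  # 1.5 stands in for a learned residual
```

In this sketch the learned part of the critic only has to account for the long-term residual, while the short-term term comes from a model that sees nothing but the aggregated DT forecasts.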

Paper Structure

This paper contains 16 sections, 7 equations, 8 figures, 2 tables, 1 algorithm.

Figures (8)

  • Figure 1: Example V2G Network.
  • Figure 2: Architecture of the proposed framework.
  • Figure 3: Energy drained from non-renewable energy sources, indicating grid stability.
  • Figure 4: Percentage of renewable energy utilisation with different algorithms.
  • Figure 5: User satisfaction rate as a function of the revenue preference weight ($w_{revenue}$).
  • ...and 3 more figures