LLM-Enhanced Multi-Agent Reinforcement Learning with Expert Workflow for Real-Time P2P Energy Trading

Chengwei Lou; Zekai Jin; Wei Tang; Guangfei Geng; Jin Yang; Lu Zhang

TL;DR

This work introduces an LLM-enhanced multi-agent reinforcement learning (MARL) framework for real-time peer-to-peer (P2P) energy trading, in which LLMs act as per-prosumer experts within a centralized training with decentralized execution (CTDE) scheme. It couples a phase-triggered LLM expert workflow with a CTDE-based MARL algorithm that uses a differential attention-based critic and Wasserstein-distance-based imitation to align agent policies with expert strategies. The approach is validated on a modified IEEE 141-bus distribution network with 20 prosumers, where it achieves lower operational costs and fewer voltage violations than the baseline algorithms. The LLM workflow is compatible with a broad range of models and can substitute for human experts, subject to secure verification by the distribution system operator (DSO). The results highlight the practical potential of combining LLM-driven expert guidance with multi-agent learning to balance economic performance and grid security in real-time P2P markets, while noting limitations in generalization and the need for ongoing knowledge-base updates.

Abstract

Real-time peer-to-peer (P2P) electricity markets adapt dynamically to fluctuations in renewable energy and variations in demand, maximizing economic benefits through instantaneous price responses while enhancing grid flexibility. However, scaling expert guidance to a large population of prosumers with individual needs poses critical challenges, including diverse decision-making demands and a lack of customized modeling frameworks. This paper proposes an integrated large language model and multi-agent reinforcement learning (LLM-MARL) framework for real-time P2P energy trading that addresses the limited technical capability of prosumers, the lack of expert experience, and the security issues of distribution networks. LLMs are introduced as experts that generate personalized strategies, which guide MARL through imitation learning under the centralized training with decentralized execution (CTDE) paradigm. A differential attention-based critic network is designed to improve convergence. Experimental results demonstrate that LLM-generated strategies can effectively substitute for human experts. The proposed multi-agent imitation learning algorithms achieve significantly lower economic costs and voltage violation rates on test sets than the baseline algorithms, while maintaining robust stability. This work provides an effective solution for real-time P2P electricity market decision-making by bridging expert knowledge with agent learning.
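To make the idea of Wasserstein-distance-based imitation more concrete, the following is a minimal sketch, not the paper's formulation. It assumes one-dimensional actions (for example, a single trading quantity per step) and equal-size batches of agent and expert actions, in which case the empirical 1-Wasserstein distance reduces to the mean absolute difference between the sorted samples. The function names `wasserstein_1d` and `imitation_regularized_loss`, and the `weight` parameter, are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def wasserstein_1d(agent_actions: Sequence[float],
                   expert_actions: Sequence[float]) -> float:
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    For equal-size samples on the real line, the optimal transport plan
    matches the i-th smallest agent action to the i-th smallest expert action.
    """
    if len(agent_actions) == 0 or len(agent_actions) != len(expert_actions):
        raise ValueError("samples must be non-empty and of equal size")
    a = sorted(agent_actions)
    b = sorted(expert_actions)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def imitation_regularized_loss(policy_loss: float,
                               agent_actions: Sequence[float],
                               expert_actions: Sequence[float],
                               weight: float = 0.5) -> float:
    """Hypothetical objective: policy loss plus a weighted imitation penalty.

    The weighting scheme is an assumption made for this sketch; the source
    does not state how the imitation term is combined with the policy loss.
    """
    return policy_loss + weight * wasserstein_1d(agent_actions, expert_actions)
```

Under this sketch, the penalty is zero when the agent's batch of actions has the same empirical distribution as the LLM expert's batch, and it grows as the two distributions move apart. A practical implementation would work on multi-dimensional actions and differentiable tensors.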
