Harnessing Implicit Cooperation: A Multi-Agent Reinforcement Learning Approach Towards Decentralized Local Energy Markets

Nelson Salazar-Pena; Alejandra Tabares; Andres Gonzalez-Mancera

TL;DR

This work tackles the challenge of coordinating decentralized local energy markets without explicit communication or centralized control. It introduces implicit cooperation, in which system-level KPIs embedded in agent observations guide learning in a decentralized, partially observable setting, using stigmergy to achieve grid balance. A 3×3 factorial study across the CTCE, CTDE, and DTDE training paradigms and the PPO, APPO, and SAC algorithms shows that APPO-DTDE reaches a coordination score of 91.7% relative to the centralized benchmark (APPO-CTCE) while delivering superior grid stability and privacy. The results reveal a trade-off between efficiency and stability: DTDE offers the most stable physical profile and produces emergent, self-organized trading communities, suggesting a practical, privacy-preserving path toward scalable local energy markets. The findings support deploying KPI-driven implicit coordination to reduce reliance on centralized infrastructure, and they highlight algorithm-architecture choices (e.g., APPO in DTDE) that favor scalability and resilience in real-world deregulated grids.
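The 3×3 factorial study mentioned above can be pictured as a simple cross product of paradigms and algorithms. The sketch below only enumerates the nine configuration labels; the function name `experiment_grid` and the `ALGORITHM-PARADIGM` label format are illustrative assumptions modeled on names such as APPO-DTDE, not the authors' experiment code.

```python
from itertools import product

# Training paradigms and algorithms named in the summary above.
PARADIGMS = ("CTCE", "CTDE", "DTDE")
ALGORITHMS = ("PPO", "APPO", "SAC")


def experiment_grid():
    """Return the nine 'ALGORITHM-PARADIGM' labels of the 3x3 design.

    Hypothetical helper: the label format is an assumption for illustration.
    """
    return [f"{alg}-{par}" for par, alg in product(PARADIGMS, ALGORITHMS)]


grid = experiment_grid()
for label in grid:
    print(label)
```

Each label would correspond to one training run; the summary singles out APPO-CTCE, SAC-CTDE, and APPO-DTDE as the best configuration per paradigm.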

Abstract

This paper proposes implicit cooperation, a framework enabling decentralized agents to approximate optimal coordination in local energy markets without explicit peer-to-peer communication. We formulate the problem as a decentralized partially observable Markov decision process and solve it as a multi-agent reinforcement learning task in which agents use stigmergic signals (key performance indicators at the system level) to infer and react to global states. Through a 3×3 factorial design on an IEEE 34-node topology, we evaluate three training paradigms (CTCE, CTDE, DTDE) and three algorithms (PPO, APPO, SAC). Results identify APPO-DTDE as the optimal configuration, achieving a coordination score of 91.7% relative to the theoretical centralized benchmark (CTCE). However, a critical trade-off emerges between efficiency and stability: while the centralized benchmark maximizes allocative efficiency with a peer-to-peer trade ratio of 0.6, the fully decentralized approach (DTDE) demonstrates superior physical stability. Specifically, DTDE reduces the variance of grid balance by 31% compared to hybrid architectures, establishing a highly predictable, import-biased load profile that simplifies grid regulation. Furthermore, topological analysis reveals emergent spatial clustering, where decentralized agents self-organize into stable trading communities to minimize congestion penalties. While SAC excelled in hybrid settings, it failed in decentralized environments due to entropy-driven instability. This research shows that stigmergic signaling provides sufficient context for complex grid coordination, offering a robust, privacy-preserving alternative to expensive centralized communication infrastructure.
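As a rough illustration of the stigmergic signal described above, the following sketch appends system-level KPIs to an agent's private observation, so every agent sees the same aggregate indicators without exchanging messages. All names here (`SystemKPIs`, `compute_kpis`, `augment_observation`) and both KPI definitions are assumptions made for illustration; the paper's actual KPI set and formulas are not given in this summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemKPIs:
    """Hypothetical system-level indicators shared with every agent."""
    grid_balance: float     # assumed: total supply minus total demand
    p2p_trade_ratio: float  # assumed: peer-to-peer energy / total demand


def compute_kpis(supply, demand, p2p_energy):
    """Aggregate per-agent quantities into system-level KPIs (assumed formulas)."""
    total_supply = sum(supply)
    total_demand = sum(demand)
    # Guard against division by zero when there is no demand.
    ratio = p2p_energy / total_demand if total_demand > 0 else 0.0
    return SystemKPIs(grid_balance=total_supply - total_demand,
                      p2p_trade_ratio=ratio)


def augment_observation(local_obs, kpis):
    """Append the shared KPIs to one agent's private observation vector."""
    return list(local_obs) + [kpis.grid_balance, kpis.p2p_trade_ratio]


# Two agents: supply and demand are illustrative numbers, not paper data.
kpis = compute_kpis(supply=[3.0, 1.0], demand=[2.0, 3.0], p2p_energy=3.0)
obs = augment_observation([0.5, 0.2], kpis)
print(obs)
```

Under these assumptions, a negative `grid_balance` means the community is importing energy, and each agent can react to that shared signal using only its own policy, which is the sense in which coordination stays implicit.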

Paper Structure (60 sections, 13 equations, 7 figures, 10 tables)

Introduction
Motivation
Problem statement
Research gap
Novelty and contribution
Paper structure
Literature review
Implicit cooperation and emergent coordination
A taxonomy of coordination: the limits of explicit control
Price-only mechanisms for implicit coordination
Theoretical foundations: from explicit negotiation to stigmergy
Emergent coordination and the challenge of unintended behaviors
KPI-based coordination and the role of reputation in energy markets
MARL in energy systems
From independent learning to centralized training
...and 45 more sections

Figures (7)

Figure 1: Generation and demand profiles for DER agents in the case study.
Figure 2: The 34-node IEEE test feeder as the grid network for the case study.
Figure 3: Feed-in tariff and utility price profiles for DSO agent in the case study.
Figure 4: Mean episode reward for evaluating the 9 experiments over 10 000 training episodes. Only the first 300 episodes are shown.
Figure 5: Implicit cooperation KPIs and market dynamics for the best configurations identified: APPO-CTCE, SAC-CTDE, and APPO-DTDE.
...and 2 more figures
