Assessing Long-Term Electricity Market Design for Ambitious Decarbonization Targets using Multi-Agent Reinforcement Learning
Javier Gonzalez-Ruiz, Carlos Rodriguez-Pardo, Iacopo Savelli, Alice Di Bella, Massimo Tavoni
TL;DR
The paper develops a scalable, open-source multi-agent reinforcement learning (MARL) framework to evaluate long-term electricity market designs under ambitious decarbonization targets. It uses Independent Proximal Policy Optimization (IPPO) to train profit-maximizing Generation Companies (GENCOs) that invest through three mutually exclusive channels (Merchant, CfD, Capacity Market) in a stylized version of the Italian electricity system, modeled with representative days and a copper-plate grid. The study shows how different market designs and policy instruments shape investment, emissions, and price outcomes, highlighting the critical role of long-term market design in enabling decarbonization while mitigating price volatility. It also discusses hyperparameter strategies, the limitations of independent learning in MARL, and avenues for extending the framework with additional constraints, risk preferences, and regulator-style agents for policy analysis. Overall, the framework gives policymakers a flexible tool for stress-testing hybrid market designs and long-term incentives for the energy transition.
Abstract
Electricity systems are key to transforming today's society into a carbon-free economy. Long-term electricity market mechanisms, including auctions, support schemes, and other policy instruments, are critical in shaping the electricity generation mix. In light of the need for more advanced tools to support policymakers and other stakeholders in designing, testing, and evaluating long-term markets, this work presents a multi-agent reinforcement learning model capable of capturing the key features of decarbonizing energy systems. Profit-maximizing generation companies make investment decisions in the wholesale electricity market, responding to system needs, competitive dynamics, and policy signals. The model employs independent proximal policy optimization, selected for its suitability to the decentralized and competitive environment. Because independent learning poses inherent challenges in multi-agent settings, an extensive hyperparameter search is used to ensure that decentralized training yields market outcomes consistent with competitive behavior. The model is applied to a stylized version of the Italian electricity system and tested under varying levels of competition, market designs, and policy scenarios. Results highlight the critical role of market design in decarbonizing the electricity sector and avoiding price volatility. The proposed framework makes it possible to assess long-term electricity markets in which multiple policy and market mechanisms interact simultaneously, with market participants responding and adapting to decarbonization pathways.
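As an illustration of the wholesale-market setting described above, the following is a minimal sketch of merit-order clearing on a single-node (copper-plate) grid for one time step. It is not taken from the paper: the `Unit` class, the `clear_market` function, the unit data, and the price cap of 3000 EUR/MWh are all hypothetical assumptions chosen for the example, and the framework's actual clearing rules may differ.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """A generating unit (hypothetical structure, for illustration only)."""
    name: str
    capacity_mw: float
    marginal_cost: float  # EUR/MWh


def clear_market(units, demand_mw, price_cap=3000.0):
    """Merit-order clearing on one node, ignoring network constraints.

    Units are dispatched from cheapest to most expensive until demand is
    met. The price is the marginal cost of the last dispatched unit, or
    the (assumed) price cap if total capacity cannot cover demand.
    Returns (price, dispatch_by_unit, unserved_mw).
    """
    dispatch = {}
    remaining = demand_mw
    price = 0.0
    for unit in sorted(units, key=lambda u: u.marginal_cost):
        if remaining <= 0:
            break
        quantity = min(unit.capacity_mw, remaining)
        dispatch[unit.name] = quantity
        remaining -= quantity
        price = unit.marginal_cost
    if remaining > 0:
        price = price_cap  # scarcity: demand exceeds installed capacity
    return price, dispatch, max(remaining, 0.0)


# Hypothetical three-unit system, not data from the paper.
fleet = [
    Unit("wind", capacity_mw=40.0, marginal_cost=0.0),
    Unit("gas", capacity_mw=50.0, marginal_cost=80.0),
    Unit("coal", capacity_mw=30.0, marginal_cost=60.0),
]
price, dispatch, unserved = clear_market(fleet, demand_mw=60.0)
print(price, dispatch, unserved)
```

In a setting like the one the abstract describes, each generation company's investment decisions would change the `fleet` over time, and the resulting prices would feed back into its profits. That feedback loop is what the learning agents respond to.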
