Multi-Agent Reinforcement Learning Simulation for Environmental Policy Synthesis
James Rudd-Jones, Mirco Musolesi, María Pérez-Ortiz
TL;DR
The paper addresses climate policy synthesis under deep uncertainty by proposing a framework that augments Integrated Assessment Models (IAMs) with Multi-Agent Reinforcement Learning (MARL). IAMs link socio-economic and environmental processes, which makes them suitable for policy-trajectory optimization with MARL once the problem is formulated as a Markov Decision Process (MDP) or a Stochastic Game (SG). The paper identifies core interface challenges (reward design, scalability, uncertainty propagation, validation, distributional robustness, and explainability) and discusses how principled MARL could address them while acknowledging simulator limitations. If realized, this approach would enable robust exploration of policy pathways and provide policymakers with diverse, uncertainty-aware guidance.
Abstract
Climate policy development faces significant challenges due to deep uncertainty, complex system dynamics, and competing stakeholder interests. Climate simulation methods, such as Earth System Models, have become valuable tools for policy exploration. However, they are typically used to evaluate candidate policies rather than to synthesize them directly. The problem can be inverted to optimize for policy pathways, but traditional optimization approaches often struggle with non-linear dynamics, heterogeneous agents, and comprehensive uncertainty quantification. We propose a framework for augmenting climate simulations with Multi-Agent Reinforcement Learning (MARL) to address these limitations. We identify key challenges at the interface between climate simulations and the application of MARL in the context of policy synthesis, including reward definition, scalability with increasing numbers of agents and larger state spaces, uncertainty propagation across linked systems, and solution validation. Additionally, we discuss challenges in making MARL-derived solutions interpretable and useful for policy-makers. Our framework provides a foundation for more sophisticated climate policy exploration while acknowledging important limitations and areas for future research.
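To make the idea of inverting a simulator into a multi-agent decision problem concrete, the following is a minimal sketch of a stochastic game wrapped around a toy carbon-stock model. It is an illustration only and not the framework proposed here: the class `ToyClimateGame`, the function `rollout`, and every coefficient (decay rate, emission factor, damage and abatement costs) are hypothetical choices made for this example, and the dynamics are far simpler than any real climate simulation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyClimateGame:
    """Hypothetical stochastic game over a shared carbon stock (not a real IAM)."""
    n_agents: int = 2
    carbon: float = 1.0    # shared state: normalized atmospheric carbon stock
    decay: float = 0.02    # assumed natural uptake rate per step
    noise: float = 0.01    # assumed std. dev. of the emission shock
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def step(self, abatement):
        """Apply one abatement level per agent; return (state, per-agent rewards)."""
        if len(abatement) != self.n_agents:
            raise ValueError("one action per agent is required")
        # Clip actions to the valid range [0, 1].
        actions = [min(1.0, max(0.0, a)) for a in abatement]
        # Each agent emits less the more it abates (assumed emission factor 0.05).
        emissions = sum(0.05 * (1.0 - a) for a in actions)
        shock = self.rng.gauss(0.0, self.noise)
        self.carbon = max(0.0, (1.0 - self.decay) * self.carbon + emissions + shock)
        # Damage is shared by all agents; abatement cost is paid individually.
        damage = 0.1 * self.carbon ** 2
        rewards = [1.0 - 0.5 * a ** 2 - damage for a in actions]
        return self.carbon, rewards


def rollout(env, policies, horizon=10):
    """Run fixed policies (functions of the state) and return total rewards."""
    totals = [0.0] * env.n_agents
    for _ in range(horizon):
        actions = [policy(env.carbon) for policy in policies]
        _, rewards = env.step(actions)
        totals = [t + r for t, r in zip(totals, rewards)]
    return totals


if __name__ == "__main__":
    env = ToyClimateGame()
    print(rollout(env, [lambda s: 0.0, lambda s: 1.0]))
```

In this sketch each agent bears its own abatement cost while the damage term is shared, so the reward structure already shows the competing-interest problem that reward definition has to handle. A learning algorithm would replace the fixed lambda policies passed to `rollout`.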
