Climate Surrogates for Scalable Multi-Agent Reinforcement Learning: A Case Study with CICERO-SCM
Oskar Bohn Lassen, Serio Angelo Maria Agriesti, Filipe Rodrigues, Francisco Camara Pereira
TL;DR
This work addresses the computational bottleneck of incorporating multi-gas climate dynamics into multi-agent reinforcement learning by embedding a fast surrogate of CICERO-SCM into the environment loop. It introduces an RNN-based surrogate trained on 20,000 multi-gas emission trajectories, achieving RMSE around $3{-}4\times 10^{-4}$ K and ~1000× faster one-step inference, which translates into >100× faster end-to-end MARL training. The authors demonstrate policy convergence and consistency through a replay-based evaluation, enabling scalable exploration across alternative climate-policy regimes with multi-gas dynamics. The approach significantly expands the feasible space for climate-economy MARL experiments while preserving policy fidelity, providing a practical pathway to rigorous, large-scale multi-gas policy analysis.
Abstract
Climate policy studies require models that capture the combined effects of multiple greenhouse gases on global temperature, but these models are computationally expensive and difficult to embed in reinforcement learning. We present a multi-agent reinforcement learning (MARL) framework that integrates a high-fidelity, highly efficient climate surrogate directly into the environment loop, enabling regional agents to learn climate policies under multi-gas dynamics. As a proof of concept, we introduce a recurrent neural network architecture pretrained on $20{,}000$ multi-gas emission pathways to act as a surrogate for the climate model CICERO-SCM. The surrogate model attains near-simulator accuracy, with global-mean temperature RMSE $\approx 0.0004\,\mathrm{K}$ and approximately $1000\times$ faster one-step inference. When substituted for the original simulator in a climate-policy MARL setting, it accelerates end-to-end training by $>\!100\times$. We show that training with the surrogate and training with the simulator converge to the same optimal policies, and we propose a methodology to assess this property in cases where using the simulator is intractable. Our work bypasses the core computational bottleneck without sacrificing policy fidelity, enabling large-scale multi-agent experiments across alternative climate-policy regimes with multi-gas dynamics and high-fidelity climate response.
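To make the idea of a surrogate "in the environment loop" concrete, the following is a minimal sketch, not the architecture or training setup used in this work. It assumes a hypothetical `ToySurrogate` class (a plain Elman-style recurrent cell with random, untrained weights), a hypothetical `rollout` helper that sums per-region emissions into a global multi-gas vector at each step, and an `rmse` helper of the kind one might use to compare surrogate output against a reference simulator trajectory. All names, shapes, and sizes are illustrative assumptions.

```python
import numpy as np


class ToySurrogate:
    """Hypothetical one-step recurrent surrogate (Elman-style cell).

    Weights are random and untrained; this only illustrates the interface
    (hidden state in, emissions in, temperature out), not a fitted model.
    """

    def __init__(self, n_gases, hidden_size=16, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden_size = hidden_size
        self.W_in = rng.normal(0.0, 0.1, (hidden_size, n_gases))
        self.W_h = rng.normal(0.0, 0.1, (hidden_size, hidden_size))
        self.w_out = rng.normal(0.0, 0.1, hidden_size)

    def initial_state(self):
        return np.zeros(self.hidden_size)

    def step(self, hidden, emissions):
        """Advance one time step; return (new hidden state, temperature)."""
        new_hidden = np.tanh(self.W_in @ emissions + self.W_h @ hidden)
        temperature = float(self.w_out @ new_hidden)
        return new_hidden, temperature


def rollout(surrogate, regional_emissions):
    """Run the surrogate over emissions of shape (T, n_regions, n_gases)."""
    hidden = surrogate.initial_state()
    temperatures = []
    for per_region in regional_emissions:
        # Assumption: global emissions are the sum over regional agents.
        global_emissions = per_region.sum(axis=0)
        hidden, temperature = surrogate.step(hidden, global_emissions)
        temperatures.append(temperature)
    return np.array(temperatures)


def rmse(predicted, reference):
    """Root-mean-square error between two equally long trajectories."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Usage: 10 time steps, 3 regional agents, 4 gases (all illustrative).
emissions = np.random.default_rng(1).uniform(0.0, 1.0, (10, 3, 4))
temps = rollout(ToySurrogate(n_gases=4), emissions)
```

Because the recurrent state carries the climate history, each environment step needs only a single cheap `step` call rather than a full simulator run, which is the property that makes this kind of substitution attractive inside a MARL training loop.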
