Climate Surrogates for Scalable Multi-Agent Reinforcement Learning: A Case Study with CICERO-SCM

Oskar Bohn Lassen, Serio Angelo Maria Agriesti, Filipe Rodrigues, Francisco Camara Pereira

TL;DR

This work addresses the computational bottleneck of incorporating multi-gas climate dynamics into multi-agent reinforcement learning by embedding a fast surrogate of CICERO-SCM into the environment loop. It introduces an RNN-based surrogate trained on 20,000 multi-gas emission trajectories, achieving RMSE around $3{-}4\times 10^{-4}$ K and ~1000× faster one-step inference, which translates into >100× faster end-to-end MARL training. The authors demonstrate policy convergence and consistency through a replay-based evaluation, enabling scalable exploration across alternative climate-policy regimes with multi-gas dynamics. The approach significantly expands the feasible space for climate-economy MARL experiments while preserving policy fidelity, providing a practical pathway to rigorous, large-scale multi-gas policy analysis.

Abstract

Climate policy studies require models that capture the combined effects of multiple greenhouse gases on global temperature, but these models are computationally expensive and difficult to embed in reinforcement learning. We present a multi-agent reinforcement learning (MARL) framework that integrates a high-fidelity, highly efficient climate surrogate directly in the environment loop, enabling regional agents to learn climate policies under multi-gas dynamics. As a proof of concept, we introduce a recurrent neural network architecture pretrained on $20{,}000$ multi-gas emission pathways to serve as a surrogate for the climate model CICERO-SCM. The surrogate model attains near-simulator accuracy, with a global-mean temperature RMSE of $\approx 0.0004\,\mathrm{K}$ and approximately $1000\times$ faster one-step inference. When substituted for the original simulator in a climate-policy MARL setting, it accelerates end-to-end training by $>\!100\times$. We show that the surrogate and the simulator converge to the same optimal policies, and we propose a methodology to assess this property in cases where using the simulator is intractable. Our work bypasses the core computational bottleneck without sacrificing policy fidelity, enabling large-scale multi-agent experiments across alternative climate-policy regimes with multi-gas dynamics and high-fidelity climate response.
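
To make the "surrogate in the environment loop" idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. Everything in it is an assumption made for illustration: the class `ToyGRUSurrogate`, the function `run_episode`, the randomly initialised (untrained) weights, the bias-free GRU cell, the constant toy policies, and the quadratic damage cost. It only shows the control flow: agents emit, a recurrent surrogate maps aggregated multi-gas emissions to a temperature change in one cheap step, and that temperature is converted into a cost.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ToyGRUSurrogate:
    """Hypothetical stand-in for a pretrained surrogate (random, untrained weights)."""

    def __init__(self, n_gases, hidden, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        n_in = n_gases + hidden  # each gate sees [emissions, hidden state]
        self.hidden = hidden
        self.Wz, self.Wr, self.Wh = mat(hidden, n_in), mat(hidden, n_in), mat(hidden, n_in)
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]

    def initial_state(self):
        return [0.0] * self.hidden

    @staticmethod
    def _matvec(W, v):
        return [sum(w * x for w, x in zip(row, v)) for row in W]

    def step(self, emissions, h):
        """One GRU update (biases omitted): returns (temperature change, new state)."""
        x = list(emissions) + h
        z = [sigmoid(a) for a in self._matvec(self.Wz, x)]  # update gate
        r = [sigmoid(a) for a in self._matvec(self.Wr, x)]  # reset gate
        x_reset = list(emissions) + [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a) for a in self._matvec(self.Wh, x_reset)]
        h_new = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
        delta_t = sum(w * hi for w, hi in zip(self.w_out, h_new))  # linear readout
        return delta_t, h_new


def run_episode(surrogate, policies, n_steps):
    h = surrogate.initial_state()
    temps, costs = [], []
    for t in range(n_steps):
        # Agents choose per-gas emissions (toy policies, an assumption).
        per_agent = [policy(t) for policy in policies]
        total = [sum(gas) for gas in zip(*per_agent)]
        # The surrogate replaces the expensive simulator call.
        delta_t, h = surrogate.step(total, h)
        # Temperature is converted into a cost (toy quadratic damage).
        temps.append(delta_t)
        costs.append(delta_t ** 2)
    return temps, costs


# Usage: three regional agents, five gases, ten steps.
surrogate = ToyGRUSurrogate(n_gases=5, hidden=8)
policies = [lambda t, k=k: [0.1 * (k + 1)] * 5 for k in range(3)]
temps, costs = run_episode(surrogate, policies, n_steps=10)
```

In a real setup the random weights would be replaced by a network pretrained on simulator trajectories, and the carried hidden state `h` is what lets a single inexpensive step stand in for re-running the simulator over the full emission history.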

Paper Structure

This paper contains 32 sections, 30 equations, 17 figures, 6 tables.

Figures (17)

  • Figure 1: Proposed framework for integrating climate surrogates into MARL environments. In Module 1, agents choose policies that result in emissions, which in Module 2 are translated into temperature change by a pretrained surrogate and in Module 3 converted into costs.
  • Figure 2: Global mean surface air temperature change for the generated emission trajectories.
  • Figure 3: Architecture of the RNN-based surrogate.
  • Figure 4: Comparison of learned policies in tractable scenario (i) between CICERO-SCM and the GRU-based surrogate.
  • Figure A.1: Ensemble of generated emission trajectories for the five controllable gases. Shaded regions represent the 5–95% range across the 20,000 generated scenarios, solid lines indicate the ensemble median, dashed lines mark the SSP2-4.5 baseline.
  • ...and 12 more figures