InvestESG: A multi-agent reinforcement learning benchmark for studying climate investment as a social dilemma

Xiaoxuan Hou, Jiayi Yuan, Joel Z. Leibo, Natasha Jaques

TL;DR

InvestESG presents a first-principles MARL benchmark to study how ESG disclosure mandates affect corporate climate investments in an intertemporal social dilemma. The environment models two agent classes—companies and ESG-conscious investors—where firms allocate capital to mitigation, greenwashing, and resilience, and investors allocate portfolios based on financial and ESG preferences, quantified by utilities that include ESG weighting. Using Independent PPO and a PyTorch/JAX implementation, the study shows that ESG-conscious investors with sufficient capital can realign corporate incentives toward mitigation, while disclosure alone is insufficient; greenwashing can undermine cooperation, and richer climate-risk information generally promotes mitigation. The work demonstrates MARL's potential as a policy-testing tool for climate economics and provides an open, scalable benchmark to study complex, real-world socio-economic dynamics of climate action and finance.

Abstract

InvestESG is a novel multi-agent reinforcement learning (MARL) benchmark designed to study the impact of Environmental, Social, and Governance (ESG) disclosure mandates on corporate climate investments. The benchmark models an intertemporal social dilemma where companies balance short-term profit losses from climate mitigation efforts and long-term benefits from reducing climate risk, while ESG-conscious investors attempt to influence corporate behavior through their investment decisions. Companies allocate capital across mitigation, greenwashing, and resilience, with varying strategies influencing climate outcomes and investor preferences. We are releasing open-source versions of InvestESG in both PyTorch and JAX, which enable scalable and hardware-accelerated simulations for investigating competing incentives in mitigating climate change. Our experiments show that without ESG-conscious investors with sufficient capital, corporate mitigation efforts remain limited under the disclosure mandate. However, when a critical mass of investors prioritizes ESG, corporate cooperation increases, which in turn reduces climate risks and enhances long-term financial stability. Additionally, providing more information about global climate risks encourages companies to invest more in mitigation, even without investor involvement. Our findings align with empirical research using real-world data, highlighting MARL's potential to inform policy by providing insights into large-scale socio-economic challenges through efficient testing of alternative policy and market designs.

Paper Structure

This paper contains 20 sections, 11 equations, 14 figures, 1 table.

Figures (14)

  • Figure 1: The InvestESG Environment. Corporations choose how much to invest in mitigating emissions, which affects their ESG Score. Climate-conscious investors can see ESG Scores when deciding how much to invest in each company. However, companies can engage in greenwashing to inexpensively and falsely improve ESG scores without actually mitigating climate change. InvestESG is a social dilemma, where selfish, profit-motivated corporations will not invest in mitigation without further incentives, leading to increased climate risks and decreased global wealth.
  • Figure 2: Status quo scenario where all agents are only profit-motivated. In (a), mitigation spending (blue curve) is minimal, leading climate risk (yellow curve) to increase over time. Adverse weather event occurrences are shown as dotted lines; red lines indicate multiple adverse events in a single year. (b) plots the average number of severe climate events over the episode in (a), showing how increasing climate risk leads to more frequent extreme weather events.
  • Figure 3: Schelling diagrams demonstrating that the environment constitutes a social dilemma. The graphs compare payoffs between cooperation (mitigation, blue lines) and defection (no mitigation, red lines) for a focal company, given a varying number of other cooperating companies. Yellow lines represent the average payoff across all companies when the focal company defects. Subfigure (a) illustrates the selfish scenario, where all three investors consistently prioritize financial returns ($\alpha^{\mathcal{I}_j} = 0, \text{ for } j=1,2,3$). Here, defection always yields higher payoffs for the focal company than cooperation, leading all companies to defect. However, widespread defection results in lower overall profits, as the average payoff (yellow) increases with greater cooperation, demonstrating that the environment constitutes a social dilemma. Subfigures (b) and (c) correspond to two and three infinitely ESG-conscious investors ($\alpha^{\mathcal{I}_j} \approx \infty$), respectively. In (b), cooperation yields higher payoffs than defection for the focal company when few others cooperate. In (c), cooperation outperforms defection in all cases. Therefore, (b-c) demonstrate how investor behavior can transform the environment, eliminating the social dilemma by aligning corporate incentives with mitigation. Subfigures (d) and (e) build on (c) with three ESG-conscious investors. Subfigure (d) introduces resilience spending, while (e) adds greenwashing. The latter reintroduces a social dilemma, where corporations again avoid mitigation.
  • Figure 4: Ending values for all metrics averaged over the last 100 episodes; error bars show std. err. over 3 random seeds. We compare the status quo scenario with solely profit-driven investors (investors with ESG consciousness level of 0), both with and without the ESG disclosure mandate, to scenarios involving three ESG-conscious investors with ESG consciousness levels of $\alpha=0.5$, $\alpha=1$, and $\alpha=10$. These results indicate that merely disclosing ESG scores is insufficient to resolve the social dilemma if investors are not interested in investing in climate-friendly companies. However, as investors' level of ESG consciousness increases, the ESG mandate results in consistent improvements in mitigation, climate risk, and market wealth.
  • Figure 5: Investigating the effects of the level of ESG consciousness in the case of 5 companies and 3 investors, where investor 0 is profit driven ($\alpha^{\mathcal{I}_0}=0$), and investors 1 and 2 are deeply climate-conscious ($\alpha^{\mathcal{I}_1}=\alpha^{\mathcal{I}_2}=10$). In (a) and (b), Company 0 (purple) learns to be the leading mitigator. The figures plot the investment distribution for each investor, showing that more climate-conscious investors focus on investing in the more climate-conscious companies, mirroring the market bifurcation results of the Schelling diagrams.
  • ...and 9 more figures
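
The social-dilemma reading of the Schelling diagrams in Figure 3 can be sketched as a small check. This is only an illustration, not the authors' code: the function names, the assumption that all companies are symmetric, and the toy payoff numbers are hypothetical and are not taken from the paper. Given the focal company's payoff for cooperating and for defecting at each number of other cooperators, the sketch tests the two conditions described in the caption: defection is always better for the focal company, yet the average payoff across companies rises as more of them cooperate.

```python
from typing import List, Sequence


def average_payoff_when_focal_defects(
    coop: Sequence[float], defect: Sequence[float]
) -> List[float]:
    """Average payoff across all companies when the focal company defects.

    coop[k] / defect[k] is the focal company's payoff for cooperating /
    defecting when k of the other companies cooperate. Companies are
    assumed symmetric (an assumption of this sketch), so a cooperator
    among the others sees k - 1 other cooperators and earns coop[k - 1],
    while every defector earns defect[k].
    """
    n = len(coop)  # total companies: the focal one plus n - 1 others
    averages = []
    for k in range(n):
        cooperator_payoff = coop[k - 1] if k > 0 else 0.0
        total = k * cooperator_payoff + (n - k) * defect[k]
        averages.append(total / n)
    return averages


def is_social_dilemma(coop: Sequence[float], defect: Sequence[float]) -> bool:
    """True if defection dominates individually but cooperation helps the group."""
    defection_dominates = all(d > c for c, d in zip(coop, defect))
    avg = average_payoff_when_focal_defects(coop, defect)
    group_gains_from_cooperation = all(b > a for a, b in zip(avg, avg[1:]))
    return defection_dominates and group_gains_from_cooperation


# Hypothetical payoffs for 3 companies (index = number of other cooperators).
selfish_coop, selfish_defect = [1.0, 2.0, 3.0], [2.0, 3.0, 4.0]
esg_coop, esg_defect = [3.0, 3.5, 4.0], [2.0, 2.5, 3.0]

selfish_is_dilemma = is_social_dilemma(selfish_coop, selfish_defect)
esg_is_dilemma = is_social_dilemma(esg_coop, esg_defect)
```

With these toy numbers, the first pair mimics the selfish scenario of subfigure (a), where defection dominates although the group average grows with cooperation. The second pair mimics subfigure (c), where cooperation outperforms defection at every point, so the dominance condition fails and the check reports no dilemma.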