Towards Sustainable Investment Policies Informed by Opponent Shaping
Juan Agustin Duque, Razvan Ciuca, Ayoub Echchahed, Hugo Larochelle, Aaron Courville
TL;DR
The paper addresses the misalignment between short-term profits and long-term climate welfare by modeling investor–company interactions in InvestESG, a multi-agent reinforcement learning (MARL) environment with climate risk. It formalizes when InvestESG exhibits an intertemporal social dilemma and applies Advantage Alignment, a scalable opponent-shaping method, to steer learning toward cooperative equilibria. The authors prove threshold conditions for social dilemmas in a simplified setting and show empirically that Advantage Alignment outperforms standard baselines such as IPPO and MAPPO in the full InvestESG environment, achieving higher social welfare with reduced final mitigation. They also show that Advantage Alignment induces a cooperative bias through generalized advantage estimation (GAE) dynamics, helping agents coordinate without central mandates, with implications for policy mechanisms that align market incentives with long-term sustainability.
Abstract
Addressing climate change requires global coordination, yet rational economic actors often prioritize immediate gains over collective welfare, resulting in social dilemmas. InvestESG is a recently proposed multi-agent simulation that captures the dynamic interplay between investors and companies under climate risk. We provide a formal characterization of the conditions under which InvestESG exhibits an intertemporal social dilemma, deriving theoretical thresholds at which individual incentives diverge from collective welfare. Building on this, we apply Advantage Alignment, a scalable opponent-shaping algorithm shown to be effective in general-sum games, to influence agent learning in InvestESG. We offer theoretical insights into why Advantage Alignment systematically favors socially beneficial equilibria by biasing learning dynamics toward cooperative outcomes. Our results demonstrate that strategically shaping the learning processes of economic agents can lead to improved outcomes, which could inform policy mechanisms that better align market incentives with long-term sustainability goals.
