MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning
Arundhati Banerjee, Soham Phade, Stefano Ermon, Stephan Zheng
TL;DR
MERMAIDE is a model-based meta-learning framework for aligning learning agents: a principal learns a world model of agent behavior together with a meta-learned intervention policy that adapts quickly to unseen agents. Each agent is treated as a task, and the policy is trained with MAML on top of a recurrent world model. With this setup, the principal reaches near-equilibrium alignment quickly in Stackelberg games and learns cost-efficient intervention policies in bandit settings, even under partial observability and distribution shift. The framework outperforms model-free baselines and provides insight into when and how to intervene, highlighting the value of model-based priors in non-stationary principal–agent environments. Potential applications include adaptive incentive design in economies, education, and personalized systems where agents learn over time.
Abstract
We study how a principal can efficiently and effectively intervene on the rewards of a previously unseen learning agent in order to induce desirable outcomes. This is relevant to many real-world settings, such as auctions or taxation, where the principal may know neither the learning behavior nor the rewards of real people. Moreover, the principal should be few-shot adaptable and should minimize the number of interventions, because interventions are often costly. We introduce MERMAIDE, a model-based meta-learning framework to train a principal that can quickly adapt to out-of-distribution agents with different learning strategies and reward functions. We validate this approach step by step. First, in a Stackelberg setting with a best-response agent, we show that meta-learning enables quick convergence to the theoretically known Stackelberg equilibrium at test time, although noisy observations severely increase the sample complexity. We then show that our model-based meta-learning approach is cost-effective in intervening on bandit agents with unseen explore-exploit strategies. Finally, we outperform baselines that use either meta-learning or agent behavior modeling, in both $0$-shot and $K=1$-shot settings with partial agent information.
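To make the bandit setting concrete, here is a minimal sketch of a principal intervening on the rewards of a learning agent. It is not the MERMAIDE method: the agent is assumed to be a simple epsilon-greedy learner, and the principal applies a fixed, hand-chosen bonus to one target arm instead of a meta-learned policy. The function name `run_bandit`, its parameters, and the reward values are all hypothetical and chosen only for illustration.

```python
import random


def run_bandit(base_rewards, target_arm, bonus, steps=2000, eps=0.1, seed=0):
    """Simulate an epsilon-greedy agent under a fixed reward intervention.

    The agent observes base reward + Gaussian noise, plus `bonus` whenever it
    pulls `target_arm` (the principal's intervention, an assumption of this
    sketch). Returns (fraction of pulls on target_arm, total bonus paid).
    """
    rng = random.Random(seed)
    n_arms = len(base_rewards)
    counts = [0] * n_arms      # number of pulls per arm
    values = [0.0] * n_arms    # agent's running mean reward estimate per arm
    target_pulls = 0
    cost = 0.0

    for _ in range(steps):
        # Agent: explore with probability eps, otherwise act greedily.
        if rng.random() < eps:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: values[a])

        reward = base_rewards[arm] + rng.gauss(0.0, 0.1)

        # Principal: pay a bonus on the target arm; each payment is a cost.
        if arm == target_arm:
            reward += bonus
            cost += bonus
            target_pulls += 1

        # Agent: incremental update of the mean reward estimate.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

    return target_pulls / steps, cost


if __name__ == "__main__":
    rewards = [1.0, 0.5, 0.2]  # hypothetical base rewards; arm 2 is worst
    for bonus in (0.0, 1.0):
        frac, cost = run_bandit(rewards, target_arm=2, bonus=bonus)
        print(f"bonus={bonus}: target-arm fraction={frac:.2f}, cost={cost:.1f}")
```

The sketch shows the trade-off the abstract refers to: a large enough bonus steers the agent toward the principal's preferred arm, but every intervention adds to the principal's cost. Here the bonus is fixed by hand, whereas the paper is concerned with a principal that learns to adapt its interventions to unseen agents.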
