The Alignment Game: A Theory of Long-Horizon Alignment Through Recursive Curation
Ali Falahati, Mohammad Mohammadi Amiri, Kate Larson, Lukasz Golab
TL;DR
The paper tackles long-horizon alignment for self-consuming generative models, which are retrained on their own outputs. It develops a formal two-agent curation model based on the Bradley-Terry (BT) model, framing the process as dynamic social choice between the Owner and the Public, whose repeated interactions shape the evolving distribution $p_t$. The analysis identifies three convergence regimes: consensus collapse under perfect alignment, compromise on shared optima under partial alignment, and owner-led refinement under disjoint preferences. It also proves a fundamental impossibility result: diversity, symmetric influence, and independence from initialization cannot all be achieved simultaneously. Empirical studies in synthetic and text-based settings corroborate the theory and highlight implications for designing transparent and contestable mechanisms to govern recursive alignment.
Abstract
In self-consuming generative models that train on their own outputs, alignment with user preferences becomes a recursive rather than one-time process. We provide the first formal foundation for analyzing the long-term effects of such recursive retraining on alignment. Under a two-stage curation mechanism based on the Bradley-Terry (BT) model, we model alignment as an interaction between two factions: the Model Owner, who filters which outputs should be learned by the model, and the Public User, who determines which outputs are ultimately shared and retained through interactions with the model. Our analysis reveals three structural convergence regimes depending on the degree of preference alignment: consensus collapse, compromise on shared optima, and asymmetric refinement. We prove a fundamental impossibility theorem: no recursive BT-based curation mechanism can simultaneously preserve diversity, ensure symmetric influence, and eliminate dependence on initialization. Framing the process as dynamic social choice, we show that alignment is not a static goal but an evolving equilibrium, shaped both by power asymmetries and path dependence.
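To make the two-stage mechanism concrete, the following is a minimal sketch, not the paper's formal model. It assumes a finite set of outputs, fixed reward vectors for each faction, and a large-candidate-pool idealization in which each BT curation stage reweights the distribution as $p'(x) \propto p(x)\exp(r(x))$. The names `bt_reweight`, `recursive_curation`, `r_owner`, and `r_public` are hypothetical and chosen only for illustration.

```python
import math

def bt_reweight(p, reward):
    """One curation stage under the assumed large-pool idealization:
    p'(x) is proportional to p(x) * exp(reward(x))."""
    weights = [px * math.exp(rx) for px, rx in zip(p, reward)]
    total = sum(weights)
    return [w / total for w in weights]

def recursive_curation(p0, r_owner, r_public, steps):
    """Apply Owner filtering, then Public retention, for `steps` retraining rounds."""
    p = list(p0)
    for _ in range(steps):
        p = bt_reweight(p, r_owner)    # stage 1: Owner filters outputs
        p = bt_reweight(p, r_public)   # stage 2: Public decides what is retained
    return p

def entropy(p):
    """Shannon entropy in nats, used here as a simple diversity measure."""
    return -sum(px * math.log(px) for px in p if px > 0)

# Perfectly aligned preferences with a unique optimum (illustrative values).
uniform = [1 / 3, 1 / 3, 1 / 3]
aligned = [1.0, 0.5, 0.0]
p_aligned = recursive_curation(uniform, aligned, aligned, steps=50)

# Two tied optima: their relative mass is inherited from the initial distribution.
tied = [1.0, 1.0, 0.0]
p_tied = recursive_curation([0.6, 0.3, 0.1], tied, tied, steps=50)
```

In this toy setting, aligned preferences with a unique optimum drive nearly all mass onto that output and entropy falls toward zero, while tied optima keep the ratio of their initial probabilities. This only illustrates the loss of diversity and the dependence on initialization described above; it does not reproduce the paper's theorems.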
