Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value
Joe Edelman, Tan Zhi-Xuan, Ryan Lowe, Oliver Klingefjord, Vincent Wang-Mascianica, Matija Franklin, Ryan Othniel Kearns, Ellie Hain, Atrisha Sarkar, Michiel Bakker, Fazl Barez, David Duvenaud, Jakob Foerster, Iason Gabriel, Joseph Gubbels, Bryce Goodman, Andreas Haupt, Jobst Heitzig, Julian Jara-Ettinger, Atoosa Kasirzadeh, James Ravi Kirkpatrick, Andrew Koh, W. Bradley Knox, Philipp Koralus, Joel Lehman, Sydney Levine, Samuele Marro, Manon Revel, Toby Shorin, Morgan Sutherland, Michael Henry Tessler, Ivan Vendrov, James Wilken-Smith
TL;DR
The paper argues that aligning AI with operator goals is insufficient when those goals diverge from what other institutions and individuals value; it proposes Full-Stack Alignment (FSA), supported by Thick Models of Value (TMV), to represent, justify, and propagate values and norms across AI systems and social structures. It outlines three guiding stances on values, along with emerging TMV research, and details five application areas spanning AI agents and institutions, including moral reasoning, norm learning, and democratic governance. The contribution is a cohesive framework for preserving enduring values and enabling normative reasoning across the stack, with the potential to reshape AI-enabled economies and regulation. The work emphasizes cross-disciplinary collaboration and practical experimentation to build value-aligned infrastructure that supports human flourishing in an era of rapidly advancing AI capabilities.
Abstract
Beneficial societal outcomes cannot be guaranteed by aligning individual AI systems with the intentions of their operators or users. Even an AI system that is perfectly aligned to the intentions of its operating organization can lead to bad outcomes if the goals of that organization are misaligned with those of other institutions and individuals. For this reason, we need full-stack alignment, the concurrent alignment of AI systems and the institutions that shape them with what people value. This can be done without imposing a particular vision of individual or collective flourishing. We argue that current approaches for representing values, such as utility functions, preference orderings, or unstructured text, struggle to address these and other issues effectively: they struggle to distinguish values from other signals, to support principled normative reasoning, and to model collective goods. We argue that thick models of value will be needed instead. Such models structure the way values and norms are represented, enabling systems to distinguish enduring values from fleeting preferences, to model the social embedding of individual choices, and to reason normatively, applying values in new domains. We demonstrate this approach in five areas: AI value stewardship, normatively competent agents, win-win negotiation systems, meaning-preserving economic mechanisms, and democratic regulatory institutions.
