Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value

Joe Edelman, Tan Zhi-Xuan, Ryan Lowe, Oliver Klingefjord, Vincent Wang-Mascianica, Matija Franklin, Ryan Othniel Kearns, Ellie Hain, Atrisha Sarkar, Michiel Bakker, Fazl Barez, David Duvenaud, Jakob Foerster, Iason Gabriel, Joseph Gubbels, Bryce Goodman, Andreas Haupt, Jobst Heitzig, Julian Jara-Ettinger, Atoosa Kasirzadeh, James Ravi Kirkpatrick, Andrew Koh, W. Bradley Knox, Philipp Koralus, Joel Lehman, Sydney Levine, Samuele Marro, Manon Revel, Toby Shorin, Morgan Sutherland, Michael Henry Tessler, Ivan Vendrov, James Wilken-Smith

TL;DR

The paper argues that aligning AI with operator goals is insufficient when those goals diverge from what other institutions and individuals value; it proposes Thick Models of Value (TMV) and Full-Stack Alignment (FSA) to represent, justify, and propagate values and norms across AI systems and social structures. It outlines three guiding stances on values, surveys emerging TMV research, and details five application areas spanning AI agents and institutions, including moral reasoning, norm learning, and democratic governance. The contribution is a cohesive framework for preserving enduring values and enabling normative reasoning across the stack, with the potential to reshape AI-enabled economies and regulation. The work emphasizes cross-disciplinary collaboration and practical experimentation to build value-aligned infrastructure that supports human flourishing in an era of rapidly advancing AI capability.

Abstract

Beneficial societal outcomes cannot be guaranteed by aligning individual AI systems with the intentions of their operators or users. Even an AI system that is perfectly aligned to the intentions of its operating organization can lead to bad outcomes if the goals of that organization are misaligned with those of other institutions and individuals. For this reason, we need full-stack alignment, the concurrent alignment of AI systems and the institutions that shape them with what people value. This can be done without imposing a particular vision of individual or collective flourishing. We argue that current approaches for representing values, such as utility functions, preference orderings, or unstructured text, struggle to address these and other issues effectively. They struggle to distinguish values from other signals, to support principled normative reasoning, and to model collective goods. We propose that thick models of value are needed. These structure the way values and norms are represented, enabling systems to distinguish enduring values from fleeting preferences, to model the social embedding of individual choices, and to reason normatively, applying values in new domains. We demonstrate this approach in five areas: AI value stewardship, normatively competent agents, win-win negotiation systems, meaning-preserving economic mechanisms, and democratic regulatory institutions.

Paper Structure

This paper contains 29 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: An example of a “stack” of social institutions, and how they see their users' interests, in the context of recommender systems. Preferentist models of value and thick models of value (TMVs) each act as “lenses” (depicted as ovals) through which information about users is observed. Currently, this observation is very lossy; a user's desire for meaningful connection becomes “engagement metrics” to recommender systems, which becomes “daily active users” to companies, and “quarterly revenue” in markets. In this paper, we argue that in order to achieve full-stack alignment (FSA), we need TMVs that preserve value information as we move up the societal stack. (Note that, in practice, institutional stacks are not strict hierarchies, and thus the “preserved value information” does not decrease monotonically as depicted here.)
  • Figure 2: Example work in TMV. Moral Graph Elicitation (Klingefjord et al., 2024) represents values as attentional policies, filtering out ideological slogans to reveal underlying criteria that guide decision-making across contexts. (Reproduced with permission from Klingefjord et al., 2024.)