A theory of appropriateness with applications to generative artificial intelligence

Joel Z. Leibo, Alexander Sasha Vezhnevets, Manfred Diaz, John P. Agapiou, William A. Cunningham, Peter Sunehag, Julia Haas, Raphael Koster, Edgar A. Duéñez-Guzmán, William S. Isaac, Georgios Piliouras, Stanley M. Bileschi, Iyad Rahwan, Simon Osindero

TL;DR

This paper develops a theory of appropriateness as a socially constructed mechanism that guides action across contexts and scales, arguing it offers a more robust governance lens than AI alignment. It models human decision making through predictive pattern completion within a global workspace, connecting memory, perception, and action to normative behavior. By distinguishing explicit and implicit norms, and conventions and sanctions, it explains how norms emerge, stabilize, and change, with implications for safety, policy, and multi-agent AI ecosystems. The authors advocate a decentralized, polycentric approach to AI governance where norm customization, sanctioning, and context sensitivity are harnessed to achieve collective flourishing in a pluralistic society. This framework aims to guide the design and deployment of norm-sensitive AI that can operate safely and adaptively in long-tail, domain-specific contexts while respecting diverse communities.

Abstract

What is appropriateness? Humans navigate a multi-scale mosaic of interlocking notions of what is appropriate for different situations. We act one way with our friends, another with our family, and yet another in the office. Likewise for AI, appropriate behavior for a comedy-writing assistant is not the same as appropriate behavior for a customer-service representative. What determines which actions are appropriate in which contexts? And what causes these standards to change over time? Since all judgments of AI appropriateness are ultimately made by humans, we need to understand how appropriateness guides human decision making in order to properly evaluate AI decision making and improve it. This paper presents a theory of appropriateness: how it functions in human society, how it may be implemented in the brain, and what it means for responsible deployment of generative AI technology.

Paper Structure

This paper contains 78 sections, 23 equations, 2 figures.

Figures (2)

  • Figure 1: The global workspace transiently represents a sequence of assemblies. At each point in time, the content of the actor's global workspace is divided into three consecutive subsequences. The first subsequence contains information recalled from memory. It prefixes the second subsequence, which is of variable length and references recent perception. The perception part of the global workspace in turn prefixes the third subsequence, which contains premotor information: this is where the actions the actor intends to produce are stored until they can be read out by motor control circuitry.
  • Figure 2: $z$ denotes the content represented by a set of parallel specialized summary functions, which may correspond to neural circuitry located in different parts of the brain from one another, or may themselves consist of distributed representations. For instance, some summary functions may be perceptual in nature, e.g. a summary function that asks of recent observations "what kind of situation is this?"; some may be more mnemonically oriented, e.g. a summary function that asks of one's episodic memory "what kind of person am I?"; and some may be closer to premotor action planning circuitry, such as one that asks "what would a person like me do in a situation like this?". This architecture was inspired by the global workspace architecture of Baars (1988) and Shanahan (2010). Here, at time $t$, our $z_t$ is a snapshot of the content in the global neuronal workspace, i.e. $z_t$ is represented by dynamic cell assemblies linking the far-flung modules comprising the workspace, perhaps by oscillating coherently with one another (Dehaene et al., 1998; Fries, 2015).
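
The layout described in the two figure captions can be made concrete with a small data-structure sketch. This is only an illustration under stated assumptions, not the paper's model: the class `GlobalWorkspace`, the helper `summarize`, and the toy summary functions are hypothetical names introduced here, and plain strings stand in for the assemblies. It shows the three consecutive subsequences (memory, perception, premotor) and a set of summary functions applied in parallel to produce a toy stand-in for $z_t$.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class GlobalWorkspace:
    """Hypothetical container for the three subsequences of Figure 1."""
    memory: List[str] = field(default_factory=list)      # recalled from memory
    perception: List[str] = field(default_factory=list)  # variable-length recent perception
    premotor: List[str] = field(default_factory=list)    # intended actions awaiting readout

    def sequence(self) -> List[str]:
        # Memory prefixes perception, which in turn prefixes premotor content.
        return self.memory + self.perception + self.premotor


# A summary function maps the workspace content to a short answer (Figure 2).
SummaryFn = Callable[[GlobalWorkspace], str]


def summarize(ws: GlobalWorkspace, fns: Dict[str, SummaryFn]) -> Dict[str, str]:
    """Apply each summary function independently; the result is a toy z_t."""
    return {question: fn(ws) for question, fn in fns.items()}


# Toy content, assumed for illustration only.
ws = GlobalWorkspace(
    memory=["I am a customer-service representative"],
    perception=["customer asks for a refund"],
    premotor=["apologize", "open refund form"],
)

# Toy summary functions keyed by the questions quoted in the Figure 2 caption.
toy_fns: Dict[str, SummaryFn] = {
    "what kind of person am I?":
        lambda w: w.memory[-1] if w.memory else "unknown",
    "what kind of situation is this?":
        lambda w: w.perception[-1] if w.perception else "unknown",
    "what would a person like me do in a situation like this?":
        lambda w: w.premotor[0] if w.premotor else "nothing",
}

z_t = summarize(ws, toy_fns)
```

Keeping the three subsequences as separate fields and concatenating them only in `sequence()` is one way to preserve the prefix ordering while still letting each summary function read the part of the workspace it cares about.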

Theorems & Definitions (11)

  • Definition 1: $\epsilon$-similar meaning $u\sim v$
  • Definition 2: record of action
  • Definition 3: context-free convention sensitive
  • Definition 4: context-aware counter-factual memory editing $R^{a \rightarrow a'}_f(m, o, c)$
  • Definition 5: contextually convention-sensitive
  • Definition 6: Reproduced due to weight of precedent
  • Definition 7: Sanction Sensitivity
  • Definition 8: Contextual Sanction Sensitivity
  • Definition 9: Normative behavior for the choice between two options
  • Conjecture 1: Norm stability
  • ...and 1 more