In conversation with Artificial Intelligence: aligning language models with human values

Atoosa Kasirzadeh, Iason Gabriel

TL;DR

The paper argues for aligning language models with human values through a principle-based framework grounded in speech-act theory and pragmatics, complementing harm-reduction approaches. It develops a taxonomy of utterance types and Gricean maxims to guide domain-sensitive dialogue and proposes discursive ideals for scientific, democratic, and creative discourse. These ideals are used to specify what counts as successful communication with agents across contexts and to derive practical design implications for context-sensitive evaluation and fine-tuning. The authors also acknowledge linguistic and cultural limitations and call for interdisciplinary research to extend the framework beyond English and to other modes of communication.

Abstract

Large-scale language technologies are increasingly used in various forms of communication with humans across different contexts. One particular use case for these technologies is conversational agents, which output natural language text in response to prompts and queries. This mode of engagement raises a number of social and ethical questions. For example, what does it mean to align conversational agents with human norms or values? Which norms or values should they be aligned with? And how can this be accomplished? In this paper, we propose a number of steps that help answer these questions. We start by developing a philosophical analysis of the building blocks of linguistic communication between conversational agents and human interlocutors. We then use this analysis to identify and formulate ideal norms of conversation that can govern successful linguistic communication between humans and conversational agents. Furthermore, we explore how these norms can be used to align conversational agents with human values across a range of different discursive domains. We conclude by discussing the practical implications of our proposal for the design of conversational agents that are aligned with these norms and values.

Paper Structure (11 sections)