Language Models in Dialogue: Conversational Maxims for Human-AI Interactions

Erik Miehling, Manish Nagireddy, Prasanna Sattigeri, Elizabeth M. Daly, David Piorkowski, John T. Richards

TL;DR

Modern language models exhibit conversational shortcomings, partly because they do not follow core conversational principles. The authors introduce an augmented framework of maxims (quantity, quality, relevance, manner, benevolence, and transparency) and argue that these apply to human-AI dialogue, with benevolence and transparency addressing AI-specific risks. They operationalize the framework on 1,000 conversations sampled from Anthropic's hh-rlhf dataset, evaluating three LLMs on submaxim labeling to reveal how the models internally prioritize the maxims. The study provides a taxonomy and a practical path for evaluating, guiding, and aligning AI conversational behavior, with implications for lightweight detectors, labeling workflows, and constitutional alignment directives.

Abstract

Modern language models, while sophisticated, exhibit some inherent shortcomings, particularly in conversational settings. We claim that many of the observed shortcomings can be attributed to the violation of one or more conversational principles. By drawing upon extensive research from both the social science and AI communities, we propose a set of maxims (quantity, quality, relevance, manner, benevolence, and transparency) for describing effective human-AI conversation. We first justify the applicability of the first four maxims (from Grice) in the context of human-AI interactions. We then argue that two new maxims, benevolence (concerning the generation of, and engagement with, harmful content) and transparency (concerning recognition of one's knowledge boundaries, operational constraints, and intents), are necessary for addressing behavior unique to modern human-AI interactions. We evaluate the degree to which various language models are able to understand these maxims and find that models possess an internal prioritization of principles that can significantly impact their ability to interpret the maxims accurately.
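
The TL;DR describes the operationalization only at a high level: three LLMs label hh-rlhf conversations for maxim violations. What follows is a minimal sketch of what such a labeling step might look like. It assumes Python 3.9+ and the "\n\nHuman:" / "\n\nAssistant:" turn markers used in hh-rlhf transcripts. The prompt wording, the helper names (`parse_turns`, `build_labeling_prompt`, `parse_labels`), and the one-line glosses for the four Gricean maxims are hypothetical illustrations, not the authors' method. The paper labels finer-grained submaxims, and those are not reproduced here.

```python
import re

# Six maxims named in the paper. Glosses for the four Gricean maxims are our own
# paraphrases of Grice (an assumption); benevolence and transparency follow the
# abstract's wording. The paper's finer-grained submaxims are not reproduced.
MAXIMS = {
    "quantity": "Is as informative as the exchange requires, and no more.",
    "quality": "Does not state what it believes false or lacks evidence for.",
    "relevance": "Stays relevant to the user's request and the conversation.",
    "manner": "Avoids obscurity and ambiguity; is brief and orderly.",
    "benevolence": "Avoids generating, or engaging with, harmful content.",
    "transparency": "Recognizes its knowledge boundaries, operational constraints, and intents.",
}

# hh-rlhf transcripts mark turns as "\n\nHuman: ..." and "\n\nAssistant: ...";
# the pattern also accepts a marker at the very start of the string.
TURN_MARKER = re.compile(r"(?:^|\n\n)(Human|Assistant): ")


def parse_turns(transcript: str) -> list[tuple[str, str]]:
    """Split an hh-rlhf-style transcript into (speaker, text) pairs."""
    parts = TURN_MARKER.split(transcript)
    # re.split keeps the captured speaker: [prefix, speaker, text, speaker, text, ...]
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts) - 1, 2)]


def build_labeling_prompt(transcript: str) -> str:
    """Hypothetical prompt asking a model to flag maxim violations in the final Assistant turn."""
    dialogue = "\n".join(f"{speaker}: {text}" for speaker, text in parse_turns(transcript))
    criteria = "\n".join(f"- {name}: {gloss}" for name, gloss in MAXIMS.items())
    return (
        "For each maxim below, decide whether the final Assistant turn violates it.\n"
        "Answer with one line per maxim, formatted as '<maxim>: yes' or '<maxim>: no'.\n\n"
        f"Maxims:\n{criteria}\n\nConversation:\n{dialogue}"
    )


def parse_labels(response: str) -> dict[str, bool]:
    """Map each recognized maxim in the model's reply to True if it answered 'yes' (violated)."""
    labels = {}
    for line in response.splitlines():
        name, sep, answer = line.partition(":")
        name = name.strip().lower()
        if sep and name in MAXIMS:
            labels[name] = answer.strip().lower().startswith("yes")
    return labels
```

The model call itself is left out because the summary does not specify which models' APIs or prompt formats were used. Any chat interface that returns text could sit between `build_labeling_prompt` and `parse_labels`.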

Paper Structure

This paper contains 15 sections and 10 figures.

Figures (10)

  • Figure 1: Accuracy analysis for llama-3-70b-instruct. The violation pattern (left) indicates the proportion of labels in which a given submaxim is violated in the current split (darker shade indicates a higher violation proportion). Each split corresponds to the subset of conversations where the corresponding submaxim is violated. The label accuracy (right) plots the mean accuracy of the labels with respect to the 50 human-labeled conversations.
  • Figure 2: Illustration of quantity.
  • Figure 3: Illustration of quality.
  • Figure 4: Illustration of relevance.
  • Figure 5: Illustration of manner.
  • ...and 5 more figures
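
The Figure 1 caption defines two per-split statistics. The following minimal sketch computes both under our own reading of the caption, which is an assumption. Each label set maps a conversation id to a boolean per submaxim. A split is the set of conversations in which a given submaxim is labeled as violated. Label accuracy is each conversation's fraction of submaxims on which model and human labels agree, averaged over the split. The function names and data layout are hypothetical and not taken from the paper.

```python
# Assumed layout: conversation id -> {submaxim name: violated?}.
# Submaxim names are placeholders; the paper's actual submaxims are not listed here.
Labels = dict[str, dict[str, bool]]


def split_ids(labels: Labels, submaxim: str) -> list[str]:
    """Conversation ids in which `submaxim` is labeled as violated."""
    return [cid for cid, row in labels.items() if row.get(submaxim, False)]


def violation_pattern(labels: Labels, submaxims: list[str]) -> dict[str, dict[str, float]]:
    """For each split s, the fraction of its conversations in which each submaxim t is violated."""
    pattern = {}
    for s in submaxims:
        ids = split_ids(labels, s)
        pattern[s] = {
            t: sum(labels[cid].get(t, False) for cid in ids) / len(ids) if ids else 0.0
            for t in submaxims
        }
    return pattern


def split_accuracy(model: Labels, human: Labels, submaxims: list[str]) -> dict[str, float]:
    """Mean model-vs-human label agreement within each split (splits taken from human labels)."""
    accuracy = {}
    for s in submaxims:
        ids = [cid for cid in split_ids(human, s) if cid in model]
        if not ids:
            continue  # no human-labeled conversation violates s; skip this split
        # Per conversation: fraction of submaxims where the model matches the human label.
        per_conversation = [
            sum(model[cid].get(t, False) == human[cid].get(t, False) for t in submaxims) / len(submaxims)
            for cid in ids
        ]
        accuracy[s] = sum(per_conversation) / len(per_conversation)
    return accuracy
```

`violation_pattern` returns a nested dictionary that could be rendered as a heatmap like the left panel, and `split_accuracy` returns one accuracy value per split. The caption does not say whether the violation pattern is computed from human or model labels, so the sketch lets the caller pass either label set to `violation_pattern`.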