AI Apology: A Critical Review of Apology in AI Systems
Hadassah Harland, Richard Dazeley, Hashini Senaratne, Peter Vamplew, Francisco Cruz, Bahareh Nakisa
TL;DR
This paper presents the first synthesis and critical analysis of AI apology research published between 2020 and 2023, introducing a framework for AI apology built on five elements (outcome, interaction, offence, recipient, offender), 12 components, and six moderators. It draws on empirical, theoretical, and technical work to show how apologies can repair misaligned human-AI interactions through affective, regulatory, and informative effects, and it highlights a persistent capability gap in autonomous apology (detection, attribution, explanation, adaptation). The review finds that outcomes are context-dependent and that evidence for many components is mixed, and it identifies a need for longitudinal, cross-disciplinary study designs that account for embodiment, anthropomorphism, and user characteristics. It concludes with concrete recommendations for advancing the field toward robust, human-aligned apologetic AI systems, including improved measurement practices, simulation environments, and integrated technical capabilities for autonomous apology.
Abstract
Apologies are a powerful tool used in human-human interactions to provide affective support, regulate social processes, and exchange information following a trust violation. The emerging field of AI apology investigates the use of apologies by artificially intelligent systems, with recent research suggesting how this tool may provide similar value in human-machine interactions. Until recently, contributions to this area were sparse, and these works have yet to be synthesised into a cohesive body of knowledge. This article provides the first synthesis and critical analysis of the state of AI apology research, focusing on studies published between 2020 and 2023. We derive a framework of attributes to describe five core elements of apology: outcome, interaction, offence, recipient, and offender. With this framework as the basis for our critique, we show how apologies can be used to recover from misalignment in human-AI interactions, and examine trends and inconsistencies within the field. Among the observations, we outline the importance of curating a human-aligned and cross-disciplinary perspective in this research, with consideration for improved system capabilities and long-term outcomes.
