Towards interactive evaluations for interaction harms in human-AI systems

Lujain Ibrahim; Saffron Huang; Umang Bhatt; Lama Ahmad; Markus Anderljung

TL;DR

Static, model-centered evaluations fail to capture harms that emerge during sustained human–AI interaction. The paper introduces interactional ethics and the concept of interaction harms to guide interactive evaluations, outlining three organizing principles: ecologically valid scenario design, rigorous human-impact metrics, and diverse participation strategies. It discusses implementation challenges, including ethics, data access, infrastructure, and the translation of findings into stakeholder decisions, aiming to bridge research with governance and practice. This framework seeks to enable more accurate assessment of complex human–AI dynamics, ultimately informing safer deployment and governance of interactive AI systems.

Abstract

Current AI evaluation methods, which rely on static, model-only tests, fail to account for harms that emerge through sustained human-AI interaction. As AI systems proliferate and are increasingly integrated into real-world applications, this disconnect between evaluation approaches and actual usage becomes more significant. In this paper, we propose a shift towards evaluation based on interactional ethics, which focuses on interaction harms: issues like inappropriate parasocial relationships, social manipulation, and cognitive overreliance that develop over time through repeated interaction, rather than through isolated outputs. First, we discuss the limitations of current evaluation methods, which (1) are static, (2) assume a universal user experience, and (3) have limited construct validity. Drawing on research from human-computer interaction, natural language processing, and the social sciences, we present practical principles for designing interactive evaluations. These include ecologically valid interaction scenarios, human impact metrics, and diverse human participation approaches. Finally, we explore implementation challenges and open research questions for researchers, practitioners, and regulators aiming to integrate interactive evaluations into AI governance frameworks. This work lays the groundwork for developing more effective evaluation methods that better capture the complex dynamics between humans and AI systems.

Paper Structure (16 sections, 2 figures, 2 tables)

Introduction
An overview of the generative AI evaluation landscape
The current state of AI safety evaluations
Critiques of current evaluations
Methods for studying human-computer interaction
Why current evaluation approaches are insufficient for assessing interaction harms
Towards better evaluations of interaction harms
Principle 1: design interaction scenarios based on user objectives and interaction modes
Principle 2: identify the causal link between model behavior and human impact
Principle 3: structure human participation to balance validity and practicality
Open challenges and ways forward for interactive evaluations
How can we ethically work with human participants on studying harms? When are user simulations appropriate replacements?
How can we improve researcher access to data for understanding interaction harms?
What infrastructure do we need to facilitate interactive evaluations?
How can interactive evaluations produce actionable findings that guide stakeholder decisions? What are the limitations of controlled studies in capturing broader impacts?
...and 1 more sections

Figures (2)

Figure 1: Taxonomy of human-AI interaction modes
Figure 2: Example of a causal trace showing how model properties may influence human behavior and human-AI interaction outcomes. Such traces help identify key measurement points for evaluation.
