Behavioral Indicators of Overreliance During Interaction with Conversational Language Models

Chang Liu; Qinyi Zhou; Xinjie Shen; Xingyu Bruce Liu; Tongshuang Wu; Xiang 'Anthony' Chen

TL;DR

This work addresses overreliance on conversational LLMs by shifting from outcome-focused evaluation to process-oriented detection based on interaction traces. The authors collected detailed mouse and keyboard logs from 77 participants across three misinformation-injected tasks, semantically encoded segments of behavior with a transformer-based autoencoder, and clustered the resulting embeddings with DBSCAN, identifying five behavioral patterns that correlate with higher or lower overreliance. The analysis pipeline covers preprocessing, segmentation into windows, latent embedding, clustering, and post-clustering validation, and the findings are translated into design recommendations for adaptive interfaces that mitigate overreliance in real time. The contributions are a dataset of interaction behaviors, a cluster-based framework linking behavior to overreliance, and interface guidelines for reducing overreliance without hindering productive human-AI collaboration.
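The segmentation and encoding steps mentioned above can be illustrated with a minimal sketch. It assumes a log of `(timestamp, action)` pairs and uses normalized action frequencies as a stand-in encoding; the action vocabulary, the function names `sliding_windows` and `encode`, and the window parameters are hypothetical and are not taken from the paper. The transformer-based autoencoder and DBSCAN stages are omitted here.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, action name)

# Hypothetical action vocabulary; the names echo the figure captions
# but are an assumption, not the paper's actual feature set.
ACTIONS = ["keypress", "delete", "copy", "paste", "click", "mousewheel", "idle"]


def sliding_windows(events: List[Event], width: float, stride: float) -> List[List[Event]]:
    """Split a time-sorted event log into time-based windows.

    Windows overlap whenever stride < width.
    """
    if width <= 0 or stride <= 0:
        raise ValueError("width and stride must be positive")
    if not events:
        return []
    end = events[-1][0]
    windows = []
    t = events[0][0]
    while t <= end:
        # Each window covers the half-open interval [t, t + width).
        windows.append([e for e in events if t <= e[0] < t + width])
        t += stride
    return windows


def encode(window: List[Event]) -> List[float]:
    """Encode a window as a fixed-length vector of action frequencies."""
    counts = Counter(action for _, action in window)
    total = sum(counts[a] for a in ACTIONS)
    if total == 0:
        return [0.0] * len(ACTIONS)
    return [counts[a] / total for a in ACTIONS]


if __name__ == "__main__":
    log = [(0.0, "copy"), (1.0, "paste"), (4.0, "keypress"), (6.0, "keypress")]
    for w in sliding_windows(log, width=4.0, stride=2.0):
        print(encode(w))
```

In a fuller pipeline, vectors like these would be passed to a learned sequence encoder and then to a density-based clustering step; the frequency encoding here only shows the shape of the data at that hand-off.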

Abstract

LLMs are now embedded in a wide range of everyday scenarios. However, their inherent hallucinations risk hiding misinformation in fluent responses, raising concerns about overreliance on AI. Detecting overreliance is challenging, as it often arises in complex, dynamic contexts and cannot be easily captured by post-hoc task outcomes. In this work, we investigate how users' behavioral patterns correlate with overreliance. We collected interaction logs from 77 participants working with an LLM whose responses were injected with plausible misinformation across three real-world tasks, and we assessed overreliance by whether participants detected and corrected these errors. By semantically encoding and clustering segments of user interactions, we identified five behavioral patterns linked to overreliance: users with low overreliance show careful task comprehension and fine-grained navigation; users with high overreliance show frequent copy-paste, skipping initial comprehension, repeated LLM references, coarse locating, and accepting misinformation despite hesitation. We discuss design implications for mitigation.

Paper Structure (70 sections, 1 equation, 12 figures, 6 tables)

Introduction
Related Work
Overreliance on AI: Existing Frameworks and Unique Challenges of Conversational LLMs
From Outcome to Process-Oriented Understandings of Overreliance on LLMs
Interaction Behaviors as Performance Correlates
Data Collection Study
Study Design
Participants
Procedure
Task Platform
Behavioral Data Logging
Task Design and Misinformation Injections
Quiz Solving
Article Summarization
Trip Planning
...and 55 more sections

Figures (12)

Figure 1: Interface setup in the experiment. Participants are asked to use a split screen: the left half displays the task-related page, and the right half displays the LLM page as well as the search engine page.
Figure 2: Overview of the Analysis Pipeline. We segment user interaction logs into overlapping time-based windows, encode each sequence into standardized feature vectors, use an autoencoder to produce compact sequence embeddings, and cluster these embeddings to identify recurring behavioral patterns. Selected clusters are interpreted in terms of user overreliance.
Figure 3: Histograms of normalized overreliance scores for three distinct tasks, where each subfigure corresponds to one task, the vertical axis (Count) represents the number of participants falling into each score bin, and the horizontal axis (Normalized Score) represents the normalized measure of overreliance on AI. A lower normalized score indicates a lower degree of overreliance on AI, while a higher score signifies a greater level of overreliance. The three tasks include: (a) quiz solving, (b) article summarization, (c) trip planning.
Figure 4: Visualization of five simplified action sequence patterns, with three columns denoting Behavioral Pattern, Task, and Visualization of Behavior Sequence, respectively. Sub-figures (a) to (e) correspond to distinct behavior patterns: (a) Frequency of Copying-Pasting: In trip planning, users with high overreliance frequently copy/paste without editing, while users with low overreliance edit cautiously (keypress/delete). (b) Focused Task Comprehension at the Start: In article summarization, users with low overreliance initially focus on reading (mousewheel/idle) on the Task page. (c) Frequency of Referring to LLM’s Responses: In quiz solving, users with high overreliance frequently refer to the LLM, while users with low overreliance refer to it once and then complete tasks independently. (d) Coarse- vs. Fine-Grained Locating & Editing: In trip planning, users with high overreliance use rough locating (lengthy copy/paste), while users with low overreliance edit precisely (repeated mouse movement, keypress, and click after scrolling). (e) Pausing and Hesitation Before LLM Prompting: In quiz solving, users with high overreliance repeatedly edit (keypress/delete) on the LLM page after idling on the Task page.
Figure 5: Box plot comparing scores of quiz solving across four conditions: human alone in the desert, human alone on the moon, with AI in the desert, and with AI on the moon.
...and 7 more figures
