Behavioral Indicators of Overreliance During Interaction with Conversational Language Models
Chang Liu, Qinyi Zhou, Xinjie Shen, Xingyu Bruce Liu, Tongshuang Wu, Xiang 'Anthony' Chen
TL;DR
This work tackles overreliance on conversational LLMs by shifting from traditional outcome-focused evaluation to process-oriented detection based on interaction traces. The study collects detailed mouse and keyboard logs from 77 participants across three tasks in which misinformation was injected into the LLM's responses. It then semantically encodes and clusters this behavior with a transformer-based autoencoder and DBSCAN, identifying five behavioral patterns associated with higher or lower overreliance. The authors provide an end-to-end analytical pipeline (preprocessing, segmentation into windows, latent embedding, clustering, and post-clustering validation) and translate their findings into design recommendations for adaptive interfaces that mitigate overreliance in real time. The contributions include an informative dataset of interaction behaviors, a cluster-based framework linking behavior to overreliance, and practical interface guidelines for reducing overreliance on AI without hindering productive human-AI collaboration, thereby improving the reliability of LLM-assisted tasks in everyday use.
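The segmentation and clustering stages of such a pipeline can be illustrated with a minimal sketch. It assumes a hypothetical log format of (timestamp in seconds, event type) pairs and a hypothetical event vocabulary. It also uses normalized event-type counts as a simple stand-in for the transformer autoencoder's latent embedding, which is not reproduced here, and implements textbook DBSCAN in plain Python. The window length, stride, `eps`, and `min_samples` values are illustrative, not the study's settings.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical log format: (timestamp in seconds, event type).
Event = Tuple[float, str]
# Hypothetical event vocabulary used by the stand-in featurizer.
EVENT_TYPES = ("key", "click", "scroll", "copy", "paste")
NOISE = -1  # DBSCAN label for points that belong to no cluster


def segment_windows(events: List[Event], window_s: float, stride_s: float) -> List[List[Event]]:
    """Split a time-ordered event log into fixed-duration (possibly overlapping) windows."""
    if not events:
        return []
    start, end = events[0][0], events[-1][0]
    windows = []
    k = 0
    while start + k * stride_s <= end:
        t = start + k * stride_s
        windows.append([e for e in events if t <= e[0] < t + window_s])
        k += 1
    return windows


def featurize(window: List[Event]) -> Tuple[float, ...]:
    """Stand-in for the learned latent embedding: share of each event type in the window."""
    total = len(window) or 1
    return tuple(sum(1 for _, kind in window if kind == k) / total for k in EVENT_TYPES)


def dbscan(points: Sequence[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Textbook DBSCAN with Euclidean distance; a point counts as its own neighbor."""
    labels: List = [None] * len(points)

    def neighbors(i: int) -> List[int]:
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:  # core point: expand the cluster through it
                queue.extend(j_nbrs)
    return labels
```

In a real pipeline, `featurize` would be replaced by the encoder's window embeddings, and `eps` and `min_samples` would be tuned, for example by inspecting a k-distance plot. Windows labeled `-1` are treated as noise rather than forced into a pattern.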
Abstract
LLMs are now embedded in a wide range of everyday scenarios. However, their inherent hallucinations risk hiding misinformation in fluent responses, raising concerns about overreliance on AI. Detecting overreliance is challenging, as it often arises in complex, dynamic contexts and cannot be easily captured by post-hoc task outcomes. In this work, we investigate how users' behavioral patterns correlate with overreliance. We collected interaction logs from 77 participants working with an LLM that injected plausible misinformation into its responses across three real-world tasks, and we assessed overreliance by whether participants detected and corrected these errors. By semantically encoding and clustering segments of user interactions, we identified five behavioral patterns linked to overreliance. Users with low overreliance show careful task comprehension and fine-grained navigation. Users with high overreliance show frequent copy-paste, skipping initial comprehension, repeated LLM references, coarse locating, and accepting misinformation despite hesitation. We discuss design implications for mitigation.
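One way to make the outcome measure and the behavior-overreliance link concrete is sketched below. The scoring rule is an assumption for illustration: an injected error counts against a participant unless it was both detected and corrected. The per-cluster averaging is likewise assumed; neither is the authors' reported metric, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Hashable, List, Tuple

NOISE = -1  # DBSCAN noise label; such windows are excluded


def overreliance_rate(outcomes: List[Tuple[bool, bool]]) -> float:
    """Fraction of injected errors a participant over-relied on.

    outcomes holds one (detected, corrected) pair per injected error; under this
    assumed rule an error counts against the participant unless it was both
    detected and corrected.
    """
    if not outcomes:
        raise ValueError("no injected errors recorded")
    missed = sum(1 for detected, corrected in outcomes if not (detected and corrected))
    return missed / len(outcomes)


def cluster_overreliance(
    window_labels: List[Tuple[Hashable, int]],
    participant_rates: Dict[Hashable, float],
) -> Dict[int, float]:
    """Mean overreliance rate over the windows assigned to each cluster."""
    by_cluster: Dict[int, List[float]] = defaultdict(list)
    for participant, label in window_labels:
        if label == NOISE:
            continue
        by_cluster[label].append(participant_rates[participant])
    return {label: mean(rates) for label, rates in sorted(by_cluster.items())}
```

Because each window contributes one entry, participants with many windows in a cluster weigh more heavily in that cluster's mean. Averaging per participant first would remove that bias at the cost of ignoring how much time each person spent in the pattern.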
