Human and LLM-Based Voice Assistant Interaction: An Analytical Framework for User Verbal and Nonverbal Behaviors
Szeyi Chan, Shihan Fu, Jiachen Li, Bingsheng Yao, Smit Desai, Mirjana Prpa, Dakuo Wang
TL;DR
The paper addresses the lack of a systematic framework for analyzing verbal and nonverbal user behaviors in human-LLM-VA interactions during complex tasks. It introduces a three-dimensional analytical framework comprising Behavior Characteristics, Interaction Stages (Exploration, Conflict, Integration), and Stage Transitions. The framework is derived from a focused reanalysis of 3 hours and 39 minutes of video in which 12 participants performed a salad-cooking task with Mango Mango, an LLM-VA. The study characterizes the specific verbal and nonverbal behaviors that appear in each stage and details how users transition between stages. It offers design implications such as emotion-aware responses and adaptive VA personas. The work provides a foundation for designing more natural, socially aware LLM-VAs and for developing multimodal assessment methods for human-LLM-VA interactions across diverse task contexts.
Abstract
Recent progress in large language model (LLM) technology has significantly enhanced the interaction experience between humans and voice assistants (VAs). This project explores a user's continuous interaction with an LLM-based VA (LLM-VA) during a complex task. We recruited 12 participants to interact with an LLM-VA during a cooking task, which we selected for its complexity and its requirement for continuous interaction. We observed that users exhibit both verbal and nonverbal behaviors, even though they know that the LLM-VA cannot capture nonverbal signals. Despite the prevalence of nonverbal behavior in human-human communication, there is no established analytical methodology or framework for examining it in human-VA interactions. After analyzing 3 hours and 39 minutes of video recordings, we developed an analytical framework with three dimensions: 1) behavior characteristics, covering both verbal and nonverbal behaviors; 2) interaction stages (exploration, conflict, and integration), which illustrate the progression of user interactions; and 3) stage transitions throughout the task. The framework identifies key verbal and nonverbal behaviors that provide a foundation for future research and for practical applications that optimize interactions between humans and LLM-VAs.
