User Interaction Patterns and Breakdowns in Conversing with LLM-Powered Voice Assistants
Amama Mahmood, Junxiang Wang, Bingsheng Yao, Dakuo Wang, Chien-Ming Huang
TL;DR
This work investigates how Large Language Models (LLMs) augment voice assistants by prototyping ChatGPT within Alexa and conducting an exploratory study (N=20) across three tasks: medical self-diagnosis, creative planning, and opinionated discussion. It identifies diverse interaction patterns and demonstrates that the LLM absorbs the majority of intent-recognition failures (approximately 81%) and proactively recovers from some breakdowns (approximately 11%). The study provides design guidelines for tailoring text-centric LLMs to voice interactions, including hierarchical responses, reduced repetition, and context retention to support resilient, multi-turn conversations. The findings hold practical implications for building more fluid, context-aware, and safer LLM-powered voice assistants across high- and low-stakes scenarios.
Abstract
Conventional Voice Assistants (VAs) rely on traditional language models to discern user intent and respond to queries, leading to interactions that often lack broader contextual understanding, an area in which Large Language Models (LLMs) excel. However, current LLMs are largely designed for text-based interactions, making it unclear how user interactions will evolve when the modality changes to voice. In this work, we investigate whether LLMs can enrich VA interactions via an exploratory study in which participants (N=20) used a ChatGPT-powered VA in three scenarios (medical self-diagnosis, creative planning, and discussion) with varied constraints, stakes, and objectivity. We observe that an LLM-powered VA elicits richer interaction patterns that vary across tasks, showing its versatility. Notably, LLMs absorb the majority of VA intent recognition failures. We additionally discuss the potential of harnessing LLMs for more resilient and fluid user-VA interactions and provide design guidelines for tailoring LLMs for voice assistance.
