Situated Understanding of Errors in Older Adults' Interactions with Voice Assistants: A Month-Long, In-Home Study

Amama Mahmood, Junxiang Wang, Chien-Ming Huang

TL;DR

This work equipped 15 older adults’ homes with smart speakers integrated with custom audio recorders to collect “in-the-wild” audio interaction data for detailed error analysis, and proposed design considerations to better align future VAs with older adults’ expectations and lived experiences.

Abstract

Our work addresses the challenges older adults face with commercial Voice Assistants (VAs), notably in conversation breakdowns and error handling. Traditional methods of collecting user experiences (usage logs and post-hoc interviews) do not fully capture the intricacies of older adults' interactions with VAs, particularly regarding their reactions to errors. To bridge this gap, we equipped 15 older adults' homes with smart speakers integrated with custom audio recorders to collect "in-the-wild" audio interaction data for detailed error analysis. Recognizing the conversational limitations of current VAs, our study also explored the capabilities of Large Language Models (LLMs) to handle natural and imperfect text for improving VAs. Midway through our study, we deployed a ChatGPT-powered VA to investigate its efficacy for older adults. Our research suggests leveraging vocal and verbal responses combined with LLMs' contextual capabilities for enhanced error prevention and management in VAs, while proposing design considerations to align VA capabilities with older adults' expectations.

Paper Structure (40 sections, 6 figures, 13 tables)

This paper contains 40 sections, 6 figures, and 13 tables.

Figures (6)

  • Figure 1: Design of the enclosure and base plate for placing the Echo Dot smart speaker and recording device. CAD files are available online at https://bit.ly/3UIgAri for 3D printing.
  • Figure 2: Error categories and their resolution upon immediate participant retry attempt across four weeks.
  • Figure 3: Error resolution across interaction types over time. The top plot shows for each interaction category the percentage of interactions that resulted in errors and the percentage of interactions in which the errors were resolved; for instance, 414 interactions were categorized as "information-seeking" ($100\%$), $\sim44\% (\sim34\% + \sim9\%)$ of those interactions resulted in an error, and only $\sim9\%$ of the original 414 were resolved after the first retry. The remaining plots for the top three individual categories show how their error resolution rates changed over the full four weeks. Note: Not all errors had retries and some had multiple retries, as discussed later.
  • Figure 4: Participants' responses and explicit actions indicate their positive identification of most errors. For a y-axis error manifestation-identification pair, "no-no" indicates that the error did not manifest and was not identified by the participant, "yes-no" indicates that the error manifested but there was no indication of its identification by the participant, and "yes-yes" indicates that the error manifested and was identified by the participant as evidenced by either their immediate response or their action to rectify the error.
  • Figure 5: Recovery strategies used by participants to address various types of errors. Cumulative totals are shown below each strategy on the x-axis in the top plot. The bottom plots illustrate trends in error recovery strategies over the four-week study period.
  • ...and 1 more figure