
The Role of Consequential and Functional Sound in Human-Robot Interaction: Toward Audio Augmented Reality Interfaces

Aliyah Smith, Monroe Kennedy

TL;DR

The paper examines how audio augmentation can enhance human-robot interaction (HRI) by studying consequential, functional, and spatial sounds within audio augmented reality (AR). Through three experiments with the Kinova Gen3 manipulator, it shows that the consequential sounds of a quiet robot may not impair perception, while spatial sounds increase perceived warmth, reduce discomfort, and effectively convey trajectory; localization is most accurate for lateral cues and less accurate for frontal cues. The findings yield design insights and guidelines for integrating functional and spatial auditory cues to improve communication, task performance, and user experience in HRI. This work advances audio-based interaction strategies and highlights the potential of spatialized sound to support reliable, immersive human-robot collaboration in constrained environments.

Abstract

As robots become increasingly integrated into everyday environments, understanding how they communicate with humans is critical. Sound offers a powerful channel for interaction, encompassing both operational noises and intentionally designed auditory cues. In this study, we examined the effects of consequential and functional sounds on human perception and behavior, including a novel exploration of spatial sound through localization and handover tasks. Results show that consequential sounds of the Kinova Gen3 manipulator did not negatively affect perceptions, spatial localization is highly accurate for lateral cues but declines for frontal cues, and spatial sounds can simultaneously convey task-relevant information while promoting warmth and reducing discomfort. These findings highlight the potential of functional and transformative auditory design to enhance human-robot collaboration and inform future sound-based interaction strategies.

Paper Structure

This paper contains 28 sections, 6 figures, 1 table.

Figures (6)

  • Figure 1: The breakdown of the three-part experimental study.
  • Figure 2: The four experimental conditions for Experiment A. The Kinova Gen3 manipulator was selected for its suitability in collaborative and household tasks (e.g., TidyBot [tidybot]). Participants were assigned to conditions using a quasi-random procedure to ensure equal group sizes.
  • Figure 3: Box-and-whisker plots illustrating participant responses across the four experimental conditions and four perceptual scales in Experiment A. Higher scores indicate more positive perceptions. Black diamonds represent mean values, black horizontal lines indicate medians, plus signs denote outliers, boxes correspond to the interquartile range (25th–75th percentiles), and whiskers extend to 1.5 times the interquartile range. (N = 48)
  • Figure 4: Normalized confusion matrices aggregated across all participants for three scenes, with two trials per scene. Darker shades indicate higher classification accuracy. (N = 51)
  • Figure 5: Box-and-whisker plots illustrating participant responses across the three experimental conditions and three attribute scales in Experiment C. Plot elements are structured as described in Figure 3. Statistical significance from exploratory paired comparisons is indicated by asterisks above brackets (* corresponds to p < 0.05). (N = 41)
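The plotting conventions described in the captions of Figures 3 through 5 can be made concrete with a short sketch. The captions do not state the quartile method or the normalization axis, so this Python version assumes linear-interpolation quartiles, Tukey-style whiskers (each whisker stops at the most extreme data point within 1.5 × IQR of the box, which is matplotlib's default), and row normalization for the confusion matrices (each row, one true sound location, sums to 1). The helper names `box_stats` and `normalized_confusion`, the location labels, and the example data are hypothetical and are not taken from the study.

```python
from statistics import mean, quantiles


def box_stats(values, whis=1.5):
    """Summary statistics for one box in a box-and-whisker plot.

    Assumption: Tukey-style whiskers, i.e. each whisker reaches the most
    extreme data point lying within `whis` * IQR of the box edges.
    """
    data = sorted(values)
    # method="inclusive" interpolates linearly between order statistics
    # (the same rule as numpy.percentile's default); the middle cut is the median.
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inliers = [v for v in data if lo_fence <= v <= hi_fence]
    return {
        "mean": mean(data),            # drawn as a black diamond
        "median": med,                 # drawn as a black horizontal line
        "q1": q1,                      # lower box edge (25th percentile)
        "q3": q3,                      # upper box edge (75th percentile)
        "whisker_low": min(inliers),
        "whisker_high": max(inliers),
        # points beyond the fences, drawn as plus signs
        "outliers": [v for v in data if v < lo_fence or v > hi_fence],
    }


def normalized_confusion(pairs, labels):
    """Row-normalized confusion matrix from (true, predicted) label pairs.

    Assumption: raw counts are pooled across all participants and trials
    first, then each row (true class) is divided by its total.
    """
    index = {lab: i for i, lab in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for true, pred in pairs:
        counts[index[true]][index[pred]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # an empty row (class never presented) is left as all zeros
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


if __name__ == "__main__":
    # Hypothetical ratings for one condition/scale, not study data.
    print(box_stats([1, 2, 2, 3, 3, 3, 4, 4, 10]))
    # Hypothetical localization responses as (true, reported) pairs.
    responses = [("left", "left"), ("left", "left"),
                 ("front", "left"), ("front", "front"),
                 ("right", "right")]
    print(normalized_confusion(responses, ["left", "front", "right"]))
```

Pooling raw counts before dividing is one reasonable reading of "aggregated across all participants"; averaging per-participant matrices instead gives different results when participants contribute unequal numbers of trials.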