Learning to Communicate Functional States with Nonverbal Expressions for Improved Human-Robot Collaboration

Liam Roy, Dana Kulic, Elizabeth Croft

TL;DR

This work tackles conveying a robot's functional state to humans through learnable nonverbal audio. It introduces a reinforcement learning framework that tunes $P=3$ acoustic parameters, each discretized into $D=3$ regions (a library of $D^P = 27$ sounds), using noisy human feedback. The learner is implemented as a UCB1-based bandit with per-state Q-tables. Empirical results show significant post-learning improvements in state recognition and faster convergence when initialization is informed by prior users (mean steps drop from $39.08$ to $15.58$). They also show that final parameter configurations can converge similarly across users under informed initialization, with pitch bend driving perceptual associations. The findings highlight the potential of personalized, audio-based communication in HRI and point to multi-modal, context-aware extensions for robust real-world deployment.
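As a rough illustration of the idea summarized above, the following is a minimal sketch of a textbook UCB1 bandit with one Q-table per robot state. It is not the authors' implementation: the level indices, the exploration constant `c`, the function names, and the binary reward (1 when the participant's perceived state matches the intended state, 0 otherwise) are all assumptions made for this example.

```python
import itertools
import math

# Assumed discretization: 3 parameters, 3 regions each, addressed by index only.
# Each action is a triple (pitch_bend, beats_per_minute, beats_per_loop).
ACTIONS = list(itertools.product(range(3), repeat=3))
STATES = ("accomplished", "progressing", "stuck")


def make_tables(initial_q=0.0):
    """Create one Q-table and one visit-count table per robot state."""
    q = {s: {a: initial_q for a in ACTIONS} for s in STATES}
    n = {s: {a: 0 for a in ACTIONS} for s in STATES}
    return q, n


def select_action(q, n, state, c=2.0):
    """Textbook UCB1: try every sound once, then maximize Q + exploration bonus."""
    untried = [a for a in ACTIONS if n[state][a] == 0]
    if untried:
        return untried[0]
    total = sum(n[state].values())
    return max(
        ACTIONS,
        key=lambda a: q[state][a] + math.sqrt(c * math.log(total) / n[state][a]),
    )


def update(q, n, state, action, reward):
    """Incremental mean of the (noisy) human feedback for this sound."""
    n[state][action] += 1
    q[state][action] += (reward - q[state][action]) / n[state][action]


def feedback_reward(intended_state, perceived_state):
    """Hypothetical reward: 1 if the user identified the intended state."""
    return 1.0 if perceived_state == intended_state else 0.0
```

In a loop, the robot would call `select_action` for the state it wants to convey, play the corresponding sound, convert the user's answer with `feedback_reward`, and pass the result to `update`. Because the mean is averaged over repeated plays, occasional inconsistent answers are smoothed out instead of permanently ruling out a sound.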

Abstract

Collaborative robots must effectively communicate their internal state to humans to enable a smooth interaction. Nonverbal communication is widely used to communicate information during human-robot interaction; however, such methods may also be misunderstood, leading to communication errors. In this work, we explore modulating the acoustic parameter values (pitch bend, beats per minute, beats per loop) of nonverbal auditory expressions to convey functional robot states (accomplished, progressing, stuck). We propose a reinforcement learning (RL) algorithm based on noisy human feedback to produce accurately interpreted nonverbal auditory expressions. The proposed approach was evaluated through a user study with 24 participants. The results demonstrate that: 1. Our proposed RL-based approach is able to learn suitable acoustic parameter values which improve the users' ability to correctly identify the state of the robot. 2. Algorithm initialization informed by previous user data can be used to significantly speed up the learning process. 3. The method used for algorithm initialization strongly influences whether participants converge to similar sounds for each robot state. 4. Modulation of pitch bend has the largest influence on user association between sounds and robotic states.

Paper Structure (15 sections, 2 equations, 6 figures, 3 tables, 1 algorithm)

Figures (6)

  • Figure 1: A robot nicknamed Jackal learning to generate nonverbal expressions to be correctly understood by users. The robot state perceived by the human is used to generate feedback to train the learning algorithm. The Spot robot image used within our user study procedure is also shown for reference.
  • Figure 2: Two Q-table initializations for a sound library with $P=3$ parameters, each discretized into $D=3$ regions. The resulting Q-table has 27 actions, each represented as a coloured cube denoting that action's estimated reward. (Left - Subtask U - Uninformed Initialization) Each Q-value is initialized to its maximum value following the principle of optimism under uncertainty. In this uninformed example, all values are initialized to 10. (Right - Subtask I - Informed Initialization) Q-values are set to predefined values derived from prior knowledge. In this informed example, sounds with a negative pitch bend and a greater number of beats per loop are initialized with a positive value.
  • Figure 3: Flowchart depicting the tasks and conditions of the user study.
  • Figure 4: Dot plot depicting the number of states users were able to identify correctly before learning (blue) and after learning (red) with two different robots: Jackal and Spot.
  • Figure 5: Dot plot depicting the number of steps the learning algorithm took to converge for each user from Subtask U (Uninformed Init) and Subtask I (Informed Init) under study conditions UI (Purple) and IU (Orange).
  • ...and 1 more figures
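
The two Q-table initializations described in the Figure 2 caption can be sketched as follows. This is an illustrative reading of the caption only. The index convention (index 0 for a negative pitch bend, index 2 for the largest number of beats per loop), the prior value of 5.0, and the function names are hypothetical; the value 10 is the maximum used in the caption's uninformed example.

```python
import itertools

P = 3         # parameters: pitch bend, beats per minute, beats per loop
D = 3         # discretized regions per parameter
Q_MAX = 10.0  # maximum Q-value from the uninformed example in Figure 2

ACTIONS = list(itertools.product(range(D), repeat=P))  # D**P = 27 sounds


def uninformed_q_table():
    """Subtask U: optimism under uncertainty, every sound starts at Q_MAX."""
    return {action: Q_MAX for action in ACTIONS}


def informed_q_table(prior):
    """Subtask I: each sound starts at a value derived from prior knowledge."""
    return {action: float(prior(action)) for action in ACTIONS}


def example_prior(action):
    """Hypothetical prior: favor negative pitch bend with many beats per loop."""
    pitch_bend, _beats_per_minute, beats_per_loop = action
    # Assumed convention: index 0 = negative pitch bend, index 2 = most beats.
    return 5.0 if pitch_bend == 0 and beats_per_loop == 2 else 0.0


q_uninformed = uninformed_q_table()
q_informed = informed_q_table(example_prior)
```

Under the uninformed table every sound looks equally promising, so a learner has to sample widely before the estimates separate. The informed table starts with a small set of favored sounds, which is consistent with the faster convergence reported for informed initialization.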