"Why the face?": Exploring Robot Error Detection Using Instrumented Bystander Reactions

Maria Teresa Parreira, Ruidong Zhang, Sukruth Gowdru Lingaraju, Alexandra Bremers, Xuanyu Fang, Adolfo Ramirez-Aristizabal, Manaswi Saha, Michael Kuniavsky, Cheng Zhang, Wendy Ju

TL;DR

The study addresses how robots can better detect and adapt to human reactions to errors by leveraging a novel neck-mounted device (NeckFace) that captures chin-region expressions. It introduces NeckNet-18 to map IR-camera data to 3D facial expressions and builds error-detection models trained on NeckFace-derived signals, outperforming OpenFace and frame-based baselines, especially in within-participant settings. The findings support expanding human-in-the-loop sensing in HRI and demonstrate that 3D reaction data can yield robust, personalized error detection with potential for real-time robotic adaptation. Overall, the work advances social cue detection in robotics and motivates broader adoption of wearable, mobile sensing for context-aware human–robot collaboration.

Abstract

How do humans recognize and rectify social missteps? We achieve social competence by looking around at our peers, decoding subtle cues from bystanders - a raised eyebrow, a laugh - to evaluate the environment and our actions. Robots, however, struggle to perceive and make use of these nuanced reactions. By employing a novel neck-mounted device that records facial expressions from the chin region, we explore the potential of previously untapped data to capture and interpret human responses to robot error. First, we develop NeckNet-18, a 3D facial reconstruction model to map the reactions captured through the chin camera onto facial points and head motion. We then use these facial responses to develop a robot error detection model which outperforms standard methodologies such as using OpenFace or video data, generalizing well especially for within-participant data. Through this work, we argue for expanding human-in-the-loop robot sensing, fostering more seamless integration of robots into diverse human environments, pushing the boundaries of social cue detection and opening new avenues for adaptable robotics.

Paper Structure

This paper contains 19 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: User study scheme. Participants wearing NeckFace watch videos showing a scenario of human or robot error, which elicits a reaction. The IR camera image is converted into 3D facial points and head rotation data by a customized NeckNet model. This data is then used to train error detection models that map human reactions to the scenario displayed.
  • Figure 2: Study setup. In the Calibration round, the participant, wearing NeckFace, copies the movements shown in a video on an iPhone 11. The Stimulus round consists of a series of 30 videos played on a screen while a webcam and NeckFace record facial reactions.
  • Figure 3: Study protocol and data collected. In the Calibration round, NeckFace IR camera data is collected along with TrueDepth data (and head rotation angles) from the iPhone. The iPhone data serves as ground truth to train NeckNet-18. In the Stimulus round, reactions to neutral (0) and error (1) videos are collected through the NeckFace cameras. This dataset, NeckFaceIR, is later passed through NeckNet-18, which converts it into 3D facial reactions (NeckData).
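
The data flow described in Figure 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `Reaction` record, the `to_neckdata` and `by_participant` helpers, and `placeholder_reconstruct` are hypothetical names, and the placeholder function merely stands in for a trained NeckNet-18, whose real interface is not given here. The sketch shows two things: labels (0 = neutral, 1 = error) pass through the reconstruction step unchanged, and recordings can be grouped by participant, which is what a within-participant evaluation would need.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]  # assumed: one flattened IR frame, or one 3D face vector


@dataclass
class Reaction:
    """One recorded reaction to a stimulus video (hypothetical record layout)."""
    participant: str
    label: int             # 0 = neutral video, 1 = error video
    frames: List[Vector]   # IR frames (NeckFaceIR) or 3D reactions (NeckData)


def to_neckdata(neckface_ir: List[Reaction],
                reconstruct: Callable[[Vector], Vector]) -> List[Reaction]:
    """Apply a frame-wise reconstruction model, keeping participant and label."""
    return [
        Reaction(r.participant, r.label, [reconstruct(f) for f in r.frames])
        for r in neckface_ir
    ]


def by_participant(data: List[Reaction]) -> Dict[str, List[Reaction]]:
    """Group reactions per participant, e.g. for within-participant training."""
    groups: Dict[str, List[Reaction]] = {}
    for r in data:
        groups.setdefault(r.participant, []).append(r)
    return groups


def placeholder_reconstruct(frame: Vector) -> Vector:
    """Stand-in for a trained model; it only rescales the input values."""
    return [2.0 * x for x in frame]
```

In practice, `reconstruct` would be replaced by a model trained on the Calibration-round pairs (IR frames as input, TrueDepth data as ground truth), and an error detection classifier would then be fit on the resulting labeled reactions.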