A Study on Domain Generalization for Failure Detection through Human Reactions in HRI

Maria Teresa Parreira, Sukruth Gowdru Lingaraju, Adolfo Ramirez-Aristizabal, Manaswi Saha, Michael Kuniavsky, Wendy Ju

TL;DR

A concise analysis of domain generalization in failure detection models trained on human facial expressions, which underscores the need for HRI research on model robustness and real-life applicability.

Abstract

Machine learning models are commonly tested in-distribution (on the same dataset they were trained on); performance almost always drops in out-of-distribution settings. For HRI research, the goal is often to develop generalized models. This makes domain generalization, that is, retaining performance in different settings, a critical issue. In this study, we present a concise analysis of domain generalization in failure detection models trained on human facial expressions. Using two distinct datasets of humans reacting to videos in which errors occur, one from a controlled lab setting and another collected online, we trained deep learning models on each dataset. When testing these models on the alternate dataset, we observed a significant performance drop. We reflect on the causes of the observed model behavior and offer recommendations. This work emphasizes the need for HRI research focusing on improving model robustness and real-life applicability.

Paper Structure (13 sections, 2 figures, 2 tables)

Figures (2)

  • Figure 1: Example of stimulus video, displaying a robot failing a jump. These videos were shown to participants during data collection and included videos of human and robot error, as well as control videos. Video source included in the study repository.
  • Figure 2: Illustration of concepts. In domain generalization, a model for Task X (e.g., failure detection through human reactions) is trained on a dataset A (e.g., from in-lab study) and deployed on an out-of-distribution dataset B (e.g., data collected in the wild). Domain adaptation includes some data from the testing dataset in the training data. Participants' image reproduced with consent.
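
The cross-dataset protocol described in the abstract and in Figure 2 (train on one dataset, test on the other) can be sketched as a small train/test matrix. The following is a minimal illustration only, not the study's pipeline: the 2-D points are a synthetic stand-in for facial-expression features, the nearest-centroid classifier replaces the deep learning models, and the dataset names `lab` and `online`, the `shift` parameter, and all function names are hypothetical.

```python
import random


def make_dataset(shift, n_per_class=100, seed=0):
    """Synthetic stand-in for reaction features (assumption, not study data).

    Label 0 = no failure, label 1 = failure. `shift` moves the whole
    dataset, imitating a change of recording conditions between domains.
    """
    rng = random.Random(seed)
    samples = []
    for label, center in ((0, 0.0), (1, 2.0)):
        for _ in range(n_per_class):
            x = center + shift + rng.uniform(-0.5, 0.5)
            y = center + shift + rng.uniform(-0.5, 0.5)
            samples.append(((x, y), label))
    return samples


def fit_centroids(samples):
    """Train a nearest-centroid classifier: one mean point per label."""
    sums = {}
    for (x, y), label in samples:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict(centroids, point):
    """Return the label whose centroid is closest to `point`."""
    def sq_dist(label):
        cx, cy = centroids[label]
        return (point[0] - cx) ** 2 + (point[1] - cy) ** 2
    return min(centroids, key=sq_dist)


def accuracy(centroids, samples):
    correct = sum(predict(centroids, point) == label for point, label in samples)
    return correct / len(samples)


def cross_dataset_accuracy(datasets):
    """Train on each domain, then test on the held-out split of every domain."""
    results = {}
    for train_name, train_data in datasets.items():
        model = fit_centroids(train_data["train"])
        for test_name, test_data in datasets.items():
            results[(train_name, test_name)] = accuracy(model, test_data["test"])
    return results


# Two hypothetical domains; "online" is shifted relative to "lab".
datasets = {
    "lab": {"train": make_dataset(0.0, seed=1), "test": make_dataset(0.0, seed=2)},
    "online": {"train": make_dataset(3.0, seed=3), "test": make_dataset(3.0, seed=4)},
}
results = cross_dataset_accuracy(datasets)
for (train_name, test_name), acc in sorted(results.items()):
    kind = "in-distribution" if train_name == test_name else "out-of-distribution"
    print(f"train={train_name:<6} test={test_name:<6} {kind:<19} accuracy={acc:.2f}")
```

In this toy setup the diagonal entries (same domain for training and testing) reach perfect accuracy, while the off-diagonal entries fall to chance level, because the artificial shift places every sample of the other domain on one side of the learned decision boundary. Real domain gaps are rarely this clean; the point is only the shape of the evaluation, in which in-distribution scores alone would hide the drop.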