Hearing Loss Detection from Facial Expressions in One-on-one Conversations

Yufeng Yin; Ishwarya Ananthabhotla; Vamsi Krishna Ithapu; Stavros Petridis; Yu-Hsiang Wu; Christi Miller

TL;DR

The paper addresses detecting hearing loss (HL) from a subject's facial expressions in one-on-one conversations. It introduces a self-supervised variation modeling (VM) approach that captures within-subject expression changes across noise levels, and it uses adversarial representation learning with a gradient reversal layer for age bias mitigation (ABM). The method combines a Marlin-based facial encoder, a variation encoder, and an HL classifier, and achieves higher F1-scores than baselines on a large egocentric dataset (RLR-CHAT). The findings show that age bias can degrade performance for younger subjects and that the proposed ABM strategy mitigates this effect, supporting more accurate real-time hearing loss detection in social interactions. Overall, the work provides a practical framework for nonverbal-behavior-based monitoring that could support timely interventions such as communication strategies or hearing-aid adjustments.
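To make the gradient reversal idea concrete, the following is a minimal sketch with hand-written forward and backward passes. It is not the authors' implementation: the scalar "encoder", the linear age head, the squared-error age loss, and the weight `lam` are all assumptions chosen for illustration. The only property taken from the technique itself is that the layer acts as the identity in the forward pass and negates (and scales) the gradient in the backward pass.

```python
# Generic gradient reversal layer (GRL) sketch in plain Python.
# All names and the toy model below are hypothetical, not from the paper.

def grl_forward(features):
    """Forward pass: the GRL returns its input unchanged."""
    return list(features)


def grl_backward(grad_output, lam=1.0):
    """Backward pass: flip the sign of the gradient and scale it by lam."""
    return [-lam * g for g in grad_output]


def toy_age_branch_grads(x, w, v, age, lam=1.0):
    """Gradients of a toy age loss 0.5 * (v * z - age)**2 with z = w * x.

    The age head weight v receives the ordinary gradient, while the
    encoder weight w receives the reversed gradient through the GRL.
    """
    z = [w * x]                      # toy encoder: one scalar feature
    z_rev = grl_forward(z)           # identity in the forward pass
    age_pred = v * z_rev[0]          # toy linear age head
    grad_pred = age_pred - age       # derivative of the loss w.r.t. age_pred
    grad_v = grad_pred * z_rev[0]    # normal gradient for the age head
    grad_z = grl_backward([grad_pred * v], lam)  # reversed for the encoder
    grad_w = grad_z[0] * x
    return grad_w, grad_v


if __name__ == "__main__":
    grad_w, grad_v = toy_age_branch_grads(x=2.0, w=1.0, v=0.5, age=3.0)
    print("encoder gradient (reversed):", grad_w)
    print("age head gradient (normal):", grad_v)
```

With gradient descent, the age head moves to predict age better, while the encoder moves in the direction that makes age harder to predict from its features. That is the sense in which the learned representation is pushed toward being uninformative about age.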

Abstract

Individuals with impaired hearing experience difficulty in conversations, especially in noisy environments. This difficulty often manifests as a change in behavior and may be captured via facial expressions, such as the expression of discomfort or fatigue. In this work, we build on this idea and introduce the problem of detecting hearing loss from an individual's facial expressions during a conversation. Building machine learning models that can represent hearing-related facial expression changes is a challenge. In addition, models need to disentangle spurious age-related correlations from hearing-driven expressions. To this end, we propose a self-supervised pre-training strategy tailored for the modeling of expression variations. We also use adversarial representation learning to mitigate the age bias. We evaluate our approach on a large-scale egocentric dataset with real-world conversational scenarios involving subjects with hearing loss and show that our method for hearing loss detection achieves superior performance over baselines.

Paper Structure (15 sections, 1 equation, 6 figures, 2 tables)

Introduction
Related work
Method
Problem Formulation
Model
Facial Feature Encoding
Self-supervised expression variation modeling
Hearing loss detection with age bias mitigation
Experiments
Dataset
Experiment Setup
Implementation Details
Methods
Experiment Results
Conclusions

Figures (6)

Figure 1: We study the novel problem of hearing loss detection from facial expressions in one-on-one conversations. Given a video clip recording the subject's facial expressions in one-on-one conversations, we detect if the subject has hearing loss. Our method learns the within-subject variability of facial expressions via self-supervised pre-training while reducing the age bias in downstream fine-tuning.
Figure 2: (Best viewed in color) Architectural overview of the proposed method for hearing loss detection in one-on-one conversations: (i) We pre-train the encoder with expression variation modeling to capture the feature variations across noise levels. (ii) We fine-tune the model for hearing loss detection with age bias mitigation.
Figure 3: Number of subjects and positive rates in different age ranges.
Figure 4: 3D t-SNE visualizations. Each point represents a segment. In the top row, color represents age; in the bottom row, color represents identity. With variation modeling and age bias mitigation, segments from subjects of different ages are intermixed, while points from the same person are grouped together.
Figure 5: Age estimations with Marlin+VM+ABM. The correlation between the ground-truth and predicted age is not significant (Pearson correlation: $r=0.11$, $p=0.16$). The results indicate successful mitigation of age bias.
...and 1 more figure
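The Figure 5 caption reports a Pearson coefficient between ground-truth and predicted age. As a rough illustration of how such a coefficient is computed, here is a minimal sketch; the function name and the toy numbers are hypothetical and are not data from the paper. The sketch does not compute the p-value; a library routine such as `scipy.stats.pearsonr` returns both the coefficient and a p-value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up ages for illustration only.
    true_age = [25, 40, 55, 63, 71]
    predicted_age = [52, 49, 50, 53, 48]
    print(round(pearson_r(true_age, predicted_age), 3))
```

A coefficient near zero, as in the caption, means the predicted ages carry little linear information about the true ages, which is the outcome one would hope for after age bias mitigation.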