Deep Weakly-Supervised Domain Adaptation for Pain Localization in Videos

R. Gnana Praveen, Eric Granger, Patrick Cardinal

TL;DR

The paper tackles automatic pain localization from facial expressions under the constraint of sparse annotations and domain shift across real-world capture conditions. It introduces a deep weakly-supervised domain adaptation (WSDA) framework that embeds multiple instance learning into adversarial domain adaptation to train an Inflated 3D-CNN (I3D) on a fully labeled source domain while leveraging weak, sequence-level labels in the target domain. Empirical results on RECOLA as the source and UNBC-McMaster as the target show that WSDA yields superior instance- and sequence-level pain localization compared to baselines and several state-of-the-art MIL methods, especially under limited target supervision. This approach enhances the practicality and scalability of automated pain assessment in settings where frame-level labels are costly or unavailable, enabling more robust pain monitoring in clinical and daily environments.

Abstract

Automatic pain assessment has important potential diagnostic value for populations that are incapable of articulating their pain experiences. As one of the dominant nonverbal channels through which pain is expressed, facial expressions have been widely investigated for estimating an individual's pain intensity. However, using state-of-the-art deep learning (DL) models in real-world pain estimation applications poses several challenges related to the subjective variations of facial expressions, operational capture conditions, and the lack of representative labeled training videos. Given the cost of annotating intensity levels for every video frame, we propose a weakly-supervised domain adaptation (WSDA) technique that allows for training 3D CNNs for spatio-temporal pain intensity estimation using weakly labeled videos, where labels are provided on a periodic basis. In particular, WSDA integrates multiple instance learning into an adversarial deep domain adaptation framework to train an Inflated 3D-CNN (I3D) model such that it can accurately estimate pain intensities in the target operational domain. The training process relies on a weak target loss, along with a domain loss and a source loss, for domain adaptation of the I3D model. Experimental results obtained using labeled source domain RECOLA videos and weakly-labeled target domain UNBC-McMaster videos indicate that the proposed deep WSDA approach can achieve a significantly higher level of sequence (bag)-level and frame (instance)-level pain localization accuracy than related state-of-the-art approaches.
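
The three training terms named in the abstract (source loss, weak target loss, domain loss) can be illustrated with a small sketch. This is not the authors' implementation: the squared-error form of the regression terms, the max pooling used to turn frame-level predictions into a sequence-level prediction, the binary cross-entropy domain term, and every function and parameter name below are assumptions made only for illustration.

```python
import math


def source_loss(preds, labels):
    """Mean squared error over fully labeled source frames (assumed form)."""
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(preds)


def weak_target_loss(instance_preds, bag_label):
    """MIL-style loss for one weakly labeled target sequence.

    Assumption: the sequence (bag) prediction is the maximum of its
    frame (instance) predictions, compared to the single bag label.
    """
    bag_pred = max(instance_preds)
    return (bag_pred - bag_label) ** 2


def domain_loss(domain_probs, domain_labels, eps=1e-7):
    """Binary cross-entropy of a domain discriminator (assumed form).

    domain_probs: predicted probability that a sample is from the source.
    domain_labels: 1 for source samples, 0 for target samples.
    """
    total = 0.0
    for p, y in zip(domain_probs, domain_labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(domain_probs)


def wsda_loss_terms(src_preds, src_labels, tgt_instance_preds,
                    tgt_bag_label, dom_probs, dom_labels):
    """Return the three scalar terms; how they are weighted is left open."""
    return {
        "source": source_loss(src_preds, src_labels),
        "weak_target": weak_target_loss(tgt_instance_preds, tgt_bag_label),
        "domain": domain_loss(dom_probs, dom_labels),
    }


terms = wsda_loss_terms(
    src_preds=[1.0, 2.0], src_labels=[1.0, 4.0],
    tgt_instance_preds=[0.1, 0.9, 0.3], tgt_bag_label=1.0,
    dom_probs=[0.5, 0.5], dom_labels=[1, 0],
)
print(terms)
```

In adversarial domain adaptation the domain term is typically minimized by the discriminator while the feature extractor is trained to increase it, for example through a gradient reversal layer. The sketch only evaluates the scalar terms and does not model that optimization.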

Paper Structure

This paper contains 13 sections, 11 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Overall Architecture of our proposed deep WSDA technique for training 3D CNNs on weakly-labeled target videos. Best viewed in color.
  • Figure 2: Visualization of pain localization on two different subjects. From top to bottom: a scenario where the ground truth (GT) shows no pain but our deep WSDA approach correctly localizes pain, and a scenario with multiple peaks of expression.
  • Figure 3: PCC accuracy of the I3D model trained with deep WSDA under decreasing levels of weak supervision on target videos.
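
The PCC metric reported in Figure 3 can be computed as in the sketch below, assuming PCC denotes the Pearson correlation coefficient between predicted and ground-truth pain intensities, as is common in pain-estimation work. The function name and the example values are hypothetical and are not taken from the paper.

```python
import math


def pearson_cc(preds, targets):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(preds)
    mean_p = sum(preds) / n
    mean_t = sum(targets) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(preds, targets))
    var_p = sum((p - mean_p) ** 2 for p in preds)
    var_t = sum((t - mean_t) ** 2 for t in targets)
    if var_p == 0 or var_t == 0:
        # Correlation is undefined when either sequence is constant.
        return float("nan")
    return cov / math.sqrt(var_p * var_t)


# Hypothetical frame-level predictions and ground-truth intensities.
predicted = [0.2, 0.9, 2.1, 2.8, 1.0]
ground_truth = [0.0, 1.0, 2.0, 3.0, 1.0]
print(round(pearson_cc(predicted, ground_truth), 3))
```

A value close to 1 indicates that the predicted intensities rise and fall with the ground truth, which is why the metric suits localization of pain peaks within a sequence.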