Deep Domain Adaptation for Ordinal Regression of Pain Intensity Estimation Using Weakly-Labelled Videos

R. Gnana Praveen, Eric Granger, Patrick Cardinal

TL;DR

This work tackles domain shift and annotation sparsity in automatic pain intensity estimation from facial videos by proposing WSDA-OR, a deep learning model that combines weakly supervised domain adaptation with ordinal regression. It introduces Gaussian modeling of ordinal pain levels and an adaptive MIL pooling mechanism to leverage multiple relevant frames within a sequence, integrated into a three-part architecture with shared feature mapping, ordinal label prediction, and a domain discriminator trained via a gradient reversal layer. The approach is validated on RECOLA (source) and UNBC-McMaster (target), with additional tests on BIOVID and Fatigue datasets, showing superior frame-level localization and competitive sequence-level performance against state-of-the-art MIL/ordinal methods. The key contributions are (i) Gaussian encoding of ordinal targets for cross-domain training, (ii) adaptive MIL pooling to utilize multiple informative frames, and (iii) a WSDA framework that jointly minimizes source-label prediction loss and target weak-label loss while aligning domains. Together, these advances enable more accurate, robust pain intensity estimation under real-world, weakly labeled conditions, with practical implications for automated pain assessment and monitoring.
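The Gaussian encoding of ordinal targets mentioned above can be illustrated with a minimal sketch. It assumes a discrete set of intensity levels and a fixed standard deviation; the function name `gaussian_soft_label`, the choice of six levels, and `sigma=1.0` are illustrative assumptions, not values taken from the paper.

```python
import math


def gaussian_soft_label(level, num_levels, sigma=1.0):
    """Soft label over ordinal levels 0..num_levels-1, peaked at `level`.

    Hypothetical helper: the normalisation and sigma are assumptions
    made for illustration, not the paper's exact formulation.
    """
    if not 0 <= level < num_levels:
        raise ValueError("level must lie in [0, num_levels)")
    # Unnormalised Gaussian weight for each ordinal level.
    weights = [
        math.exp(-((k - level) ** 2) / (2.0 * sigma ** 2))
        for k in range(num_levels)
    ]
    total = sum(weights)
    # Normalise so the soft label sums to 1.
    return [w / total for w in weights]


# Example: a weak sequence-level label of 2 on an assumed 6-level scale.
soft = gaussian_soft_label(level=2, num_levels=6, sigma=1.0)
```

Unlike a one-hot target, this encoding gives neighbouring levels (here 1 and 3) more mass than distant ones, which is one way to express that confusing adjacent intensity levels is a smaller error than confusing distant ones.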

Abstract

Estimation of pain intensity from facial expressions captured in videos has immense potential for health care applications. Given the challenges related to subjective variations of facial expressions and to operational capture conditions, the accuracy of state-of-the-art deep learning (DL) models for recognizing facial expressions may decline. Domain adaptation (DA) has been widely explored to alleviate the problem of domain shifts that typically occur between video data captured across various source and target domains. Moreover, given the laborious task of collecting and annotating videos, and the subjective bias due to ambiguity among adjacent intensity levels, weakly-supervised learning (WSL) is gaining attention in such applications. State-of-the-art WSL models are typically formulated as regression problems, and leverage neither the ordinal relationship among pain intensity levels nor the temporal coherence of multiple consecutive frames. This paper introduces a new DL model for weakly-supervised DA with ordinal regression (WSDA-OR) that can be adapted using target domain videos with coarse labels provided on a periodic basis. The WSDA-OR model enforces ordinal relationships among intensity levels assigned to target sequences, and associates multiple relevant frames with sequence-level labels. In particular, it learns discriminant and domain-invariant feature representations by integrating multiple instance learning (MIL) with deep adversarial DA, where soft Gaussian labels are used to efficiently represent the weak ordinal sequence-level labels from the target domain. The proposed approach was validated using the RECOLA video dataset as fully-labeled source domain data, and the UNBC-McMaster shoulder pain video dataset as weakly-labeled target domain data. We have also validated WSDA-OR on the BIOVID and Fatigue datasets for sequence-level estimation.
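The idea of associating several relevant frames with one sequence-level label can be sketched with a generic top-k mean pooling over frame-level scores. This is a stand-in, not the paper's pooling mechanism, which is not detailed in this summary; the function name `top_k_mean_pool` and the example scores are hypothetical.

```python
def top_k_mean_pool(frame_scores, k):
    """Average the k highest frame-level scores into one sequence-level score.

    Generic MIL pooling sketch: k=1 reduces to max pooling (one frame
    decides), and k=len(frame_scores) reduces to mean pooling.
    """
    if not frame_scores:
        raise ValueError("frame_scores must not be empty")
    # Clamp k to the valid range [1, number of frames].
    k = max(1, min(k, len(frame_scores)))
    top = sorted(frame_scores, reverse=True)[:k]
    return sum(top) / k


# Hypothetical frame-level intensity predictions for one sequence.
scores = [0.1, 0.0, 2.5, 3.0, 0.2, 2.8]
seq_max = top_k_mean_pool(scores, 1)   # only the single strongest frame
seq_top3 = top_k_mean_pool(scores, 3)  # the three strongest frames
```

Using more than one frame makes the sequence-level estimate less sensitive to a single spurious peak, at the cost of diluting a genuinely brief expression when k is too large.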

Paper Structure

This paper contains 18 sections, 16 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Examples of video frames with (left) and without (right) shoulder pain from the UNBC-McMaster dataset [5771462].
  • Figure 2: Overall architecture of the proposed approach (WSDA-OR). Inc denotes the Inception module [7298594]. Different colors are used to distinguish the data flow in the different loss components. Best viewed in color.
  • Figure 3: Gaussian representation of weak ordinal labels.
  • Figure 4: PCC accuracy of the I3D model trained with deep WSDA-OR under decreasing levels of weak supervision on target videos.
  • Figure 5: Visualization of pain localization on two different subjects in the UNBC dataset. From top to bottom: a scenario with multiple peaks of pain expressions, where our deep WSDA-OR localizes pain better than ground truth and WSDA [Praveen]; a scenario where ground truth (GT) shows no pain, but our deep WSDA-OR approach correctly localizes pain better than WSDA [Praveen].