Deep Weakly-Supervised Domain Adaptation for Pain Localization in Videos
R. Gnana Praveen, Eric Granger, Patrick Cardinal
TL;DR
The paper tackles automatic pain localization from facial expressions under the constraint of sparse annotations and domain shift across real-world capture conditions. It introduces a deep weakly-supervised domain adaptation (WSDA) framework that embeds multiple instance learning into adversarial domain adaptation to train an Inflated 3D-CNN (I3D) on a fully labeled source domain while leveraging weak, sequence-level labels in the target domain. Empirical results on RECOLA as the source and UNBC-McMaster as the target show that WSDA yields superior instance- and sequence-level pain localization compared to baselines and several state-of-the-art MIL methods, especially under limited target supervision. This approach enhances the practicality and scalability of automated pain assessment in settings where frame-level labels are costly or unavailable, enabling more robust pain monitoring in clinical and daily environments.
Abstract
Automatic pain assessment has important potential diagnostic value for populations that are incapable of articulating their pain experiences. As one of the dominant nonverbal channels for expressing pain, facial expressions have been widely investigated for estimating the pain intensity of individuals. However, using state-of-the-art deep learning (DL) models in real-world pain estimation applications poses several challenges related to subjective variations in facial expressions, operational capture conditions, and a lack of representative training videos with labels. Given the cost of annotating intensity levels for every video frame, we propose a weakly-supervised domain adaptation (WSDA) technique that allows for training 3D CNNs for spatio-temporal pain intensity estimation using weakly labeled videos, where labels are provided on a periodic basis. In particular, WSDA integrates multiple instance learning into an adversarial deep domain adaptation framework to train an Inflated 3D-CNN (I3D) model such that it can accurately estimate pain intensities in the target operational domain. The training process relies on a weak target loss, along with a domain loss and a source loss, for domain adaptation of the I3D model. Experimental results obtained using labeled source domain RECOLA videos and weakly-labeled target domain UNBC-McMaster videos indicate that the proposed deep WSDA approach can achieve a significantly higher level of sequence (bag)-level and frame (instance)-level pain localization accuracy than related state-of-the-art approaches.
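To make the three-term training objective named in the abstract more concrete, the following is a minimal sketch of how a source loss, a weak target loss, and a domain loss might be combined. It is not the authors' implementation. The squared-error form of the regression losses, the max pooling of frame predictions into a bag score, the binary cross-entropy domain loss, the weight `lam`, and all function names are assumptions made for illustration only.

```python
import math


def source_loss(frame_preds, frame_labels):
    """Mean squared error over fully labeled source frames (assumed form)."""
    return sum((p - y) ** 2 for p, y in zip(frame_preds, frame_labels)) / len(frame_preds)


def weak_target_loss(frame_preds, bag_label):
    """MIL-style loss: pool frame predictions into one bag score and compare
    it with the sequence-level label. Max pooling is an assumption here."""
    bag_pred = max(frame_preds)
    return (bag_pred - bag_label) ** 2


def domain_loss(domain_probs, domain_labels):
    """Binary cross-entropy of a domain discriminator (1 = source, 0 = target)."""
    eps = 1e-12
    total = 0.0
    for p, d in zip(domain_probs, domain_labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= d * math.log(p) + (1 - d) * math.log(1.0 - p)
    return total / len(domain_probs)


def wsda_objective(src_preds, src_labels, tgt_preds, tgt_bag_label,
                   dom_probs, dom_labels, lam=1.0):
    """Sum of the three terms; `lam` is a hypothetical weight on the domain term.
    In adversarial training the feature extractor would typically receive the
    reversed gradient of the domain term; that step is not modeled here."""
    return (source_loss(src_preds, src_labels)
            + weak_target_loss(tgt_preds, tgt_bag_label)
            + lam * domain_loss(dom_probs, dom_labels))


if __name__ == "__main__":
    total = wsda_objective(
        src_preds=[1.0, 2.0], src_labels=[1.0, 4.0],
        tgt_preds=[0.5, 3.0, 1.0], tgt_bag_label=2.0,
        dom_probs=[0.5, 0.5], dom_labels=[1, 0],
    )
    print(f"combined objective: {total:.4f}")
```

The sketch only evaluates the objective on plain lists of numbers. In practice the predictions would come from the I3D network and the terms would be minimized with an automatic differentiation framework.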
