Synthetic Thermal and RGB Videos for Automatic Pain Assessment utilizing a Vision-MLP Architecture
Stefanos Gkikas, Manolis Tsiknakis
TL;DR
The paper tackles automatic pain assessment from facial videos by introducing GAN-generated synthetic thermal videos as an additional modality. It proposes a Vision-MLP architecture combined with a Transformer to fuse RGB and synthetic thermal features for temporal pain inference, supplemented by augmentation, pre-processing, and targeted pre-training. Experiments on BioVid show that synthetic thermal data can match or exceed RGB performance in the binary no pain (NP) vs. highest pain intensity (P4) task, and that multimodal fusion with learned weights yields robust gains when training is extended. This work demonstrates the viability of synthetic thermal modalities for continuous pain monitoring, offering a path to richer multimodal facial analysis when real thermal data are scarce.
Abstract
Pain assessment is essential for developing optimal pain management protocols that alleviate suffering and prevent functional decline in patients. Consequently, reliable and accurate automatic pain assessment systems are needed for continuous and effective patient monitoring. This study introduces synthetic thermal videos, generated by Generative Adversarial Networks (GANs), into the pain recognition pipeline and evaluates their efficacy. A framework consisting of a Vision-MLP and a Transformer-based module is employed, using RGB and synthetic thermal videos in both unimodal and multimodal settings. Experiments on facial videos from the BioVid database demonstrate the effectiveness of synthetic thermal videos and underline their potential advantages.
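The TL;DR mentions multimodal fusion with learned weights. This section does not specify the exact fusion operator, so the code below is only a minimal sketch built on assumptions. Per-frame RGB and synthetic-thermal embeddings of equal dimension are combined by a weighted sum. The two weights come from softmax-normalized learnable logits. The names `softmax` and `fuse_modalities`, the frame-by-feature shapes, and the logit values are all hypothetical. In a trained model, the logits would presumably be optimized jointly with the Vision-MLP and Transformer, and the fused sequence would then pass to the Transformer for temporal pain inference.

```python
import math
from typing import List

Vector = List[float]


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(rgb: List[Vector], thermal: List[Vector],
                    logits: List[float]) -> List[Vector]:
    """Weighted sum of per-frame RGB and synthetic-thermal embeddings.

    Hypothetical sketch: rgb and thermal are T frames x D features each,
    and logits holds two learnable scalars, one per modality.
    """
    if len(rgb) != len(thermal):
        raise ValueError("modalities must have the same number of frames")
    # Softmax keeps both weights positive and summing to one.
    w_rgb, w_thermal = softmax(logits)
    fused = []
    for f_rgb, f_th in zip(rgb, thermal):
        if len(f_rgb) != len(f_th):
            raise ValueError("embeddings must have the same dimension")
        # Element-wise convex combination of the two modality embeddings.
        fused.append([w_rgb * a + w_thermal * b for a, b in zip(f_rgb, f_th)])
    return fused
```

With equal logits `[0.0, 0.0]`, each modality receives a weight of 0.5. Fusing `[[1.0, 2.0]]` with `[[3.0, 4.0]]` therefore returns `[[2.0, 3.0]]`. Because the weights form a convex combination, the fused embedding stays on the same scale as its inputs. Training can still shift emphasis toward whichever modality proves more informative.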
