
Synthetic Thermal and RGB Videos for Automatic Pain Assessment utilizing a Vision-MLP Architecture

Stefanos Gkikas, Manolis Tsiknakis

TL;DR

The paper tackles automatic pain assessment from facial videos by introducing GAN-generated synthetic thermal videos as an additional modality. It proposes a Vision-MLP architecture combined with a Transformer to fuse RGB and synthetic thermal features for temporal pain inference, supplemented by augmentation, pre-processing, and targeted pre-training. Experiments on BioVid show that synthetic thermal data can match or exceed RGB performance in the binary NP (no pain) vs. P4 (very severe pain) task, and that multimodal fusion with learned weights provides robust gains with extended training. This work demonstrates the viability of synthetic thermal modalities for continuous pain monitoring, offering a path to richer multimodal facial analysis when real thermal data are scarce.
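The TL;DR reports that fusing RGB and synthetic thermal features with learned weights gives robust gains. A minimal NumPy sketch of one such scheme, assuming per-frame RGB and synthetic thermal embeddings of equal size are combined as a convex combination with one weight per modality. The names `fuse_embeddings` and `modality_logits` are hypothetical, and in a trained model the logits would be learnable parameters updated by backpropagation rather than fixed values; this is not the paper's exact fusion.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

def fuse_embeddings(rgb: np.ndarray, thermal: np.ndarray,
                    modality_logits: np.ndarray) -> np.ndarray:
    """Hypothetical weighted fusion of per-frame embeddings.

    rgb, thermal:    arrays of shape (frames, dim) from the frame encoder.
    modality_logits: shape (2,); assumed to be learnable in a real model.
    Softmax keeps the two weights positive and summing to 1.
    """
    if rgb.shape != thermal.shape:
        raise ValueError("RGB and thermal embeddings must have the same shape")
    w_rgb, w_thermal = softmax(modality_logits)
    return w_rgb * rgb + w_thermal * thermal

# Toy example: 4 frames, 8-dimensional embedding per frame.
rng = np.random.default_rng(0)
rgb = rng.normal(size=(4, 8))
thermal = rng.normal(size=(4, 8))
fused = fuse_embeddings(rgb, thermal, np.array([0.0, 0.0]))  # equal weights
print(fused.shape)  # (4, 8)
```

The fused (frames × dim) sequence would then be the input to the Transformer for temporal pain inference. Normalizing the weights with a softmax prevents either modality from being scaled arbitrarily during training.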

Abstract

Pain assessment is essential in developing optimal pain management protocols to alleviate suffering and prevent functional decline in patients. Consequently, reliable and accurate automatic pain assessment systems are needed for continuous and effective patient monitoring. This study integrates synthetic thermal videos, generated by Generative Adversarial Networks (GANs), into the pain recognition pipeline and evaluates their efficacy. A framework consisting of a Vision-MLP and a Transformer-based module is utilized, employing RGB and synthetic thermal videos in unimodal and multimodal settings. Experiments conducted on facial videos from the BioVid database demonstrate the effectiveness of synthetic thermal videos and underline their potential advantages.
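Figure 2(b) describes the Vision-MLP's Token-Mixer as creating a "wave" representation of the tokens. A minimal NumPy sketch, assuming this follows the phase-aware token mixing of Wave-MLP: each token $z_k$ is treated as an amplitude with an estimated phase $\theta_k$, i.e. $z_k e^{i\theta_k}$, and tokens are aggregated as $o_j = \sum_k \left(W^t_{jk} z_k \odot \cos\theta_k + W^i_{jk} z_k \odot \sin\theta_k\right)$. The linear phase estimator, the dense (rather than windowed) mixing over all $N$ tokens, and the function name `phase_aware_token_mix` are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def phase_aware_token_mix(z: np.ndarray, w_theta: np.ndarray,
                          w_t: np.ndarray, w_i: np.ndarray) -> np.ndarray:
    """Wave-MLP-style phase-aware token mixing (illustrative assumption).

    z:        (N, C) token features, used directly as amplitudes.
    w_theta:  (C, C) linear projection estimating a phase per token and channel.
    w_t, w_i: (N, N) token-mixing weights for the cosine and sine parts.
    """
    theta = z @ w_theta              # (N, C) estimated phases
    cos_part = z * np.cos(theta)     # real part of z * exp(i * theta)
    sin_part = z * np.sin(theta)     # imaginary part of z * exp(i * theta)
    # Each output token aggregates all input tokens; the phase decides how
    # much each token contributes through the cosine vs. sine weights.
    return w_t @ cos_part + w_i @ sin_part

# Toy example: 16 patch tokens (e.g. a 4x4 grid) with 8 channels each.
rng = np.random.default_rng(1)
N, C = 16, 8
z = rng.normal(size=(N, C))
out = phase_aware_token_mix(
    z,
    0.1 * rng.normal(size=(C, C)),
    rng.normal(size=(N, N)) / N,
    rng.normal(size=(N, N)) / N,
)
print(out.shape)  # (16, 8)
```

With all phases at zero the sine term vanishes and the layer reduces to an ordinary token-mixing MLP, $o = W^t z$.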

Paper Structure

This paper contains 18 sections, 13 equations, 4 figures, 11 tables.

Figures (4)

  • Figure 1: Overview of the pipeline for generating synthetic thermal images, including the Generator $G$ (Encoder, intermediate ResNet, Decoder) and the Discriminator $D$.
  • Figure 2: Schematic overview of the proposed framework for automatic pain assessment, detailing its modules and their primary components: (a) The Vision-MLP module, responsible for extracting embeddings from individual frames. (b) The Token-Mixer, a major sub-module of the Vision-MLP, which creates the wave representation of the tokens. (c) The Channel-Mixer, another key sub-module of the Vision-MLP. (d) The MLP, an integral part of the Channel-Mixer. (e) The fusion process combining RGB and synthetic thermal embeddings, followed by the Transformer module, which performs the final pain assessment.
  • Figure 3: Progressive blurring of RGB and synthetic thermal facial imagery: Gaussian blur applied with kernel sizes increasing from $k = 0$ (clear) to $k = 191$ (heavily blurred).
  • Figure 4: 3D embedding space distributions of NP (no pain) and P4 (very severe pain) classes in RGB and synthetic thermal videos, for $k = 0$ (clear) and $k = 191$ (heavily blurred).
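The progressive blurring in Figure 3 can be approximated with a separable Gaussian filter. A minimal NumPy sketch, assuming grayscale frames, edge-replicated borders, and OpenCV's default rule for deriving $\sigma$ from the kernel size, $\sigma = 0.3\left(\frac{k-1}{2} - 1\right) + 0.8$; the paper's exact $\sigma$ and border handling are not given here, $k = 0$ is treated as "no blur", and the helper names `gaussian_kernel_1d` and `gaussian_blur` are hypothetical.

```python
import numpy as np

def gaussian_kernel_1d(k: int) -> np.ndarray:
    """Normalized 1-D Gaussian kernel of odd size k.

    sigma follows OpenCV's default rule for sigma <= 0 (an assumption).
    """
    if k < 1 or k % 2 == 0:
        raise ValueError("kernel size must be a positive odd integer")
    sigma = 0.3 * ((k - 1) * 0.5 - 1) + 0.8
    x = np.arange(k) - (k - 1) / 2
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_blur(image: np.ndarray, k: int) -> np.ndarray:
    """Separable Gaussian blur of a 2-D image; k = 0 returns an unblurred copy."""
    if k == 0:
        return image.astype(float).copy()
    kernel = gaussian_kernel_1d(k)
    r = k // 2
    # Edge-replicate padding keeps the output size even when k exceeds the image size.
    padded = np.pad(image.astype(float), r, mode="edge")
    # Blur along rows, then along columns; "valid" trims the padding back off.
    rows = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)

# Toy frame with a sharp vertical edge, blurred at increasing kernel sizes.
img = np.zeros((64, 64))
img[:, 32:] = 255.0
for k in (0, 31, 191):
    print(k, gaussian_blur(img, k).shape)
```

With OpenCV available, `cv2.GaussianBlur(img, (k, k), 0)` applies the same $\sigma$ rule, although its default reflected border differs slightly from the edge replication used here.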