Pain in 3D: Generating Controllable Synthetic Faces for Automated Pain Assessment
Xin Lei Lin, Soroush Mehraban, Abhishek Moturu, Babak Taati
TL;DR
The paper addresses the lack of diverse, high-quality pain-expression data by introducing 3DPain, a diffusion-augmented synthetic pipeline that delivers 82,500 pain-annotated frames across 2,500 identities with precise action unit (AU) and Prkachin and Solomon Pain Intensity (PSPI) annotations. It couples this with ViTPain, a cross-modal Vision Transformer that uses neutral-reference cross-attention to separate dynamic pain signals from identity, improving PSPI regression and AU estimation. On UNBC-McMaster benchmarks, synthetic pre-training with 3DPain yields significant generalization gains, underscoring the value of large-scale, clinically grounded synthetic data for pain assessment. The work offers a scalable, interpretable framework for automated pain analysis, while noting a domain shift in skin textures and leaving temporal modeling to future work.
Abstract
Automated pain assessment from facial expressions is crucial for non-communicative patients, such as those with dementia. Progress has been limited by two challenges: (i) existing datasets exhibit severe demographic and label imbalance due to ethical constraints, and (ii) current generative models cannot precisely control facial action units (AUs), facial structure, or clinically validated pain levels. We present 3DPain, a large-scale synthetic dataset specifically designed for automated pain assessment, featuring unprecedented annotation richness and demographic diversity. Our three-stage framework generates diverse 3D meshes, textures them with diffusion models, and applies AU-driven face rigging to synthesize multi-view faces with paired neutral and pain images, AU configurations, PSPI scores, and the first dataset-level annotations of pain-region heatmaps. The dataset comprises 82,500 samples across 25,000 pain-expression heatmaps and 2,500 synthetic identities balanced by age, gender, and ethnicity. We further introduce ViTPain, a Vision Transformer-based cross-modal distillation framework in which a heatmap-trained teacher guides a student trained on RGB images, enhancing accuracy, interpretability, and clinical reliability. Together, 3DPain and ViTPain establish a controllable, diverse, and clinically grounded foundation for generalizable automated pain assessment.
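For readers unfamiliar with the PSPI label mentioned above, the score is conventionally defined by Prkachin and Solomon as AU4 + max(AU6, AU7) + max(AU9, AU10) + AU43, where AU43 (eye closure) is binary and the other AUs are rated on a 0 to 5 intensity scale, giving a range of 0 to 16. The following is a minimal sketch of that computation, not code from the 3DPain release: the function name `pspi_score` and the dictionary keys are hypothetical, and how 3DPain itself stores AU configurations is not specified in this abstract.

```python
from typing import Mapping

# Action units that enter the PSPI score (standard Prkachin and Solomon definition).
# The key names below are an illustrative assumption, not the dataset's schema.
PSPI_AUS = ("AU4", "AU6", "AU7", "AU9", "AU10", "AU43")


def pspi_score(aus: Mapping[str, float]) -> float:
    """Return PSPI = AU4 + max(AU6, AU7) + max(AU9, AU10) + AU43."""
    missing = [name for name in PSPI_AUS if name not in aus]
    if missing:
        raise KeyError(f"missing action units: {missing}")

    # AU43 (eye closure) is binary; the other AUs use a 0-5 intensity scale.
    for name in PSPI_AUS:
        upper = 1 if name == "AU43" else 5
        if not 0 <= aus[name] <= upper:
            raise ValueError(f"{name}={aus[name]} is outside [0, {upper}]")

    return (
        aus["AU4"]                       # brow lowering
        + max(aus["AU6"], aus["AU7"])    # orbital tightening
        + max(aus["AU9"], aus["AU10"])   # levator contraction
        + aus["AU43"]                    # eye closure
    )


# Example AU configurations (values invented for illustration only).
neutral = {name: 0 for name in PSPI_AUS}
moderate = {"AU4": 3, "AU6": 2, "AU7": 4, "AU9": 1, "AU10": 0, "AU43": 1}
maximal = {"AU4": 5, "AU6": 5, "AU7": 5, "AU9": 5, "AU10": 5, "AU43": 1}

print(pspi_score(neutral), pspi_score(moderate), pspi_score(maximal))
```

Because the score depends only on the AU configuration, a pipeline that controls AUs directly, as the AU-driven rigging stage described above does, can in principle derive PSPI labels without manual coding.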
