
Ig3D: Integrating 3D Face Representations in Facial Expression Inference

Lu Dong, Xiao Wang, Srirangaraj Setlur, Venu Govindaraju, Ifeoma Nwogu

TL;DR

This work investigates integrating FLAME-based 3D face representations into facial expression inference (FEI) to boost discrete expression classification and valence-arousal (VA) estimation. It compares EMOCA and SMIRK 3D regressors and introduces intermediate and late fusion architectures to combine 3D parameters with 2D FEI models, evaluated on AffectNet and RAF-DB. The results show EMOCA generally outperforms SMIRK for classification, and late fusion approaches achieve state-of-the-art performance on AffectNet VA and RAF-DB classification, highlighting the value of 3D representations in affective reasoning. Overall, the study demonstrates that 3D face geometry can complement 2D FEI, yielding robust gains and a flexible framework for future emotion inference tasks.

Abstract

Reconstructing 3D faces with facial geometry from single images has enabled major advances in animation, generative models, and virtual reality. However, this ability to represent faces with their 3D features has not been as fully explored by the facial expression inference (FEI) community. This study therefore investigates the impact of integrating such 3D representations into the FEI task, specifically for facial expression classification and face-based valence-arousal (VA) estimation. To accomplish this, we first assess the performance of two 3D face representations (both based on the 3D morphable model, FLAME) for the FEI tasks. We further explore two fusion architectures, intermediate fusion and late fusion, for integrating the 3D face representations with existing 2D inference frameworks. To evaluate our proposed architecture, we extract the corresponding 3D representations and perform extensive tests on the AffectNet and RAF-DB datasets. Our experimental results demonstrate that our proposed method outperforms the state of the art on the AffectNet VA estimation and RAF-DB classification tasks. Moreover, our method can complement other existing methods to boost performance in many emotion inference tasks.
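
The difference between the two fusion strategies named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the single linear head, and the `alpha` blending weight are all hypothetical choices made for illustration, and the abstract does not specify how the branches are actually combined. The sketch only assumes that intermediate fusion merges 2D and 3D features before a shared classifier, while late fusion merges the outputs of two separately computed branches.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(features, weights, bias):
    """Dense layer: `weights` holds one row of coefficients per output class."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def intermediate_fusion(feat_2d, feat_3d, weights, bias):
    """Assumed form: concatenate 2D and 3D features, then use one shared head."""
    fused = list(feat_2d) + list(feat_3d)
    return softmax(linear(fused, weights, bias))


def late_fusion(logits_2d, logits_3d, alpha=0.5):
    """Assumed form: blend per-branch probabilities; alpha weights the 2D branch."""
    probs_2d = softmax(logits_2d)
    probs_3d = softmax(logits_3d)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(probs_2d, probs_3d)]


if __name__ == "__main__":
    # Toy two-class example with made-up numbers.
    print(late_fusion([2.0, 0.0], [0.0, 2.0], alpha=0.5))
    print(intermediate_fusion([1.0], [2.0, 3.0],
                              [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0]))
```

In this sketch, late fusion leaves the 2D model untouched and only needs its output scores, which is one plausible reading of why such a scheme could act as an add-on to existing methods; intermediate fusion instead requires training a head on the joint feature vector.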

Paper Structure (16 sections, 3 equations, 3 figures, 8 tables)

Figures (3)

  • Figure 1: A standard pipeline for 3D facial geometry reconstruction from an image. Left: The regression model extracts disentangled 3D parameter representations from the images. Right: These parameters are utilized to reconstruct the 3D facial geometry using a 3D Morphable Model.
  • Figure 2: 3D Representation Visualization
  • Figure 3: Overview of the 3D Representation Fusion Architecture