Lifting Scheme-Based Implicit Disentanglement of Emotion-Related Facial Dynamics in the Wild

Xingjian Wang; Li Chai

TL;DR

This work addresses a core difficulty of in-the-wild dynamic facial expression recognition (DFER): emotion-related facial dynamics are often obscured by emotion-irrelevant content. It introduces IFDD, an implicit framework built on the wavelet lifting scheme that disentangles dynamic emotion cues from global context in two stages: the Inter-frame Static-dynamic Splitting Module (ISSM) and the Lifting-based Aggregation-Disentanglement Module (LADM). An explicit disentanglement loss combines task supervision with a global-context constraint to promote separation of dynamics from context. Across three challenging datasets, IFDD with CNN and ViT backbones achieves state-of-the-art or near-state-of-the-art performance with modest computational overhead, and it shows robustness to noisy frames and improved per-emotion discrimination. The approach offers a backbone-agnostic paradigm for dynamic facial expression analysis, with potential extensions to other video-based tasks.

Abstract

In-the-wild dynamic facial expression recognition (DFER) faces a significant challenge: emotion-related expressions are often temporally and spatially diluted by emotion-irrelevant expressions and global context. Most prior DFER methods directly use coupled spatiotemporal representations, which may incorporate weakly relevant features carrying emotion-irrelevant context bias. Several DFER methods do emphasize dynamic information, but they rely on explicit guidance that may be vulnerable to irrelevant motion. In this paper, we propose a novel Implicit Facial Dynamics Disentanglement framework (IFDD). By extending the wavelet lifting scheme to a fully learnable framework, IFDD disentangles emotion-related dynamic information from emotion-irrelevant global context in an implicit manner, i.e., without explicit operations or external guidance. The disentanglement process contains two stages. The first is the Inter-frame Static-dynamic Splitting Module (ISSM), which produces a rough disentanglement estimate by exploring inter-frame correlation to generate content-aware splitting indexes on the fly. We use these indexes to split frame features into two groups, one with greater global similarity and the other with more unique dynamic features. The second stage is the Lifting-based Aggregation-Disentanglement Module (LADM), which refines this estimate. LADM first aggregates the two groups of features from ISSM with an updater to obtain fine-grained global context features, and then disentangles emotion-related facial dynamic features from the global context with a predictor. Extensive experiments on in-the-wild datasets demonstrate that IFDD outperforms prior supervised DFER methods, with higher recognition accuracy and comparable efficiency. Code is available at https://github.com/CyberPegasus/IFDD.
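To make the two-stage idea concrete, the following is a minimal sketch of an update-first lifting step on toy frame features. It is not the authors' implementation. The function names (`split_frames`, `lift`, `unlift`), the similarity-to-mean splitting rule, the position-wise pairing of the two groups, and the fixed operators U(d) = d / 2 and P(c) = c / 1.5 are all assumptions chosen for illustration. In IFDD the splitting indexes, the updater, and the predictor are learned.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def split_frames(frames):
    """Hypothetical stand-in for ISSM: rank frames by similarity to the clip mean.

    The most similar half is treated as the 'static' group, the rest as 'dynamic'.
    """
    half = len(frames) // 2
    mean = [sum(col) / len(frames) for col in zip(*frames)]
    order = sorted(range(len(frames)),
                   key=lambda i: cosine(frames[i], mean), reverse=True)
    static_idx = sorted(order[:half])
    dynamic_idx = sorted(order[half:2 * half])
    return static_idx, dynamic_idx


def lift(static, dynamic):
    """Update-first lifting step with fixed (assumed, not learned) operators."""
    # Updater: context = static + U(dynamic), with U(d) = d / 2.
    context = [[s + 0.5 * d for s, d in zip(sv, dv)]
               for sv, dv in zip(static, dynamic)]
    # Predictor: dynamics = dynamic - P(context), with P(c) = c / 1.5.
    dynamics = [[d - c / 1.5 for d, c in zip(dv, cv)]
                for dv, cv in zip(dynamic, context)]
    return context, dynamics


def unlift(context, dynamics):
    """Invert lift(): each lifting step can be undone in reverse order."""
    dynamic = [[r + c / 1.5 for r, c in zip(rv, cv)]
               for rv, cv in zip(dynamics, context)]
    static = [[c - 0.5 * d for c, d in zip(cv, dv)]
              for cv, dv in zip(context, dynamic)]
    return static, dynamic


# Toy clip: 4 frames with 2-D features; frame 2 carries the changing feature.
frames = [[1.0, 1.0], [1.0, 1.0], [1.0, 3.0], [1.0, 1.0]]
static_idx, dynamic_idx = split_frames(frames)
static = [frames[i] for i in static_idx]
dynamic = [frames[i] for i in dynamic_idx]
context, dynamics = lift(static, dynamic)
```

In this toy clip, the pair of identical frames yields zero dynamics, while the pair containing frame 2 yields a non-zero residual in the feature that changed. Because each lifting step is reversible, `unlift(context, dynamics)` recovers the two input groups, which is one reason the lifting scheme is a convenient structure for separating signals without discarding information.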
