Trend-Aware Supervision: On Learning Invariance for Semi-Supervised Facial Action Unit Intensity Estimation

Yingjie Chen, Jiarui Zhang, Tao Wang, Yun Liang

TL;DR

This work tackles spurious correlations in semi-supervised facial AU intensity estimation from keyframes by introducing Trend-Aware Supervision (TAS), which leverages trend information to learn invariant AU-specific features. TAS comprises intra-trend ranking, intra-trend speed, and inter-trend subject awareness, applied as four losses alongside the standard regression objective. Across BP4D and DISFA, TAS achieves state-of-the-art ICC and MAE among semi-supervised methods and remains competitive with fully supervised approaches, without increasing inference cost. The approach provides a principled way to disentangle AU-specific appearance changes from co-occurrence and subject biases, with practical value for robust facial behavior analysis under limited annotations.

Abstract

With the increasing need for facial behavior analysis, semi-supervised AU intensity estimation using only keyframe annotations has emerged as a practical and effective way to relieve the annotation burden. However, the scarcity of annotations makes the spurious correlation problem caused by AU co-occurrences and subject variation much more prominent, leading to non-robust intensity estimation that is entangled among AUs and biased among subjects. We observe that the trend information inherent in keyframe annotations can act as extra supervision, and that raising awareness of AU-specific facial appearance changing trends during training is the key to learning invariant AU-specific features. To this end, we propose Trend-Aware Supervision (TAS), which pursues three kinds of trend awareness: intra-trend ranking awareness, intra-trend speed awareness, and inter-trend subject awareness. By raising trend awareness during training, TAS alleviates the spurious correlation problem and learns AU-specific features that represent the corresponding facial appearance changes, thereby achieving invariant intensity estimation. Experiments on two commonly used AU benchmark datasets, BP4D and DISFA, show the effectiveness of each kind of awareness. Moreover, trend-aware supervision improves performance without extra computational or storage costs during inference.

Paper Structure

This paper contains 19 sections, 5 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Motivation. (a) Illustration of keyframes and local trends. (b) Spurious correlation caused by AU co-occurrences. The limited annotations may lead the model to learn only AU features of one of the two AUs instead of both, resulting in non-robust intensity estimation entangled between them. (c) Spurious correlation caused by subject variation. Subject variation magnified by the lack of annotations may lead the model to learn non-causal gender features instead of AU features for estimating the intensity of AU12 (orange denotes the dominant features for estimation).
  • Figure 2: Overview. Our method takes a batch of segments as input, and each image (taking the one with the red box as an example) is first fed into a backbone network for global feature extraction. Then, $C$ separate spatial attention layers are applied to $F_{\rm{global}}$ for AU feature extraction. After that, each AU feature $f_c$ is fed into an MLP to estimate the intensity value $\tilde{v}_c$ for the $c^{\rm th}$ AU class. During training, trend-aware supervision is applied to the AU features $\{f_c^t\}^{T}_{t=1}$ in each segment, while the regression loss is applied only to the estimated intensities of the two annotated keyframes, i.e., the first and the last frames of each segment.
  • Figure 3: Case study. In each tuple, from left to right, the four line charts show the intensity values estimated by models $A_1$, $A_2$, $A_5$, and $A_6$, respectively (red line), for the sequence shown on top, together with the FACS-quantified intensity labels (green line).
  • Figure 4: t-SNE visualization for AU features of AU12 on BP4D. From left to right, every three t-SNE results are colored according to FACS-quantified intensity labels (light blue to dark blue), subject identities (bright colors), and highlighted subject identities (bright blue and green). The first three are for Model $A_1$, and the last three are for Model $A_4$.
  • Figure 5: Empirical study for $\lambda_{\rm rank}$, $\lambda_{\rm spd}$, and $\lambda_{\rm sub}$ on BP4D. Red and blue stars show ICC and MAE of $A_1$.
  • ...and 1 more figures
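
The Figure 2 caption states that the regression loss is applied only to the two annotated keyframes of each segment, i.e., the first and the last frames. The following is a minimal sketch of that idea, not the authors' implementation: the function name `keyframe_regression_loss`, the list-of-lists data layout, and the use of mean squared error are all assumptions made for illustration, since the summary does not specify the form of the regression loss.

```python
from typing import Sequence


def keyframe_regression_loss(
    segment_preds: Sequence[Sequence[float]],
    first_label: Sequence[float],
    last_label: Sequence[float],
) -> float:
    """Mean squared error over the two annotated keyframes of one segment.

    segment_preds[t][c] is the estimated intensity of AU class c at frame t.
    Only frame 0 and frame T-1 carry labels; intermediate frames are
    unlabeled and do not contribute to this loss.
    (MSE is an assumed choice; the source only says "regression loss".)
    """
    if len(segment_preds) < 2:
        raise ValueError("a segment needs at least two frames")

    # Pair each annotated keyframe's predictions with its labels.
    pairs = [(segment_preds[0], first_label), (segment_preds[-1], last_label)]

    total = 0.0
    count = 0
    for preds, labels in pairs:
        if len(preds) != len(labels):
            raise ValueError("predictions and labels must cover the same AUs")
        for p, y in zip(preds, labels):
            total += (p - y) ** 2
            count += 1
    return total / count


# Hypothetical segment: T = 3 frames, C = 2 AU classes.
segment = [[1.0, 0.0], [2.5, 0.5], [3.0, 1.0]]
loss = keyframe_regression_loss(segment, [1.0, 0.0], [4.0, 1.0])
print(loss)
```

In this sketch the middle frame never enters the loss, so any signal for unlabeled frames would have to come from other terms, which is the role the caption assigns to trend-aware supervision on the AU features.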