LEAF: Unveiling Two Sides of the Same Coin in Semi-supervised Facial Expression Recognition

Fan Zhang; Zhi-Qi Cheng; Jian Zhao; Xiaojiang Peng; Xuelong Li

TL;DR

FER suffers from label scarcity and subtle inter-class differences, and existing semi-supervised learning (SSL) methods largely focus on improving pseudo-labels while neglecting representation quality. LEAF addresses this by hierarchically decoupling representations and pseudo-labels into expression-agnostic and expression-relevant parts at the semantic, instance, and category levels, using experts inspired by mixture-of-experts (MoE) models with top-k gating, plus a distribution-level consistency loss on ambiguous pseudo-labels. The approach achieves state-of-the-art results on RAF-DB, FERPlus, and AffectNet (7- and 8-class settings), and extensive ablations demonstrate the contribution of each decoupling level and the flexibility to plug LEAF into other SSL frameworks. This work advances practical SSL-based FER under label scarcity and suggests a generalizable framework for jointly optimizing representations and pseudo-labels when annotations are limited.
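The TL;DR mentions MoE-inspired experts combined with top-k gating. As a rough illustration of that general technique, and not the authors' implementation, the sketch below shows standard sparse top-k gating: only the k highest-scoring experts receive nonzero weight, and their outputs are combined as a weighted sum. The function names (`top_k_gate`, `fuse`) are hypothetical, and the gate logits are passed in directly rather than produced by a learnable layer, an assumption made to keep the example self-contained.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(gate_logits, k):
    """Hypothetical sparse gate: softmax over the k largest logits, zero elsewhere."""
    chosen = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    probs = softmax([gate_logits[i] for i in chosen])
    weights = [0.0] * len(gate_logits)
    for i, p in zip(chosen, probs):
        weights[i] = p
    return weights


def fuse(expert_outputs, gate_logits, k):
    """Weighted sum of expert output vectors using top-k gate weights."""
    weights = top_k_gate(gate_logits, k)
    dim = len(expert_outputs[0])
    fused = [0.0] * dim
    for w, out in zip(weights, expert_outputs):
        if w == 0.0:
            continue  # unselected experts contribute nothing
        for j in range(dim):
            fused[j] += w * out[j]
    return fused, weights


# Three experts, keep the top 2: experts 0 and 2 tie and share the weight.
fused, weights = fuse([[1.0, 0.0], [5.0, 5.0], [0.0, 1.0]], [2.0, 0.0, 2.0], k=2)
print(weights)  # [0.5, 0.0, 0.5]
print(fused)    # [0.5, 0.5]
```

In a trained model the gate logits would typically come from a small learnable projection of the input feature, so the selected experts can vary per sample; how LEAF maps experts to expression-relevant and expression-agnostic components is not shown here.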

Abstract

Semi-supervised learning has emerged as a promising approach to tackle the challenge of label scarcity in the facial expression recognition (FER) task. However, current state-of-the-art methods primarily focus on one side of the coin, i.e., generating high-quality pseudo-labels, while overlooking the other side: enhancing expression-relevant representations. In this paper, we unveil both sides of the coin by proposing a unified framework termed hierarchicaL dEcoupling And Fusing (LEAF) to coordinate expression-relevant representations and pseudo-labels for semi-supervised FER. LEAF introduces a hierarchical expression-aware aggregation strategy that operates at three levels: semantic, instance, and category. (1) At the semantic and instance levels, LEAF decouples representations into expression-agnostic and expression-relevant components, and adaptively fuses them using learnable gating weights. (2) At the category level, LEAF assigns ambiguous pseudo-labels by decoupling predictions into positive and negative parts, and employs a consistency loss to ensure agreement between two augmented views of the same image. Extensive experiments on benchmark datasets demonstrate that by unveiling and harmonizing both sides of the coin, LEAF outperforms state-of-the-art semi-supervised FER methods, effectively leveraging both labeled and unlabeled data. Moreover, the proposed expression-aware aggregation strategy can be seamlessly integrated into existing semi-supervised frameworks, leading to significant performance gains. Our code is available at https://github.com/zfkarl/LEAF.
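The abstract describes category-level decoupling of predictions into positive and negative parts, with a consistency loss between two augmented views. The sketch below is one plausible reading under stated assumptions, not the paper's formulation: it assumes the positive part is the top-k classes of the weak-view prediction (renormalized into a soft, ambiguous target), the negative part is the remaining classes, and the loss combines a cross-entropy toward that target with a term penalizing strong-view probability on negative classes. `decouple`, `consistency_loss`, and `k` are hypothetical names.

```python
import math


def decouple(p_weak, k):
    """Assumed split: top-k classes of the weak-view prediction are 'positive'
    (kept as a renormalized soft target), the rest are 'negative'."""
    order = sorted(range(len(p_weak)), key=lambda c: p_weak[c], reverse=True)
    positive, negative = order[:k], order[k:]
    mass = sum(p_weak[c] for c in positive)
    target = [0.0] * len(p_weak)
    for c in positive:
        target[c] = p_weak[c] / mass
    return target, negative


def consistency_loss(target, negative, p_strong, eps=1e-12):
    """Cross-entropy toward the ambiguous target on positive classes, plus a
    penalty pushing strong-view probability of negative classes toward zero."""
    pos = -sum(t * math.log(p_strong[c] + eps) for c, t in enumerate(target) if t > 0.0)
    neg = -sum(math.log(1.0 - p_strong[c] + eps) for c in negative)
    return pos + neg


p_weak = [0.5, 0.3, 0.15, 0.05]  # weak-view softmax over 4 toy classes
target, negative = decouple(p_weak, k=2)
print([round(t, 3) for t in target], negative)  # [0.625, 0.375, 0.0, 0.0] [2, 3]
print(consistency_loss(target, negative, [0.6, 0.35, 0.03, 0.02]))
```

Keeping several candidate classes instead of a single hard label is one way to express ambiguity between similar expressions; the exact split rule and loss weighting used by LEAF may differ.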

Paper Structure (21 sections, 11 equations, 7 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Facial Expression Recognition
Semi-Supervised Learning
Problem Definition
LEAF Framework
Semantic-level Decoupling and Fusing
Instance-level Decoupling and Fusing
Category-level Decoupling and Fusing
Experiments
Datasets and Evaluation Metrics
Implementation Details
Quantitative Comparison
Qualitative Analysis
Ablation Studies and Analysis
...and 6 more sections

Figures (7)

Figure 1: Our LEAF consistently outperforms state-of-the-art semi-supervised FER approaches across different settings.
Figure 2: An overview of LEAF. The weakly and strongly augmented views of a facial expression image are first mapped into the embedding space by a shared encoder. The semantic-level EAF and the instance-level EAF are then applied before and after the classifier, respectively, to reweight the expression-relevant and expression-agnostic representations. Given the predictions, the category-level EAF generates ambiguous pseudo-labels for consistency regularization.
Figure 3: The detailed structures of the linear, bottleneck, and residual experts.
Figure 4: Performance comparison in terms of overall accuracy and balanced accuracy with different numbers of labeled samples.
Figure 5: The t-SNE visualization with 1600 labeled samples on RAFDB and FERPlus.
...and 2 more figures
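Figure 3 names three expert types. Their exact layer definitions are not given in this summary, so the pure-Python sketch below assumes common textbook forms: a linear expert (W x + b), a bottleneck expert that projects down to a smaller hidden size and back up with a ReLU in between, and a residual expert that adds the bottleneck output back to its input. Class names, the hidden size, and the initialization scale are all hypothetical.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def init(rows, cols, rng, scale=0.1):
    """Small uniform random weights (hypothetical initialization)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LinearExpert:
    """Assumed form: y = W x + b."""

    def __init__(self, dim, rng):
        self.W = init(dim, dim, rng)
        self.b = [0.0] * dim

    def __call__(self, x):
        return [a + b for a, b in zip(matvec(self.W, x), self.b)]


class BottleneckExpert:
    """Assumed form: y = W2 relu(W1 x), with hidden < dim."""

    def __init__(self, dim, hidden, rng):
        self.W1 = init(hidden, dim, rng)  # down-projection
        self.W2 = init(dim, hidden, rng)  # up-projection

    def __call__(self, x):
        return matvec(self.W2, relu(matvec(self.W1, x)))


class ResidualExpert(BottleneckExpert):
    """Assumed form: y = x + W2 relu(W1 x)."""

    def __call__(self, x):
        return [a + b for a, b in zip(x, super().__call__(x))]


rng = random.Random(0)
x = [1.0, -2.0, 0.5, 3.0]  # toy 4-d feature
experts = [LinearExpert(4, rng), BottleneckExpert(4, 2, rng), ResidualExpert(4, 2, rng)]
for expert in experts:
    print(type(expert).__name__, [round(v, 3) for v in expert(x)])
```

With small initial weights the residual expert stays close to the identity and mostly preserves the input feature, whereas the linear and bottleneck experts replace it; a gate can then choose among these behaviors per sample.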
