LEAF: Unveiling Two Sides of the Same Coin in Semi-supervised Facial Expression Recognition
Fan Zhang, Zhi-Qi Cheng, Jian Zhao, Xiaojiang Peng, Xuelong Li
TL;DR
Facial expression recognition (FER) suffers from label scarcity and subtle inter-class differences, and existing semi-supervised learning (SSL) methods mostly improve pseudo-labels while neglecting representation quality. LEAF addresses both sides hierarchically. At the semantic and instance levels, it decouples representations into expression-agnostic and expression-relevant components and fuses them with MoE-inspired experts and top-k gating. At the category level, it decouples predictions into positive and negative parts to form ambiguous pseudo-labels, trained with a distribution-level consistency loss. The approach achieves state-of-the-art results on RAF-DB, FERPlus, and AffectNet (7- and 8-class settings). Extensive ablations confirm the contribution of each decoupling level and show that LEAF can be plugged into other SSL frameworks. Beyond advancing practical SSL-FER under label scarcity, the work suggests a generalizable recipe for jointly optimizing representations and pseudo-labels when annotations are limited.
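The semantic- and instance-level fusion in the TL;DR routes each feature through several experts and keeps only the highest-scoring ones. A minimal, hypothetical sketch of generic MoE-style top-k gating follows in plain Python. The function names (`top_k_gate`, `fuse`), the softmax renormalization over the selected experts, and the toy expert outputs are assumptions for illustration, not LEAF's released code. In LEAF's framing some experts would carry expression-agnostic and others expression-relevant components; here they are just fixed vectors.

```python
import math
from typing import List

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(gate_logits: Vector, k: int) -> Vector:
    """Keep the k largest gating logits, renormalize them with softmax, zero the rest."""
    if not 1 <= k <= len(gate_logits):
        raise ValueError("k must be between 1 and the number of experts")
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    top_probs = softmax([gate_logits[i] for i in top])
    weights = [0.0] * len(gate_logits)
    for i, p in zip(top, top_probs):
        weights[i] = p
    return weights


def fuse(expert_outputs: List[Vector], weights: Vector) -> Vector:
    """Gate-weighted sum of expert output vectors (all of the same dimension)."""
    dim = len(expert_outputs[0])
    return [sum(w * out[d] for w, out in zip(weights, expert_outputs)) for d in range(dim)]


if __name__ == "__main__":
    # Hypothetical outputs of four experts for one feature vector.
    experts = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [5.0, 5.0]]
    weights = top_k_gate([2.0, 0.5, 1.0, -1.0], k=2)
    print(weights)  # only experts 0 and 2 receive non-zero weight
    print(fuse(experts, weights))
```

With gating logits `[2.0, 0.5, 1.0, -1.0]` and `k=2`, only experts 0 and 2 are kept, with weights of about 0.731 and 0.269. In a trained model the gating logits would come from a learnable layer and the experts from learned sub-networks; keeping only k experts lets each feature draw on a small, input-dependent subset.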
Abstract
Semi-supervised learning has emerged as a promising approach to tackle the challenge of label scarcity in the facial expression recognition (FER) task. However, current state-of-the-art methods primarily focus on one side of the coin, i.e., generating high-quality pseudo-labels, while overlooking the other side: enhancing expression-relevant representations. In this paper, we unveil both sides of the coin by proposing a unified framework termed hierarchicaL dEcoupling And Fusing (LEAF) to coordinate expression-relevant representations and pseudo-labels for semi-supervised FER. LEAF introduces a hierarchical expression-aware aggregation strategy that operates at three levels: semantic, instance, and category. (1) At the semantic and instance levels, LEAF decouples representations into expression-agnostic and expression-relevant components, and adaptively fuses them using learnable gating weights. (2) At the category level, LEAF assigns ambiguous pseudo-labels by decoupling predictions into positive and negative parts, and employs a consistency loss to ensure agreement between two augmented views of the same image. Extensive experiments on benchmark datasets demonstrate that by unveiling and harmonizing both sides of the coin, LEAF outperforms state-of-the-art semi-supervised FER methods, effectively leveraging both labeled and unlabeled data. Moreover, the proposed expression-aware aggregation strategy can be seamlessly integrated into existing semi-supervised frameworks, leading to significant performance gains. Our code is available at https://github.com/zfkarl/LEAF.
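The category-level step can be illustrated with an equally hedged sketch. The abstract does not specify how predictions are split into positive and negative parts or how the consistency loss is computed, so the rule below is an assumption borrowed from partial-label and negative (complementary) learning. Classes whose probability under the first augmented view reaches a hypothetical `threshold` form the positive set, and the remaining classes form the negative set. The second view is then penalized for putting little total mass on the positive set or much mass on any negative class. All names and the default threshold are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def split_positive_negative(probs: Vector, threshold: float) -> Tuple[List[int], List[int]]:
    """Hypothetical decoupling of one prediction into positive and negative class sets.

    Classes with probability >= threshold are positive candidates; the arg-max class
    is always kept positive so the candidate set is never empty.
    """
    top = max(range(len(probs)), key=lambda c: probs[c])
    positive = [c for c, p in enumerate(probs) if p >= threshold or c == top]
    negative = [c for c in range(len(probs)) if c not in positive]
    return positive, negative


def ambiguous_consistency_loss(view_a_logits: Vector, view_b_logits: Vector,
                               threshold: float = 0.2, eps: float = 1e-12) -> float:
    """Consistency between two augmented views using a set-valued (ambiguous) pseudo-label.

    View A acts as the target; in a real training loop its probabilities
    would be detached from the gradient.
    """
    target = softmax(view_a_logits)
    pred = softmax(view_b_logits)
    positive, negative = split_positive_negative(target, threshold)
    # Positive part: view B should place its total mass somewhere in the candidate set.
    pos_loss = -math.log(sum(pred[c] for c in positive) + eps)
    # Negative part: each excluded class should receive low probability under view B.
    neg_loss = -sum(math.log(1.0 - pred[c] + eps) for c in negative)
    return pos_loss + neg_loss


if __name__ == "__main__":
    view_a = [2.0, 1.8, -1.0, -2.0]
    print(split_positive_negative(softmax(view_a), 0.2))
    print(ambiguous_consistency_loss(view_a, view_a))                  # views agree: small loss
    print(ambiguous_consistency_loss(view_a, [-2.0, -2.0, 3.0, 0.0]))  # views disagree: large loss
```

Because the pseudo-label is a set of plausible classes rather than a single hard class, an uncertain image such as one split between two similar expressions still contributes a training signal without forcing a possibly wrong one-hot target.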
