Exploring the Boundaries of Semi-Supervised Facial Expression Recognition using In-Distribution, Out-of-Distribution, and Unconstrained Data

Shuvendu Roy; Ali Etemad

TL;DR

This work presents a comprehensive study of 11 recent semi-supervised methods in the context of facial expression recognition (FER): Pi-model, Pseudo-label, Mean Teacher, VAT, UDA, MixMatch, ReMixMatch, FixMatch, FlexMatch, CoMatch, and CCSSL.

Abstract

Deep learning-based methods have been the key driving force behind much of the recent success of facial expression recognition (FER) systems. However, the need for large amounts of labelled data remains a challenge. Semi-supervised learning offers a way to overcome this limitation, allowing models to learn from a small amount of labelled data along with a large unlabelled dataset. While semi-supervised learning has shown promise in FER, most current methods from the general computer vision literature have not been explored in the context of FER. In this work, we present a comprehensive study of 11 of the most recent semi-supervised methods in the context of FER, namely Pi-model, Pseudo-label, Mean Teacher, VAT, UDA, MixMatch, ReMixMatch, FixMatch, FlexMatch, CoMatch, and CCSSL. Our investigation covers semi-supervised learning from in-distribution, out-of-distribution, unconstrained, and very small unlabelled data. Our evaluation includes five FER datasets plus one large face dataset for unconstrained learning. Our results demonstrate that FixMatch consistently achieves better performance on in-distribution unlabelled data, while ReMixMatch stands out among all methods for out-of-distribution, unconstrained, and scarce unlabelled data scenarios. Another significant observation is that, with an equal number of labelled samples, semi-supervised learning delivers a considerable improvement over supervised learning, regardless of whether the unlabelled data is in-distribution, out-of-distribution, or unconstrained. We also conduct sensitivity analyses on critical hyper-parameters for the two best methods of each setting. To facilitate reproducibility and further development, we make our code publicly available at: github.com/ShuvenduRoy/SSL_FER_OOD.

Paper Structure (46 sections, 10 equations, 9 figures, 8 tables)

Introduction
Related Work
ID SSL
OOD and Unconstrained SSL
Semi-supervised FER
Method
Problem Setup for Semi-supervised Learning
Semi-Supervised Methods
Pi-Model
Mean Teacher
UDA
VAT
Pseudo-label
MixMatch
ReMixMatch
...and 31 more sections

Figures (9)

Figure 1: Semi-supervised FER under ID, OOD, and unconstrained unlabelled data scenarios.
Figure 2: Overview of the semi-supervised learning methods explored in this study. Here, $\textit{Aug}_i$, S. Aug, W. Aug, and MixUp refer to the $i$-th augmentation of the input $x$, a strongly augmented image, a weakly augmented image, and an image augmented with the MixUp operation, respectively. Consistency Regularization differs across methods, as defined in Eqs. \ref{eq_pimodel}, \ref{eq_meanteacher}, \ref{eq_uda}, and \ref{eq_vat} for Pi-Model, Mean Teacher, UDA, and VAT, respectively. EMA refers to the exponential moving average. Adv. Per. refers to adversarial perturbation. $H(p, q)$ is the cross-entropy loss. Dist. Align is the distribution alignment concept introduced in ReMixMatch. Curriculum pseudo-labels are generated using the adaptive threshold concept from FlexMatch. Mem. Smooth P. labelling is the memory-smoothed pseudo-labelling concept introduced in CoMatch.
Figure 3: Sample images from three main FER datasets: FER13, RAF-DB, and AffectNet.
Figure 4: Examples of hard augmentations.
Figure 5: Sensitivity study of various parameters for two of the best semi-supervised methods on ID unlabelled data.
...and 4 more figures
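
Three of the building blocks named in the Figure 2 caption (EMA, the cross-entropy $H(p, q)$, and pseudo-labelling from a weakly augmented view applied to a strongly augmented view) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the decay of 0.999, and the FixMatch-style confidence threshold of 0.95 are assumptions chosen for illustration, and class probabilities are passed in as plain lists rather than produced by a model.

```python
import math


def ema_update(teacher, student, decay=0.999):
    """Move teacher weights toward student weights by an exponential moving average.

    `decay` is an illustrative value, not one taken from the paper.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_k p_k * log(q_k) for two discrete distributions."""
    return -sum(pk * math.log(qk + eps) for pk, qk in zip(p, q))


def pseudo_label_loss(weak_probs, strong_probs, threshold=0.95):
    """Unlabelled loss for one sample, assuming FixMatch-style thresholding.

    The most confident class on the weakly augmented view becomes a hard
    pseudo-label; the loss is H(pseudo-label, strong-view prediction).
    Samples whose confidence is below `threshold` contribute zero loss.
    """
    confidence = max(weak_probs)
    if confidence < threshold:
        return 0.0
    label = weak_probs.index(confidence)
    one_hot = [1.0 if k == label else 0.0 for k in range(len(weak_probs))]
    return cross_entropy(one_hot, strong_probs)
```

For example, `pseudo_label_loss([0.5, 0.5], [0.5, 0.5])` is masked out because neither class reaches the threshold, whereas a confident weak-view prediction such as `[0.97, 0.03]` produces a non-zero loss against the strong-view prediction.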
