Exploring the Boundaries of Semi-Supervised Facial Expression Recognition using In-Distribution, Out-of-Distribution, and Unconstrained Data

Shuvendu Roy, Ali Etemad

TL;DR

This work presents a comprehensive study of 11 recent semi-supervised methods in the context of FER, namely Pi-model, Pseudo-label, Mean Teacher, VAT, UDA, MixMatch, ReMixMatch, FixMatch, FlexMatch, CoMatch, and CCSSL.

Abstract

Deep learning-based methods have been the key driving force behind much of the recent success of facial expression recognition (FER) systems. However, the need for large amounts of labelled data remains a challenge. Semi-supervised learning offers a way to overcome this limitation, allowing models to learn from a small amount of labelled data along with a large unlabelled dataset. While semi-supervised learning has shown promise in FER, most current methods from the general computer vision literature have not been explored in the context of FER. In this work, we present a comprehensive study of 11 of the most recent semi-supervised methods in the context of FER, namely Pi-model, Pseudo-label, Mean Teacher, VAT, UDA, MixMatch, ReMixMatch, FixMatch, FlexMatch, CoMatch, and CCSSL. Our investigation covers semi-supervised learning from in-distribution, out-of-distribution, unconstrained, and very small unlabelled data. Our evaluation includes five FER datasets plus one large face dataset for unconstrained learning. Our results demonstrate that FixMatch consistently achieves better performance on in-distribution unlabelled data, while ReMixMatch stands out among all methods for out-of-distribution, unconstrained, and scarce unlabelled data scenarios. Another significant observation is that with an equal number of labelled samples, semi-supervised learning delivers a considerable improvement over supervised learning, regardless of whether the unlabelled data is in-distribution, out-of-distribution, or unconstrained. We also conduct sensitivity analyses on critical hyper-parameters for the two best methods of each setting. To facilitate reproducibility and further development, we make our code publicly available at: github.com/ShuvenduRoy/SSL_FER_OOD.
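
To make the idea of learning from a small labelled set together with a large unlabelled set concrete, the following is a minimal sketch of a confidence-thresholded pseudo-label objective in the style of FixMatch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `threshold` and `lambda_u` values, and the use of plain Python lists of logits in place of a real network are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Cross-entropy of a predicted distribution against a hard label."""
    return -math.log(probs[target])


def semi_supervised_loss(labelled, unlabelled, threshold=0.95, lambda_u=1.0):
    """Supervised loss plus a pseudo-label loss on confident unlabelled samples.

    labelled:   list of (logits, label) pairs
    unlabelled: list of (weak_view_logits, strong_view_logits) pairs
    threshold and lambda_u are illustrative values, not taken from the paper.
    """
    supervised = sum(
        cross_entropy(softmax(logits), label) for logits, label in labelled
    ) / len(labelled)

    unsupervised_total = 0.0
    for weak_logits, strong_logits in unlabelled:
        weak_probs = softmax(weak_logits)
        confidence = max(weak_probs)
        # Only confident predictions on the weak view become pseudo-labels.
        if confidence >= threshold:
            pseudo_label = weak_probs.index(confidence)
            unsupervised_total += cross_entropy(softmax(strong_logits), pseudo_label)

    # Average over all unlabelled samples, including the masked-out ones.
    unsupervised = unsupervised_total / len(unlabelled) if unlabelled else 0.0
    return supervised + lambda_u * unsupervised


if __name__ == "__main__":
    labelled = [([2.0, 0.0, 0.0], 0)]
    unlabelled = [
        ([10.0, 0.0, 0.0], [5.0, 0.0, 0.0]),  # confident: contributes
        ([0.1, 0.0, 0.0], [0.0, 3.0, 0.0]),   # not confident: masked out
    ]
    print(semi_supervised_loss(labelled, unlabelled))
```

Only the first unlabelled pair passes the confidence threshold here, so the second pair adds nothing to the loss. The methods compared in the paper differ mainly in how this unlabelled term is built.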

Paper Structure

This paper contains 46 sections, 10 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Semi-supervised FER under ID, OOD, and unconstrained unlabelled data scenarios.
  • Figure 2: Overview of the semi-supervised learning methods explored in this study. Here, $\textit{Aug}_\textit{i}$, S. Aug, W. Aug, and MixUp refer to the $i$-th augmentation of the input $x$, a strongly augmented image, a weakly augmented image, and an image augmented with the MixUp operation, respectively. Consistency Regularization differs across methods, as defined in the paper's equations for Pi-Model, Mean Teacher, UDA, and VAT. EMA refers to the exponential moving average. Adv. Per. refers to adversarial perturbation. $H(p, q)$ is the cross-entropy loss. Dist. Align is the distribution alignment concept introduced in ReMixMatch. Curriculum pseudo-labels are generated using the adaptive threshold introduced in FlexMatch. Mem. Smooth P. labelling is the memory-smoothed pseudo-labelling introduced in CoMatch.
  • Figure 3: Sample images from three main FER datasets: FER13, RAF-DB, and AffectNet.
  • Figure 4: Examples of hard augmentations.
  • Figure 5: Sensitivity study of various parameters for two of the best semi-supervised methods on ID unlabelled data.
  • ...and 4 more figures
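
Two pieces of notation from the Figure 2 caption, EMA and $H(p, q)$, can be sketched in a few lines. This is a hedged illustration only: the function names and the `decay` value are assumptions, and flat lists of floats stand in for model weights and class distributions.

```python
import math


def ema_update(teacher, student, decay=0.999):
    """Exponential moving average: move teacher weights slightly toward the student.

    The decay value is illustrative, not taken from the paper.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i * log(q_i); terms with p_i == 0 contribute nothing."""
    return -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q) if p_i > 0.0)


if __name__ == "__main__":
    print(ema_update([1.0, 0.0], [0.0, 1.0], decay=0.9))
    print(cross_entropy([1.0, 0.0], [0.5, 0.5]))
```

With a decay close to 1 the teacher changes slowly, so it acts as a smoothed copy of the student.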