Open-Set Facial Expression Recognition

Yuhang Zhang; Yue Yao; Xuannan Liu; Lixiong Qin; Wenjing Wang; Weihong Deng

TL;DR

This work defines open-set facial expression recognition (FER): detecting unseen expressions while preserving closed-set FER accuracy. It shows that a closed-set FER model assigns open-set samples pseudo labels that spread across all known classes, much like symmetric noisy labels, and it reframes open-set detection as noisy-label identification, aided by attention-map consistency and cycle training. The resulting pipeline consistently outperforms state-of-the-art open-set methods on RAF-DB, FERPlus, and AffectNet and supports online deployment on a single sample. Analyses of loss distributions, pseudo-label spread, and feature separation explain why the method works and show that it is robust to hyperparameters. The paper thereby links open-set recognition to noisy-label learning in FER, with practical implications for real-world deployment.

Abstract

Facial expression recognition (FER) models are typically trained on datasets with a fixed set of seven basic classes. However, recent studies point out that there are far more expressions than the basic ones. Thus, when these models are deployed in the real world, they may encounter unknown classes, such as compound expressions that cannot be classified into the existing basic classes. To address this issue, we propose the open-set FER task for the first time. Though there are many existing open-set recognition methods, we argue that they do not work well for open-set FER because FER data are all human faces with very small inter-class distances, which makes open-set samples very similar to close-set samples. In this paper, we are the first to turn the disadvantage of small inter-class distance into an advantage by proposing a new approach to open-set FER. Specifically, we find that small inter-class distance allows for sparsely distributed pseudo labels of open-set samples, which can be viewed as symmetric noisy labels. Based on this novel observation, we convert open-set FER into a noisy label detection problem. We further propose a novel method that incorporates attention map consistency and cycle training to detect the open-set samples. Extensive experiments on various FER datasets demonstrate that our method outperforms state-of-the-art open-set recognition methods by large margins. Code is available at https://github.com/zyh-uaiaaaa.
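
The observation that the pseudo labels of open-set samples spread over the known classes can be made concrete with a small measurement. The sketch below is an illustration, not the authors' code: the function name `pseudo_label_spread`, the choice of normalized entropy as the spread measure, the six known classes, and the toy label lists are all assumptions made for this example.

```python
import math
from collections import Counter


def pseudo_label_spread(pseudo_labels, num_known_classes):
    """Normalized entropy (0 to 1) of the pseudo-label histogram.

    Values near 0 mean the samples collapse into one known class;
    values near 1 mean they are spread evenly, as with symmetric label noise.
    """
    counts = Counter(pseudo_labels)
    total = len(pseudo_labels)
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log(p)
    return entropy / math.log(num_known_classes)


# Hypothetical toy predictions for samples of ONE unknown class (not paper data).
# Case A: almost all unknown samples are assigned to a single known class.
concentrated = [3] * 95 + [0] * 5
# Case B: unknown samples are assigned evenly to all six known classes.
spread_out = [k for k in range(6) for _ in range(20)]

spread_a = pseudo_label_spread(concentrated, 6)
spread_b = pseudo_label_spread(spread_out, 6)
print(f"concentrated: {spread_a:.3f}, spread out: {spread_b:.3f}")
```

Under these assumptions, a low value corresponds to the CIFAR-10-style behavior described in the paper, and a value close to 1 corresponds to the symmetric-noise-like behavior reported for FER data.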

Paper Structure (21 sections, 3 equations, 7 figures, 5 tables)

Introduction
Related Work
Facial Expression Recognition
Open-Set Recognition
Problem Definition
Method
Pipeline
Novelty and Contribution
Experiments
Datasets
Implementation Details
Open-Set FER With One or Several Basic Classes
Compound Classes and Different Classes
Online Application for One Given Sample
Further Analyses
...and 6 more sections

Figures (7)

Figure 1: Features extracted with CLIP on CIFAR-10 and RAF-DB. CIFAR-10 has a large inter-class distance, whereas RAF-DB has a small one. The small inter-class distance of FER data makes open-set samples similar to close-set samples and degrades the performance of the SOTA open-set recognition method DIAS from 0.850 to 0.714. Our method outperforms DIAS by large margins (over $20\%$ improvement relative to the original AUROC) on the open-set FER task across three FER datasets.
Figure 2: An illustration of our motivation, showing the pseudo labels predicted by the close-set model on CIFAR-10 and FER datasets. CIFAR-10 has relatively large inter-class distances, so the close-set trained model assigns unknown samples to the most similar known class. For example, if the unknown class is 'cat', the trained model assigns almost all cat samples to the known class 'dog'. FER data, however, are all human faces. The close-set trained FER model assigns samples of one unknown class to all known classes, which resembles symmetric label noise, an easy type of label noise commonly encountered in the noisy label field.
Figure 3: The pipeline of our method. Given both close-set and open-set samples as input, we use the trained close-set model to generate pseudo labels for them. Open-set samples receive noisy close-set labels. We then cyclically train two FER models from scratch with the pseudo labels and use an attention map consistency loss (Cons.) to prevent the models from memorizing the noisy close-set labels. Each model selects clean samples for the other, so the two models teach each other cyclically. We also use a cyclical learning rate (lr) to create an ensemble of models for better separation of close-set and open-set samples. After training, open-set samples have a large classification (Cls.) loss while close-set samples have a small Cls. loss.
Figure 4: Confidence scores of different methods. The AUROC of each method is marked below. The baseline method fails because FER data have small inter-class distances, so open-set data fall in the same range of confidence scores as close-set data. DIAS and PROSER separate close-set and open-set data to some extent, but the two still overlap substantially. Our method transforms open-set FER into noisy label detection and effectively separates close-set and open-set samples.
Figure 5: Hyperparameter analysis of our method on ResNet-18 and ResNet-50. ResNet-50 generally performs better than ResNet-18. Our method is not sensitive to hyperparameters: AUROC varies only slightly, between $0.87$ and $0.93$. The best consistency weight is $5$ and the best number of training epochs is $40$.
...and 2 more figures
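
The pipeline in Figure 3 ends with a per-sample classification loss that is large for open-set samples and small for close-set ones, and the figures report AUROC as the detection metric. A minimal sketch of that final scoring step follows, assuming softmax outputs from a retrained model are already available. The helper names, the toy probabilities, and the use of cross-entropy against the pseudo label as the open-set score are illustrative assumptions, not the authors' implementation.

```python
import math


def cross_entropy(probs, label):
    """Classification loss of one sample against its pseudo label."""
    return -math.log(max(probs[label], 1e-12))


def auroc(scores, is_open):
    """Probability that a random open-set sample gets a higher score than a
    random close-set sample; ties count as one half."""
    open_scores = [s for s, flag in zip(scores, is_open) if flag]
    close_scores = [s for s, flag in zip(scores, is_open) if not flag]
    wins = 0.0
    for o in open_scores:
        for c in close_scores:
            if o > c:
                wins += 1.0
            elif o == c:
                wins += 0.5
    return wins / (len(open_scores) * len(close_scores))


# Hypothetical toy data: (softmax output, pseudo label, is the sample open-set?)
samples = [
    ([0.90, 0.05, 0.05], 0, False),
    ([0.10, 0.80, 0.10], 1, False),
    ([0.20, 0.20, 0.60], 2, False),
    ([0.50, 0.30, 0.20], 1, True),   # model disagrees with the noisy pseudo label
    ([0.40, 0.35, 0.25], 2, True),
    ([0.15, 0.15, 0.70], 2, True),   # an open-set sample the model has memorized
]

losses = [cross_entropy(probs, label) for probs, label, _ in samples]
flags = [flag for _, _, flag in samples]
detection_auroc = auroc(losses, flags)
print(f"AUROC of loss-based open-set score: {detection_auroc:.3f}")
```

The pairwise AUROC above is quadratic in the number of samples, which is fine for a toy check; a library routine would be the usual choice on real data.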
