Table of Contents
Fetching ...

One-Frame Calibration with Siamese Network in Facial Action Unit Recognition

Shuangquan Feng, Virginia R. de Sa

TL;DR

This work proposes to perform one-frame calibration (OFC) in AU recognition: for each face, a single image of its neutral expression is used as the reference image for calibration, and substantially outperforms the naive OFC method of baseline subtraction as well as a fine-tuned version of this naive OFC method.

Abstract

Automatic facial action unit (AU) recognition is used widely in facial expression analysis. Most existing AU recognition systems aim for cross-participant non-calibrated generalization (NCG) to unseen faces without further calibration. However, due to the diversity of facial attributes across different identities, accurately inferring AU activation from single images of an unseen face is sometimes infeasible, even for human experts -- it is crucial to first understand how the face appears in its neutral expression, or significant bias may be incurred. Therefore, we propose to perform one-frame calibration (OFC) in AU recognition: for each face, a single image of its neutral expression is used as the reference image for calibration. With this strategy, we develop a Calibrating Siamese Network (CSN) for AU recognition and demonstrate its remarkable effectiveness with a simple iResNet-50 (IR50) backbone. On the DISFA, DISFA+, and UNBC-McMaster datasets, we show that our OFC CSN-IR50 model (a) substantially improves the performance of IR50 by mitigating facial attribute biases (including biases due to wrinkles, eyebrow positions, facial hair, etc.), (b) substantially outperforms the naive OFC method of baseline subtraction as well as (c) a fine-tuned version of this naive OFC method, and (d) also outperforms state-of-the-art NCG models for both AU intensity estimation and AU detection.

One-Frame Calibration with Siamese Network in Facial Action Unit Recognition

TL;DR

This work proposes to perform one-frame calibration (OFC) in AU recognition: for each face, a single image of its neutral expression is used as the reference image for calibration, and substantially outperforms the naive OFC method of baseline subtraction as well as a fine-tuned version of this naive OFC method.

Abstract

Automatic facial action unit (AU) recognition is used widely in facial expression analysis. Most existing AU recognition systems aim for cross-participant non-calibrated generalization (NCG) to unseen faces without further calibration. However, due to the diversity of facial attributes across different identities, accurately inferring AU activation from single images of an unseen face is sometimes infeasible, even for human experts -- it is crucial to first understand how the face appears in its neutral expression, or significant bias may be incurred. Therefore, we propose to perform one-frame calibration (OFC) in AU recognition: for each face, a single image of its neutral expression is used as the reference image for calibration. With this strategy, we develop a Calibrating Siamese Network (CSN) for AU recognition and demonstrate its remarkable effectiveness with a simple iResNet-50 (IR50) backbone. On the DISFA, DISFA+, and UNBC-McMaster datasets, we show that our OFC CSN-IR50 model (a) substantially improves the performance of IR50 by mitigating facial attribute biases (including biases due to wrinkles, eyebrow positions, facial hair, etc.), (b) substantially outperforms the naive OFC method of baseline subtraction as well as (c) a fine-tuned version of this naive OFC method, and (d) also outperforms state-of-the-art NCG models for both AU intensity estimation and AU detection.
Paper Structure (24 sections, 9 equations, 3 figures, 15 tables)

This paper contains 24 sections, 9 equations, 3 figures, 15 tables.

Figures (3)

  • Figure 1: The CSN-IR50 network architecture. The example reference image and target image are from the DISFA dataset.
  • Figure 2: Examples of CSN-IR50 mitigating facial attribute biases in AU intensity estimation. In each subfigure, the left panel is the (preprocessed) reference image of the specific participant, and the right panel is the comparison between AU intensities estimated by IR50 (NCG) and CSN-IR50 (OFC) and the human expert labeled intensities in the dataset. All examples are from the left-view videos of the DISFA dataset.
  • Figure S1: A visual reference guide for the primary AUs extracted from FACSlite.