Robust Light-Weight Facial Affective Behavior Recognition with CLIP

Li Lin, Sarah Papabathini, Xin Wang, Shu Hu

TL;DR

This work introduces the first lightweight framework that efficiently tackles both expression classification and AU detection. It employs a frozen CLIP image encoder alongside a trainable multilayer perceptron (MLP), enhanced with Conditional Value at Risk (CVaR) for robustness and a loss landscape flattening strategy for improved generalization.

Abstract

Human affective behavior analysis studies human expressions and behaviors to deepen our understanding of human emotions. Basic expression categories (EXPR) and Action Units (AUs) are two essential components in this analysis, which categorize emotions and break down facial movements into elemental units, respectively. Despite advancements, existing approaches to expression classification and AU detection often require complex models and substantial computational resources, limiting their applicability in everyday settings. In this work, we introduce the first lightweight framework that efficiently tackles both expression classification and AU detection. This framework employs a frozen CLIP image encoder alongside a trainable multilayer perceptron (MLP), enhanced with Conditional Value at Risk (CVaR) for robustness and a loss landscape flattening strategy for improved generalization. Experimental results on the Aff-Wild2 dataset show that the framework outperforms the baseline while maintaining minimal computational demands, offering a practical solution for affective behavior analysis. The code is available at https://github.com/Purdue-M2/Affective_Behavior_Analysis_M2_PURDUE
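
As an illustration of the CVaR idea mentioned in the abstract, the following is a minimal sketch that assumes the standard empirical formulation $\mathrm{CVaR}_\alpha = \min_\lambda \{\lambda + \frac{1}{\alpha n}\sum_i \max(\ell_i - \lambda, 0)\}$ over per-sample losses $\ell_i$. The abstract does not state the exact objective used in the paper, so the function name `cvar_loss`, the formulation, and the example losses are assumptions for illustration only, not the authors' implementation.

```python
def cvar_loss(losses, alpha):
    """Empirical CVaR of per-sample losses at level alpha (assumed formulation).

    Computes min over lam of: lam + sum(max(l - lam, 0)) / (alpha * n).
    With a small alpha, the result is dominated by the hardest samples.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    n = len(losses)
    if n == 0:
        raise ValueError("losses must be non-empty")

    def objective(lam):
        return lam + sum(max(l - lam, 0.0) for l in losses) / (alpha * n)

    # The objective is convex and piecewise linear in lam, with breakpoints
    # at the observed losses, so a minimum is attained at one of them.
    return min(objective(lam) for lam in losses)


# Hypothetical per-sample losses from a mini-batch.
batch_losses = [1.0, 2.0, 3.0, 4.0]
print(cvar_loss(batch_losses, 1.0))  # 2.5, the plain average
print(cvar_loss(batch_losses, 0.5))  # 3.5, the average of the worst half
```

Under this formulation, $\alpha = 1$ recovers the ordinary mean loss, while smaller $\alpha$ concentrates the objective on the highest-loss samples, which is the sense in which CVaR is usually described as a robustness mechanism.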

Paper Structure (6 sections, 4 equations, 2 figures, 1 table)

This paper contains 6 sections, 4 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Sample images from the Aff-Wild2 dataset. (a) Expression classification is a multi-class task with 8 categories in total. (b) Action unit detection is a multi-label task; each image is annotated for 12 AUs.
  • Figure 2: Macro F1 score for different $\alpha$ values in expression classification.
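
For readers unfamiliar with the metric in Figure 2, here is a minimal sketch of a macro-averaged F1 score for a multi-class task: the per-class F1 scores are computed independently and then averaged with equal weight. The function name `macro_f1`, the choice to score a class with no true or predicted samples as 0, and the toy labels are assumptions for illustration; the paper's exact evaluation code is not shown here.

```python
def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores for integer class labels."""
    scores = []
    for c in range(num_classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        denom = 2 * tp + fp + fn
        # Assumed convention: a class with no support and no predictions gets 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


# Toy example with 3 classes (the expression task in Figure 1 has 8).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(macro_f1(y_true, y_pred, num_classes=3))
```

Because every class contributes equally to the average, macro F1 penalizes poor performance on rare classes more than accuracy does.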