A Perception CNN for Facial Expression Recognition

Chunwei Tian, Jingyuan Xie, Lingjun Li, Wangmeng Zuo, Yanning Zhang, David Zhang

TL;DR

This work introduces a Perception CNN (PCNN) for facial expression recognition that jointly learns global facial structure and five local sense-organ regions. Through the Facial Segmentation Information Extraction Block (FSIEB) and the Multi-domain Interaction Mechanism (MDIM), PCNN registers and fuses local and global features, guided by a two-phase facial semantic loss to stabilize learning. Experiments on CK+, JAFFE, FER2013, FERPlus, RAF-DB, and occlusion/pose datasets show performance competitive with the state of the art and robustness to challenging real-world conditions, with efficient inference suitable for deployment. The approach advances FER by integrating semantic guidance from facial regions with global cues, improving fine-grained expression recognition in complex scenes.

Abstract

Convolutional neural networks (CNNs) can automatically learn data patterns to represent face images for facial expression recognition (FER). However, they may ignore the effect of facial segmentation on FER. In this paper, we propose a perception CNN for FER, termed PCNN. Firstly, PCNN uses five parallel networks to simultaneously learn local facial features from the eyes, cheeks and mouth, capturing subtle changes in facial expression. Secondly, we use a multi-domain interaction mechanism to register and fuse local sense-organ features with global facial structural features to better represent face images for FER. Finally, we design a two-phase loss function that constrains the accuracy of the obtained sense information and the reconstructed face images to guarantee the performance of PCNN in FER. Experimental results show that PCNN achieves superior results on several lab and real-world FER benchmarks: CK+, JAFFE, FER2013, FERPlus, RAF-DB, and the Occlusion and Pose Variant Dataset. The code is available at https://github.com/hellloxiaotian/PCNN.
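
To make the local-plus-global idea concrete, the following is a minimal sketch, not the authors' implementation. It splits an aligned grayscale face into five local regions (two eyes, two cheeks, mouth) and concatenates one descriptor per region with a global descriptor. The box coordinates in `REGION_BOXES`, the helper names, and the use of mean intensity in place of a learned branch are all hypothetical; plain concatenation merely stands in for the registration and fusion that the multi-domain interaction mechanism performs, whose details are not given here.

```python
# Hypothetical fractional boxes (top, left, bottom, right) on an aligned face.
# These coordinates are illustrative assumptions, not values from the paper.
REGION_BOXES = {
    "left_eye":    (0.20, 0.10, 0.45, 0.50),
    "right_eye":   (0.20, 0.50, 0.45, 0.90),
    "left_cheek":  (0.45, 0.05, 0.75, 0.40),
    "right_cheek": (0.45, 0.60, 0.75, 0.95),
    "mouth":       (0.65, 0.25, 0.95, 0.75),
}


def crop_regions(image, boxes=REGION_BOXES):
    """Split a 2-D grayscale image (a list of rows) into named local patches."""
    height, width = len(image), len(image[0])
    regions = {}
    for name, (top, left, bottom, right) in boxes.items():
        r0, r1 = int(top * height), int(bottom * height)
        c0, c1 = int(left * width), int(right * width)
        regions[name] = [row[c0:c1] for row in image[r0:r1]]
    return regions


def mean_feature(patch):
    """Stand-in for a learned branch: the mean intensity of a patch."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


def fuse(image):
    """Concatenate one global descriptor with the five local descriptors."""
    local = [mean_feature(patch) for patch in crop_regions(image).values()]
    return [mean_feature(image)] + local
```

Calling `fuse` on a 48x48 image (the FER2013 resolution) returns a six-element list: one global value followed by five local values. In a real model each `mean_feature` call would be replaced by a convolutional branch.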

Paper Structure

This paper contains 15 sections, 4 equations, 12 figures, 10 tables.

Figures (12)

  • Figure 1: Facial expression images collected in the RAF-DB dataset: (a) human face with occlusion; (b) human face with pose changes.
  • Figure 2: Network architecture of PCNN.
  • Figure 3: Sketch figures of a facial region segmentation.
  • Figure 4: Seven expression facial images from the RAF-DB dataset.
  • Figure 5: Seven expression facial images from the CK+ dataset.
  • ...and 7 more figures