Unsupervised learning of Data-driven Facial Expression Coding System (DFECS) using keypoint tracking

Shivansh Chandra Tripathi, Rahul Garg

TL;DR

This work tackles the labor-intensive process of manually defining facial Action Units (AUs) by proposing an unsupervised, data-driven facial coding system (DFECS) learned from facial keypoint motion. It introduces a Full Face Model (FFM) that combines Dictionary Learning (DL) and Non-negative Matrix Factorization (NMF) in a two-level decomposition (PFM and HFM) to yield sparse, positive, interpretable AUs, with final representations $U' = UA$ and encodings $V$. The DFECS AUs are learned on DISFA and evaluated on CK+ and BP4D-Spontaneous, where they reach an average variance explained (VE) of up to 91.29%, comparable to PCA AUs, while being markedly more interpretable (87.5% of the 16 AUs, i.e., 14 of 16). The approach enables scalable, automated facial expression analysis, and the authors provide public code to facilitate further research.

Abstract

The development of existing facial coding systems, such as the Facial Action Coding System (FACS), relied on manual examination of facial expression videos for defining Action Units (AUs). To overcome the labor-intensive nature of this process, we propose the unsupervised learning of an automated facial coding system by leveraging computer-vision-based facial keypoint tracking. In this novel facial coding system called the Data-driven Facial Expression Coding System (DFECS), the AUs are estimated by applying dimensionality reduction to facial keypoint movements from a neutral frame through a proposed Full Face Model (FFM). FFM employs a two-level decomposition using advanced dimensionality reduction techniques such as dictionary learning (DL) and non-negative matrix factorization (NMF). These techniques enhance the interpretability of AUs by introducing constraints such as sparsity and positivity to the encoding matrix. Results show that DFECS AUs estimated from the DISFA dataset can account for an average variance of up to 91.29 percent in test datasets (CK+ and BP4D-Spontaneous) and also surpass the variance explained by keypoint-based equivalents of FACS AUs in these datasets. Additionally, 87.5 percent of DFECS AUs are interpretable, i.e., align with the direction of facial muscle movements. In summary, advancements in automated facial coding systems can accelerate facial expression analysis across diverse fields such as security, healthcare, and entertainment. These advancements offer numerous benefits, including enhanced detection of abnormal behavior, improved pain analysis in healthcare settings, and enriched emotion-driven interactions. To facilitate further research, the code repository of DFECS has been made publicly accessible.
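To make the quantities in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes keypoint movements are stored as a matrix of displacements from the neutral frame (68 keypoints, so 136 rows), that test frames are encoded against a fixed AU matrix `U` under a positivity constraint only (the sparsity constraint is omitted), and that variance explained is measured as $1 - \lVert X - UV \rVert_F^2 / \lVert X \rVert_F^2$. The function names, the projected-gradient solver, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def keypoint_movements(frames, neutral):
    """Stack displacements from the neutral frame into a (2K, n_frames) matrix.

    frames: (n_frames, K, 2) keypoint coordinates; neutral: (K, 2).
    """
    disp = frames - neutral[None, :, :]
    return disp.reshape(len(frames), -1).T


def nonnegative_encode(X, U, n_iter=500):
    """Find V >= 0 that approximately minimizes ||X - U V||_F^2 for a fixed U.

    Uses projected gradient descent (an assumed solver, chosen for brevity).
    """
    # Step size 1/L, where L = ||U||_2^2 bounds the gradient's Lipschitz constant.
    step = 1.0 / (np.linalg.norm(U, 2) ** 2 + 1e-12)
    V = np.zeros((U.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = U.T @ (U @ V - X)
        V = np.maximum(V - step * grad, 0.0)  # project onto V >= 0
    return V


def variance_explained(X, U, V):
    """Fraction of the energy of X captured by the reconstruction U V (assumed definition)."""
    return 1.0 - np.linalg.norm(X - U @ V) ** 2 / np.linalg.norm(X) ** 2


# Synthetic stand-in data: 3 made-up AUs, 5 frames, 68 keypoints.
rng = np.random.default_rng(0)
neutral = rng.normal(size=(68, 2))
U = rng.normal(size=(136, 3))
V_true = rng.uniform(0.0, 1.0, size=(3, 5))
frames = neutral + (U @ V_true).T.reshape(5, 68, 2)

X = keypoint_movements(frames, neutral)
V = nonnegative_encode(X, U)
ve = variance_explained(X, U, V)
print(f"variance explained: {ve:.4f}")
```

In this toy setup the frames are generated exactly from `U` with non-negative weights, so the encoding should recover nearly all of the variance; on real test data the value would depend on how well the learned AUs generalize.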

Paper Structure

This paper contains 26 sections, 4 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Pictorial representation from preprocessing to estimating the final DFECS AUs.
  • Figure 1: Generation of a projected AU image on a face. Red keypoints indicate the neutral image's facial keypoints, green ones represent the expression frame keypoints, and arrows illustrate how the keypoints move from the neutral position to the corresponding expression image. (Face image source: BP4D-Spontaneous [zhang2013high, zhang2014bp4d])
  • Figure 2: The location of 68 facial keypoints on a subject. Yellow keypoints are employed for registration using affine transformation.
  • Figure 2: Performance of PCA AUs, DFECS AUs, pure AUs, and comb AUs on the test datasets (CK+ and BP4D-Spontaneous) with increasing L1-norm of the encoding matrix (x-axis on a log scale).
  • Figure 3: Generation and visualization of a KPM on a face image. Red keypoints indicate the neutral image's facial keypoints, green ones represent the expression frame keypoints, and arrows illustrate how the keypoints move from the neutral position to the corresponding expression image represented by the KPM. (Face image source: BP4D-Spontaneous [zhang2013high, zhang2014bp4d])
  • ...and 5 more figures