Nonlinear Dynamical Systems for Automatic Face Annotation in Head Tracking and Pose Estimation

Thoa Thieu, Roderick Melnik

TL;DR

The paper investigates the relative performance of Extended Kalman Filters (EKF) and Unscented Kalman Filters (UKF) for 3D facial landmark tracking in head pose estimation, examining both deterministic and stochastic settings. It formulates both filters for a 54-landmark facial model and evaluates their accuracy on a public dataset, highlighting that UKF better captures nonlinearities in noise-free scenarios while EKF demonstrates greater robustness under noisy, real-world conditions. Key contributions include a detailed, section-by-section description of EKF and UKF implementations, a comparative mean-squared-error analysis, and practical guidance on when to prefer each filter or employ hybrids. The findings inform the design of real-time facial tracking and pose estimation systems, suggesting adaptive strategies and pre-processing to improve robustness in challenging environments.
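The summary says both filters are formulated for a 54-landmark model with a direct measurement model, but it does not give the equations. The sketch below is a minimal illustration under several assumptions. The state is taken to be the 162 stacked x, y, z coordinates. The measurement is assumed to observe that state directly (H = I) with a hypothetical Gaussian noise covariance `R`. The process model is omitted. The helper names `pack`, `unpack`, and `direct_measurement_update` are hypothetical and are not taken from the paper.

```python
import numpy as np

N_LANDMARKS = 54             # facial landmarks per frame, as stated in the summary
STATE_DIM = 3 * N_LANDMARKS  # assumed state: x, y, z per landmark -> 162 entries

def pack(landmarks):
    """Hypothetical helper: flatten a (54, 3) landmark array into a 162-vector."""
    return np.asarray(landmarks, dtype=float).reshape(STATE_DIM)

def unpack(state):
    """Hypothetical helper: inverse of pack, 162-vector -> (54, 3) array."""
    return np.asarray(state, dtype=float).reshape(N_LANDMARKS, 3)

def direct_measurement_update(x, P, z, R):
    """Kalman update for an assumed direct measurement model z = x + v (H = I).

    x: prior state (162,), P: prior covariance, z: measurement, R: assumed
    measurement-noise covariance.
    """
    S = P + R                         # innovation covariance (H P H^T + R with H = I)
    K = np.linalg.solve(S.T, P.T).T   # Kalman gain K = P S^{-1}
    x_new = x + K @ (z - x)           # correct the state with the innovation
    P_new = (np.eye(x.size) - K) @ P  # posterior covariance
    return x_new, P_new

# Usage on synthetic values: prior at 0, measurement at 2, equal variances.
x0 = pack(np.zeros((N_LANDMARKS, 3)))
P0 = np.eye(STATE_DIM)
z = pack(np.full((N_LANDMARKS, 3), 2.0))
R = np.eye(STATE_DIM)
x1, P1 = direct_measurement_update(x0, P0, z, R)
# With equal prior and measurement variance, each coordinate moves halfway to 1.0.
```

With a linear identity measurement model the unscented transform reproduces this update exactly, so under these assumptions any difference between the EKF and UKF would come from the (unspecified) nonlinear process model.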

Abstract

Facial landmark tracking plays a vital role in applications such as facial recognition, expression analysis, and medical diagnostics. In this paper, we compare the performance of the Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF) in tracking 3D facial motion in both deterministic and stochastic settings. We first analyze a noise-free environment in which the state transition is purely deterministic, and show that the UKF outperforms the EKF, achieving a lower mean squared error (MSE) owing to its ability to capture higher-order nonlinearities. However, when stochastic noise is introduced, the EKF is more robust, maintaining a lower MSE than the UKF, which becomes more sensitive to measurement noise and occlusions. Our results indicate that the UKF is preferable for high-precision applications in controlled environments, whereas the EKF is better suited to real-world scenarios with unpredictable noise. These findings provide practical guidance for selecting a filtering technique in 3D facial tracking applications such as motion capture and facial recognition.
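The abstract attributes the UKF's advantage in the noise-free setting to its ability to capture higher-order nonlinearities. The toy sketch below is not the paper's model. It contrasts how the two filters propagate a Gaussian through a nonlinear map: the EKF linearizes about the mean, while the UKF pushes sigma points through the map and re-weights them. The scalar map y = x**2, the input distribution N(1, 0.5), and the scaled-sigma-point parameters alpha = 1, beta = 2, kappa = 0 are all illustrative assumptions.

```python
import numpy as np

def linearized_transform(f, jac, mean, cov):
    """EKF-style propagation: first-order Taylor expansion of f about the mean."""
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    J = np.atleast_2d(jac(mean))
    return np.atleast_1d(f(mean)), J @ cov @ J.T

def unscented_transform(f, mean, cov, alpha=1.0, beta=2.0, kappa=0.0):
    """UKF-style propagation: pass 2n+1 sigma points through f and re-weight."""
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    n = mean.size
    lam = alpha**2 * (n + kappa) - n
    # Columns of the lower Cholesky factor of (n + lam) * cov are the offsets.
    root = np.linalg.cholesky((n + lam) * cov)
    sigmas = np.vstack([mean, mean + root.T, mean - root.T])  # (2n+1, n)
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))            # mean weights
    wc = wm.copy()                                             # covariance weights
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1.0 - alpha**2 + beta)
    ys = np.array([np.atleast_1d(f(s)) for s in sigmas])       # (2n+1, m)
    y_mean = wm @ ys
    d = ys - y_mean
    y_cov = (wc[:, None] * d).T @ d
    return y_mean, y_cov

# Toy scalar example: y = x**2 with x ~ N(1, 0.5).
# Exact moments: E[y] = 1 + 0.5 = 1.5 and Var[y] = 4 * 1 * 0.5 + 2 * 0.5**2 = 2.5.
f = lambda x: x**2
jac = lambda x: np.diag(2.0 * x)
m, P = np.array([1.0]), np.array([[0.5]])
ekf_mean, ekf_cov = linearized_transform(f, jac, m, P)  # mean 1.0, variance 2.0
ukf_mean, ukf_cov = unscented_transform(f, m, P)        # mean 1.5, variance 2.5
```

For this quadratic map the sigma points recover the exact mean and variance, whereas the first-order expansion drops the curvature terms. This illustrates the mechanism only; it says nothing about the paper's measured results or about behavior under noise and occlusion.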

Paper Structure

This paper contains 23 sections, 26 equations, and 10 figures.

Figures (10)

  • Figure 1: [Color online] Illustration of the database for the first user. Top left panel: The man moves his face to the right. Top right panel: The man returns to the initial position, moving from the right. Middle left panel: The man moves his face to the left. Middle right panel: The man holds his face vertically and moves horizontally to the right. Bottom left panel: The man holds his face vertically and moves horizontally to the left. Bottom right panel: The man moves his face upward. The screenshots used in this illustration are provided in videos from the dataset in Ariz2016novel, licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
  • Figure 2: [Color online] Illustration of the 54 facial landmarks automatically annotated for the dataset provided in Ariz2016novel, showing both the anatomical placement (left) and numerical order (right) of the landmarks. This figure was created using Leonardo AI, Ibis Paint X, and Freepik Retouch.
  • Figure 3: [Color online] Comparison of the MSE of the EKF and UKF for deterministic 3D facial motion tracking over 12 time frames for users 1, 2, 3, and 4 (panels ordered left to right, top to bottom). The experiment tracks 54 facial points (each with X, Y, and Z coordinates) using a deterministic process model with no process noise and a direct measurement model. The UKF, which uses sigma points for nonlinear state estimation, achieves a lower MSE than the EKF, which relies on linearization. The dataset used in this simulation is provided in Ariz2016novel, licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
  • Figure 4: [Color online] Comparison of the MSE of the EKF and UKF for deterministic 3D facial motion tracking over 12 time frames for users 5, 6, 7, and 8 (panels ordered left to right, top to bottom). The experiment tracks 54 facial points (each with X, Y, and Z coordinates) using a deterministic process model with no process noise and a direct measurement model. The UKF, which uses sigma points for nonlinear state estimation, achieves a lower MSE than the EKF, which relies on linearization. The dataset used in this simulation is provided in Ariz2016novel, licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
  • Figure 5: [Color online] Comparison of real and estimated 3D facial landmark coordinates $x, y, z$ across time frames for four selected points of user 1. The real data are plotted alongside the estimates obtained with the deterministic UKF and EKF. Each subplot corresponds to one coordinate ($x$, $y$, or $z$) and shows its variation across frames. The dataset used in this simulation is provided in Ariz2016novel, licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
  • ...and 5 more figures
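Figures 3 and 4 report the MSE per time frame for 54 landmarks, each with x, y, and z coordinates. The summary does not state the exact normalization, so the sketch below assumes the squared error is averaged over all 162 coordinates (54 landmarks times 3 axes) within each frame. The function name `per_frame_mse` is hypothetical, and the arrays in the usage lines are synthetic, not data from the paper.

```python
import numpy as np

def per_frame_mse(estimates, ground_truth):
    """Hypothetical helper: MSE per frame, averaged over landmarks and x, y, z.

    Both inputs are assumed to have shape (frames, landmarks, 3).
    """
    est = np.asarray(estimates, dtype=float)
    gt = np.asarray(ground_truth, dtype=float)
    if est.shape != gt.shape or est.ndim != 3 or est.shape[-1] != 3:
        raise ValueError("expected matching arrays of shape (frames, landmarks, 3)")
    # Average the squared error over the landmark and coordinate axes only.
    return np.mean((est - gt) ** 2, axis=(1, 2))

# Synthetic check: a constant 0.1 offset on every coordinate of 12 frames
# gives an MSE of 0.01 in each frame.
gt = np.zeros((12, 54, 3))
est = gt + 0.1
mse = per_frame_mse(est, gt)
```

Averaging per frame, rather than over the whole sequence, yields one value per time step, which matches the per-frame curves the captions describe.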