Nonlinear Dynamical Systems for Automatic Face Annotation in Head Tracking and Pose Estimation
Thoa Thieu, Roderick Melnik
TL;DR
The paper compares Extended Kalman Filters (EKF) and Unscented Kalman Filters (UKF) for 3D facial landmark tracking in head pose estimation, in both deterministic and stochastic settings. It formulates both filters for a 54-landmark facial model and evaluates their accuracy on a public dataset, finding that UKF better captures nonlinearities in noise-free scenarios while EKF is more robust under noisy, real-world conditions. Key contributions include a detailed description of the EKF and UKF implementations, a comparative mean-squared-error analysis, and practical guidance on when to prefer each filter or to employ hybrids. The findings inform the design of real-time facial tracking and pose estimation systems, suggesting adaptive strategies and pre-processing to improve robustness in challenging environments.
Abstract
Facial landmark tracking plays a vital role in applications such as facial recognition, expression analysis, and medical diagnostics. In this paper, we compare the performance of the Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF) in tracking 3D facial motion in both deterministic and stochastic settings. We first analyze a noise-free environment where the state transition is purely deterministic, demonstrating that UKF outperforms EKF by achieving a lower mean squared error (MSE) due to its ability to capture higher-order nonlinearities. However, when stochastic noise is introduced, EKF exhibits superior robustness, maintaining a lower MSE than UKF, which becomes more sensitive to measurement noise and occlusions. Our results highlight that UKF is preferable for high-precision applications in controlled environments, whereas EKF is better suited for real-world scenarios with unpredictable noise. These findings provide practical insights for selecting the appropriate filtering technique in 3D facial tracking applications, such as motion capture and facial recognition.
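
As a rough illustration of why an unscented filter can capture nonlinear effects that a first-order linearization misses, the following minimal sketch propagates the mean of a scalar Gaussian through y = x^2. It is a toy example and not the paper's 54-landmark model: the function names (`linearized_mean`, `unscented_mean`, `square`), the choice kappa = 2, and the values of `m` and `var` are all assumptions made for illustration. The first-order (EKF-style) estimate evaluates the function only at the prior mean, while the unscented transform averages the function over weighted sigma points.

```python
import math


def linearized_mean(f, m):
    """First-order (EKF-style) mean propagation: evaluate f at the prior mean only."""
    return f(m)


def unscented_mean(f, m, var, kappa=2.0):
    """Unscented-transform mean for a scalar state (n = 1), classic kappa weighting."""
    n = 1
    spread = math.sqrt((n + kappa) * var)
    # Three sigma points: the mean and one point on each side of it.
    points = [m, m + spread, m - spread]
    w_center = kappa / (n + kappa)
    w_side = 1.0 / (2.0 * (n + kappa))
    weights = [w_center, w_side, w_side]
    return sum(w * f(p) for w, p in zip(weights, points))


def square(x):
    """Toy nonlinearity (an assumption, not the paper's motion or measurement model)."""
    return x * x


m, var = 1.0, 0.25            # hypothetical prior mean and variance
exact = m * m + var           # E[x^2] = m^2 + var for x ~ N(m, var)
ekf_est = linearized_mean(square, m)
ukf_est = unscented_mean(square, m, var)

print(f"exact mean       : {exact:.4f}")
print(f"linearized (EKF) : {ekf_est:.4f}  error = {abs(ekf_est - exact):.4f}")
print(f"unscented (UKF)  : {ukf_est:.4f}  error = {abs(ukf_est - exact):.4f}")
```

For this quadratic the unscented estimate recovers the variance term that the linearization drops, which matches the noise-free behaviour described above. The sketch does not model measurement noise or occlusions, so it says nothing about the stochastic case in which the abstract reports EKF to be more robust.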
