Fiducial Focus Augmentation for Facial Landmark Detection
Purbayan Kar, Vishal Chudasama, Naoyuki Onoe, Pankaj Wasnik, Vineeth Balasubramanian
TL;DR
This work tackles facial landmark detection (FLD) under challenging conditions by introducing Fiducial Focus Augmentation (FiFA), a patch-based augmentation that places black squares around landmark fiducials to embed facial structure as an inductive bias. It couples FiFA with a Siamese training scheme that uses a Deep Canonical Correlation Analysis (DCCA) loss to enforce cross-view consistency between two augmented views, and it employs a Transformer + CNN backbone (ViT-B/16 with anti-aliased hourglass modules and an FF-Parser) for robust heatmap-based landmark regression. The method achieves state-of-the-art performance on COFW, 300W, and AFLW, and extensive ablations isolate the contributions of FiFA, DCCA, and the architectural components. Overall, FiFA makes FLD more robust to pose, illumination, and occlusion, and it may be applicable to other face-related tasks.
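The patch-placement idea can be illustrated with a minimal sketch. It assumes landmarks are given as (x, y) pixel coordinates and that each patch is a fixed-size square centred on its landmark; the function name `fiducial_focus_augment` and the `patch_size` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np

def fiducial_focus_augment(image, landmarks, patch_size=8):
    """Return a copy of `image` with a black square centred on each landmark.

    image:      H x W or H x W x C array.
    landmarks:  iterable of (x, y) pixel coordinates (assumed convention).
    patch_size: side length of each square in pixels (hypothetical parameter).
    """
    out = image.copy()
    h, w = out.shape[:2]
    half = patch_size // 2
    for x, y in landmarks:
        cx, cy = int(round(x)), int(round(y))
        # Clip the square to the image bounds.
        x0, x1 = max(cx - half, 0), min(cx + half, w)
        y0, y1 = max(cy - half, 0), min(cy + half, h)
        if x0 < x1 and y0 < y1:
            out[y0:y1, x0:x1] = 0  # black patch
    return out

# Usage: a white 64 x 64 RGB image with two illustrative landmarks.
image = np.full((64, 64, 3), 255, dtype=np.uint8)
augmented = fiducial_focus_augment(image, [(32, 32), (0, 0)], patch_size=8)
```

The input image is left untouched so that the original and the augmented copy can serve as two views of the same face.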
Abstract
Deep learning methods have significantly improved performance on the facial landmark detection (FLD) task. However, detecting landmarks in challenging settings, such as head pose changes, exaggerated expressions, or uneven illumination, remains difficult due to high variability and insufficient samples. This inadequacy can be attributed to the model's inability to acquire appropriate facial structure information from the input images. To address this, we propose a novel image augmentation technique specifically designed for the FLD task to enhance the model's understanding of facial structures. To make effective use of this augmentation, we employ a Siamese architecture-based training mechanism with a Deep Canonical Correlation Analysis (DCCA)-based loss to achieve collective learning of high-level feature representations from two different views of the input images. Furthermore, we employ a Transformer + CNN-based network with a custom hourglass module as the robust backbone for the Siamese framework. Extensive experiments show that our approach outperforms multiple state-of-the-art approaches across various benchmark datasets.
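To make the role of the correlation-based loss more concrete, the sketch below computes the usual CCA objective on two batches of features: the negative sum of canonical correlations between the two views. It is an illustration in NumPy under stated assumptions, not the authors' loss; the function name `cca_loss`, the regulariser `eps`, and the toy feature matrices are all hypothetical, and a training setup would need a differentiable implementation in a deep learning framework.

```python
import numpy as np

def cca_loss(h1, h2, eps=1e-4):
    """Negative total canonical correlation between two feature batches.

    h1, h2: arrays of shape (n_samples, n_features), one per view.
    eps:    small ridge term added to the covariances for stability
            (hypothetical default).
    """
    n = h1.shape[0]
    h1c = h1 - h1.mean(axis=0)
    h2c = h2 - h2.mean(axis=0)
    s11 = h1c.T @ h1c / (n - 1) + eps * np.eye(h1.shape[1])
    s22 = h2c.T @ h2c / (n - 1) + eps * np.eye(h2.shape[1])
    s12 = h1c.T @ h2c / (n - 1)

    def inv_sqrt(m):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        vals, vecs = np.linalg.eigh(m)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    t = inv_sqrt(s11) @ s12 @ inv_sqrt(s22)
    # The singular values of t are the canonical correlations.
    return -np.linalg.svd(t, compute_uv=False).sum()

# Usage: a linearly related second view versus an unrelated one.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(200, 3))
mix = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 3.0]])
loss_related = cca_loss(view_a, view_a @ mix)
loss_unrelated = cca_loss(view_a, rng.normal(size=(200, 3)))
```

Minimising this quantity pushes the two views toward maximally correlated representations, which is the sense in which the loss enforces consistency across views.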
