Efficient Facial Landmark Detection for Embedded Systems
Ji-Jia Wu
TL;DR
This work targets robust, low-power facial landmark detection on edge devices by introducing the Efficient Facial Landmark Detection (EFLD) architecture. EFLD combines an efficient backbone built from Efficient-OSA (EOSA) modules, a flexible multi-head detection system for different landmark formats, and a cross-format training strategy that leverages heterogeneous public datasets without increasing inference cost. Key contributions are the lightweight EOSA-based backbone, a modular detection-head design supporting 51-, 68-, and 98-point formats, and a cross-format training strategy that improves generalization while preserving inference efficiency. Empirically, EFLD achieves top performance and energy efficiency in the IEEE ICME 2024 Grand Challenges PAIR Competition, indicating strong potential for real-world embedded deployment using int8 quantization and a lightweight TFLite pipeline.
Abstract
This paper introduces the Efficient Facial Landmark Detection (EFLD) model, designed for edge devices that face tight constraints on power consumption and latency. EFLD features a lightweight backbone and a flexible detection head, both of which improve operational efficiency on resource-constrained devices. To improve robustness, we propose a cross-format training strategy that leverages a wide variety of publicly accessible datasets to enhance the model's generalizability without increasing inference cost. Our ablation study shows that each component contributes significantly to reducing computational demands, shrinking model size, and improving accuracy. EFLD outperforms competing entries in the IEEE ICME 2024 Grand Challenges PAIR Competition, a contest focused on low-power, efficient, and accurate facial landmark detection for embedded systems, demonstrating its effectiveness in real-world facial landmark detection tasks.
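As an illustration of the cross-format idea, the following minimal sketch shows one way a shared backbone can be trained with several format-specific heads while inference uses only a single head. It is not the EFLD implementation: the averaging "backbone", the linear heads, the L1 loss, the feature size, and all function and variable names are assumptions made for this example; only the 51/68/98-point formats come from the text.

```python
import random

# Landmark formats named in the text; the dictionary keys are hypothetical labels.
FORMATS = {"51pt": 51, "68pt": 68, "98pt": 98}
FEATURE_DIM = 8  # assumed size of the shared backbone feature


def make_head(num_points, feature_dim, rng):
    """Create a linear head with one weight row per output coordinate (x and y per point)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(feature_dim)]
            for _ in range(2 * num_points)]


def backbone(image):
    """Stand-in for the shared backbone: averages pixels into FEATURE_DIM bins."""
    chunk = len(image) // FEATURE_DIM
    return [sum(image[i * chunk:(i + 1) * chunk]) / chunk
            for i in range(FEATURE_DIM)]


def run_head(head, feature):
    """Apply a linear head to a feature vector."""
    return [sum(w * f for w, f in zip(row, feature)) for row in head]


def l1_loss(pred, target):
    """Mean absolute error; the loss choice is an assumption, not taken from the paper."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def cross_format_loss(batch, heads):
    """Average loss where each sample is scored only by the head for its own format."""
    total = 0.0
    for image, fmt, target in batch:
        feature = backbone(image)             # shared across all formats
        pred = run_head(heads[fmt], feature)  # format-specific head
        total += l1_loss(pred, target)
    return total / len(batch)


def predict(image, heads, fmt):
    """Inference runs the backbone plus the single head that is needed."""
    return run_head(heads[fmt], backbone(image))


rng = random.Random(0)
heads = {name: make_head(n, FEATURE_DIM, rng) for name, n in FORMATS.items()}

# One synthetic sample per format, standing in for heterogeneous datasets.
batch = []
for fmt, n in FORMATS.items():
    image = [rng.random() for _ in range(64)]
    target = [rng.random() for _ in range(2 * n)]
    batch.append((image, fmt, target))

loss = cross_format_loss(batch, heads)
output = predict(batch[0][0], heads, "51pt")
print(len(output))  # 2 coordinates for each of the 51 points
```

In this sketch, every sample updates the shared backbone regardless of its annotation format, while the extra heads are needed only during training. At deployment, only the head for the target format has to be kept, which is one plausible reading of how heterogeneous datasets can be used without adding inference cost.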
