Mixture-of-Attack-Experts with Class Regularization for Unified Physical-Digital Face Attack Detection
Shunxin Chen, Ajian Liu, Junze Zheng, Jun Wan, Kailai Peng, Sergio Escalera, Zhen Lei
TL;DR
This work tackles unified detection of physical and digital face attacks, where live and fake faces are hard to separate because attacks show large intra-class variation while live and fake faces show small inter-class variation. It introduces MoAE-CR, which integrates a Soft Mixture of Experts into a CLIP-based image encoder and adds two class-regularization modules, the Disentanglement Module (DM) and the Cluster Distillation Module (CDM). Together these promote intra-class cohesion and inter-class separation while weighting distant, hard examples more heavily. The method trains with a CLIP objective plus the DM and CDM losses, and at inference the mixture-of-attack-experts (MoAE) layers adaptively route features to specialized experts. Extensive experiments on UniAttackData and JFSFDB show state-of-the-art performance and strong generalization to unseen attacks, and ablations confirm that DM and CDM are complementary.
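To make "adaptively route features to specialized experts" concrete, here is a minimal sketch of a generic Soft MoE layer (the formulation of Puigcerver et al.), written in plain Python. It is an illustration under assumptions, not the authors' MoAE layer: the paper's refinement for fine-grained attack differences is not reproduced, and the names `soft_moe`, `phi` and the toy experts are hypothetical. Each slot receives a softmax-over-tokens mixture of the input tokens (dispatch), each expert processes its own slots, and each output token is a softmax-over-slots mixture of the slot outputs (combine).

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[Vector]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a 1-D list."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """(r x k) @ (k x c) -> (r x c)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def soft_moe(tokens: Matrix, phi: Matrix,
             experts: List[Callable[[Vector], Vector]],
             slots_per_expert: int) -> Matrix:
    """Generic Soft MoE (hypothetical helper, not the paper's code).

    tokens: m x d input features; phi: d x (n * p) learnable slot parameters,
    where n = len(experts) and p = slots_per_expert.
    """
    num_slots = len(phi[0])
    assert num_slots == len(experts) * slots_per_expert
    logits = matmul(tokens, phi)                          # m x (n*p)
    # Dispatch: each slot is a convex mix of tokens (softmax over tokens).
    dispatch = [softmax(col) for col in transpose(logits)]  # (n*p) x m
    slot_inputs = matmul(dispatch, tokens)                # (n*p) x d
    # Slot s is processed by the expert that owns it.
    slot_outputs = [experts[s // slots_per_expert](slot_inputs[s])
                    for s in range(num_slots)]            # (n*p) x d
    # Combine: each token is a convex mix of slot outputs (softmax over slots).
    combine = [softmax(row) for row in logits]            # m x (n*p)
    return matmul(combine, slot_outputs)                  # m x d
```

With all-zero `phi`, every slot sees the mean token and every token receives the average expert output; in training, `phi` is learned so that different attack cues are dispatched to different experts. Because routing is soft, no token is dropped and the layer stays fully differentiable.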
Abstract
Facial recognition systems in real-world scenarios are susceptible to both digital and physical attacks. Previous methods have attempted to achieve classification by learning a comprehensive feature space. However, these methods have not adequately accounted for the inherent characteristics of physical and digital attack data, particularly the large intra-class variation among attacks and the small inter-class variation between live and fake faces. To address these limitations, we propose the Fine-Grained MoE with Class-Aware Regularization CLIP framework (FG-MoE-CLIP-CAR), which introduces key improvements at both the feature and loss levels. At the feature level, we employ a Soft Mixture of Experts (Soft MoE) architecture that routes features to different experts for specialized processing, and we refine the Soft MoE to capture more subtle differences among various types of fake faces. At the loss level, we introduce two constraint modules: the Disentanglement Module (DM) and the Cluster Distillation Module (CDM). The DM enhances class separability by increasing the distance between the centers of the live and fake face classes. However, center-to-center constraints alone cannot ensure that individual features are distinctive. We therefore propose the CDM to further cluster features around their respective class centers while keeping them separated from other classes. Moreover, specific attacks that deviate significantly from common attack patterns are often overlooked. To address this issue, our distance calculation assigns greater weight to more distant features. Experimental results on two unified physical-digital attack datasets demonstrate that the proposed method achieves state-of-the-art (SOTA) performance.
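The two loss-level constraints can be illustrated with a small plain-Python sketch. The exact loss forms are not given here, so everything below is an assumption: label 0 is taken as live and 1 as fake, DM is modeled as a hinge that penalizes class centers closer than a margin, and CDM as a pull toward the own-class center plus a hinge push away from the nearest other center. The "prioritize distant features" idea is modeled by softmax weights over each feature's distance to its own center, so outlying (hard) samples contribute more. All function names and the `margin`/`temperature` parameters are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def class_centers(feats: List[Vector], labels: List[int]) -> Dict[int, Vector]:
    """Mean feature per class (assumption: 0 = live, 1 = fake)."""
    centers = {}
    for c in sorted(set(labels)):
        members = [f for f, y in zip(feats, labels) if y == c]
        centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def dm_loss(centers: Dict[int, Vector], margin: float) -> float:
    """Hypothetical DM: hinge that pushes live/fake centers at least `margin` apart."""
    return max(0.0, margin - dist(centers[0], centers[1]))


def distance_weights(distances: List[float], temperature: float = 1.0) -> List[float]:
    """Softmax over distances: farther (harder) samples get larger weights."""
    m = max(distances)
    exps = [math.exp((d - m) / temperature) for d in distances]
    total = sum(exps)
    return [e / total for e in exps]


def cdm_loss(feats: List[Vector], labels: List[int],
             centers: Dict[int, Vector], margin: float,
             temperature: float = 1.0) -> float:
    """Hypothetical CDM: distance-weighted pull to own center, push from others."""
    own = [dist(f, centers[y]) for f, y in zip(feats, labels)]
    weights = distance_weights(own, temperature)
    loss = 0.0
    for f, y, d_own, w in zip(feats, labels, own, weights):
        nearest_other = min(dist(f, c) for k, c in centers.items() if k != y)
        loss += w * (d_own + max(0.0, margin - nearest_other))
    return loss
```

For example, four features `[[0, 0], [0, 2], [4, 0], [4, 2]]` with labels `[0, 0, 1, 1]` give centers `[0, 1]` and `[4, 1]`, four units apart. In a real training loop these terms would be computed on encoder tensors and added to the CLIP objective, with the center distances supplying the gradients.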
