Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing
Xun Lin, Shuai Wang, Rizhao Cai, Yizhong Liu, Ying Fu, Zitong Yu, Wenzhong Tang, Alex Kot
TL;DR
This paper tackles the challenge of generalizing multi-modal face anti-spoofing (FAS) to unseen environments by addressing two problems: modality unreliability during cross-modal fusion and imbalance among modalities during training. It introduces MMDG, a Vision Transformer–based framework with Uncertainty-Guided Cross-Adapters (U-Adapter) to suppress unreliable information and Rebalanced Modality Gradient Modulation (ReGrad) to balance modality convergence, complemented by a Single-Side Prototypical Loss to align domain prototypes. The paper also proposes the first large-scale benchmark for multi-modal domain generalization (DG) in FAS, spanning four datasets (CASIA-CeFA, PADISI-Face, CASIA-SURF, WMCA) and three evaluation protocols; experiments show improvements over existing state-of-the-art DG and multi-modal methods. The work offers practical insights for deploying robust multi-modal FAS systems under domain shifts and establishes a challenging benchmark to spur further progress. Code and protocols are to be released.
Abstract
Face Anti-Spoofing (FAS) is crucial for securing face recognition systems against presentation attacks. With advancements in sensor manufacturing and multi-modal learning techniques, many multi-modal FAS approaches have emerged. However, they face challenges in generalizing to unseen attacks and deployment conditions. These challenges arise from (1) modality unreliability, where sensors for some modalities, such as depth and infrared, undergo significant domain shifts in varying environments, leading to the spread of unreliable information during cross-modal feature fusion, and (2) modality imbalance, where training that relies too heavily on a dominant modality hinders the convergence of the others, reducing effectiveness against attack types that are indistinguishable using the dominant modality alone. To address modality unreliability, we propose the Uncertainty-Guided Cross-Adapter (U-Adapter) to recognize unreliably detected regions within each modality and suppress the impact of unreliable regions on other modalities. For modality imbalance, we propose a Rebalanced Modality Gradient Modulation (ReGrad) strategy to rebalance the convergence speed of all modalities by adaptively adjusting their gradients. In addition, we provide the first large-scale benchmark for evaluating multi-modal FAS performance under domain generalization scenarios. Extensive experiments demonstrate that our method outperforms state-of-the-art methods. Source code and protocols will be released at https://github.com/OMGGGGG/mmdg.
