BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing

Yingjie Ma, Zitong Yu, Xun Lin, Weicheng Xie, Linlin Shen

TL;DR

BIG-MoE tackles generalization challenges in multimodal Face Anti-Spoofing by introducing an Isolated Gating Mechanism Adapter (IGMA) and a Convolutional Prompt Bypass (CPB) within a Mixture of Experts framework. The approach leverages fine-grained experts, noise-robust gating, and local cue-focused prompts to improve discrimination and gating stability across modalities and domains. Empirical results across four benchmarks show substantial improvements in HTER and AUC, with ablations confirming the effectiveness of IGMA and CPB and their synergy. This work advances robust, cross-domain multimodal FAS with a scalable MoE-based design and provides practical guidance for generalization in limited-data scenarios.
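
For readers unfamiliar with the HTER metric mentioned above, the following is a minimal sketch of how a Half Total Error Rate is commonly computed: the mean of the false acceptance rate (attacks accepted as live) and the false rejection rate (live faces rejected). The function name `hter`, the label convention (1 = live, 0 = attack), the "higher score means more likely live" convention, and the fixed threshold are all assumptions for illustration; this summary does not state how the paper selects its threshold.

```python
def hter(scores, labels, threshold):
    """Half Total Error Rate at a fixed decision threshold.

    ASSUMPTIONS (not taken from the paper): labels use 1 = live and
    0 = attack, and a higher score means "more likely live".
    """
    attack_scores = [s for s, y in zip(scores, labels) if y == 0]
    live_scores = [s for s, y in zip(scores, labels) if y == 1]
    # False acceptance rate: attacks scored at or above the threshold.
    far = sum(s >= threshold for s in attack_scores) / len(attack_scores)
    # False rejection rate: live samples scored below the threshold.
    frr = sum(s < threshold for s in live_scores) / len(live_scores)
    return (far + frr) / 2


# Toy data: three live samples followed by three attacks.
example_scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
example_labels = [1, 1, 1, 0, 0, 0]
example_hter = hter(example_scores, example_labels, threshold=0.5)
```

In this toy example one of three attacks is accepted and one of three live samples is rejected, so the value is one third. Lower HTER is better, whereas higher AUC is better.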

Abstract

In the domain of facial recognition security, multimodal Face Anti-Spoofing (FAS) is essential for countering presentation attacks. However, existing methods struggle with modality biases and imbalances, as well as domain shifts. Our research introduces a Mixture of Experts (MoE) model to address these issues. We identify three limitations of traditional MoE approaches to multimodal FAS: (1) coarse-grained experts cannot capture nuanced spoofing indicators; (2) gating networks are susceptible to input noise, which affects their decisions; (3) MoE is sensitive to prompt tokens, which leads to overfitting under conventional learning methods. To mitigate these, we propose the Bypass Isolated Gating MoE (BIG-MoE) framework, featuring: (1) fine-grained experts for enhanced detection of subtle spoofing cues; (2) an isolation gating mechanism to counteract input noise; (3) a novel differential convolutional prompt bypass that enriches the gating network with critical local features, thereby improving perceptual capabilities. Extensive experiments on four benchmark datasets demonstrate significantly improved generalization on the multimodal FAS task. The code is released at https://github.com/murInJ/BIG-MoE.

Paper Structure

This paper contains 11 sections, 4 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Existing MoE prompt learning paradigms vs. ours. Both (a) conventional MoE prompt learning and (b) parameter-efficient expert retrieval [he2024mixture] feed prompt and feature tokens into a gating network or a product key gating (PK Gate) network to generate scores for expert selection; they differ in their gating mechanisms and types of experts. (c) Our Isolated Gating Mechanism (IGM) concatenates prompt and feature tokens for gating network scoring, but passes only the feature tokens to the experts, isolating the expert network input to enhance noise resilience and processing precision.
  • Figure 2: BIG-MoE framework overview. (a) Prompt generation: the creation and integration of initial prompts. (b) CPB: the Convolutional Prompt Bypass, which enhances feature extraction via Central Difference Convolution (CDC) [yu2020multi] and integrates multimodal prompts. (c) IGMA: the Isolated Gating Mechanism Adapter, which performs gating and interacts with CPB across layers, promoting information exchange for improved performance and robustness.
  • Figure 3: Ablation study on expert numbers and activations. (a) HTER with Varying Numbers of Activated Experts. (b) HTER with Different Total Expert Counts. The ablation study investigates the impact of expert count and activation on model performance, providing insights into the optimal configuration for expert utilization in the model.
  • Figure 4: t-SNE visualizations when testing on the CeFA, PADISI, SURF, and WMCA domains, respectively.
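
The isolated gating idea in the Figure 1(c) caption (the gate scores the prompt and feature tokens together, while the experts receive feature tokens only) can be illustrated with a small, dependency-free sketch. This is not the authors' implementation. It assumes that prompt tokens are mean-pooled and concatenated channel-wise to each feature token before scoring, that experts are plain linear maps, and that routing is top-k with softmax renormalisation. The names `isolated_gating_layer`, `mean_pool`, `linear`, and all sizes are hypothetical.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, vector):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def mean_pool(tokens):
    """Average a list of equal-length token vectors into one vector."""
    return [sum(column) / len(tokens) for column in zip(*tokens)]


def isolated_gating_layer(feature_tokens, prompt_tokens, gate_weights, experts, top_k=2):
    """Route feature tokens with prompt-aware scores; experts see features only.

    ASSUMPTION: prompts are mean-pooled and concatenated channel-wise to each
    feature token for scoring. The paper's exact scheme may differ.
    """
    prompt_summary = mean_pool(prompt_tokens)
    outputs, routes = [], []
    for token in feature_tokens:
        gate_input = token + prompt_summary        # gate sees [feature ; prompt]
        scores = linear(gate_weights, gate_input)  # one score per expert
        ranked = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)
        chosen = ranked[:top_k]
        mix = softmax([scores[i] for i in chosen])  # renormalise over the top-k
        mixed = [0.0] * len(token)
        for weight, index in zip(mix, chosen):
            expert_out = linear(experts[index], token)  # expert input: feature only
            mixed = [m + weight * e for m, e in zip(mixed, expert_out)]
        outputs.append(mixed)
        routes.append(chosen)
    return outputs, routes


def rand_matrix(rows, cols):
    """Random Gaussian matrix used as stand-in weights."""
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
DIM, PROMPT_DIM, NUM_EXPERTS = 4, 4, 6  # illustrative sizes only
features = rand_matrix(3, DIM)          # three feature tokens
prompts = rand_matrix(2, PROMPT_DIM)    # two prompt tokens
gate = rand_matrix(NUM_EXPERTS, DIM + PROMPT_DIM)
experts = [rand_matrix(DIM, DIM) for _ in range(NUM_EXPERTS)]
outputs, routes = isolated_gating_layer(features, prompts, gate, experts, top_k=2)
```

In this sketch the prompts can change which experts are selected and how they are weighted, but they never enter an expert's input, which is the isolation property the caption describes.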