Spoofing-aware Prompt Learning for Unified Physical-Digital Facial Attack Detection

Jiabao Guo, Yadian Wang, Hui Ma, Yuhao Fu, Ju Jia, Hui Liu, Shengeng Tang, Lechao Cheng, Yunfeng Diao, Ajian Liu

TL;DR

The paper tackles unified facial attack detection by introducing SPL-UAD, a spoofing-aware prompt learning framework that decouples optimization for physical and digital attacks within the CLIP prompt space. It introduces Spoofing Context Prompt Generation to produce attack-aware contexts and Cues-awareness Augmentation to mine hard samples, all with minimal parameter overhead. Evaluations on UniAttackDataPlus show competitive improvements in ACER and AUC, with ablations confirming the complementary benefits of SCPG and CAA. The approach offers a practical, transfer-friendly pathway toward robust biometric security across diverse sensors and unseen attacks.
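
For readers unfamiliar with the ACER metric cited above, the sketch below shows how it is conventionally computed in face anti-spoofing work: the mean of APCER (attacks accepted as bona fide) and BPCER (bona fide samples rejected as attacks). This is an illustration only. It assumes a pooled APCER over all attack types and a label convention of 1 = attack, 0 = bona fide; the paper's exact evaluation protocol may differ (for example, the ISO/IEC 30107-3 definition takes the worst case per attack type). The function name `acer` and the toy data are hypothetical.

```python
def acer(labels, preds):
    """Return (APCER, BPCER, ACER) for binary decisions.

    Assumed convention (not taken from the paper): 1 = attack, 0 = bona fide.
    APCER is pooled over all attack samples rather than taken per attack type.
    """
    attack_preds = [p for y, p in zip(labels, preds) if y == 1]
    bona_fide_preds = [p for y, p in zip(labels, preds) if y == 0]
    # APCER: fraction of attack samples wrongly accepted as bona fide.
    apcer = sum(1 for p in attack_preds if p == 0) / len(attack_preds)
    # BPCER: fraction of bona fide samples wrongly rejected as attacks.
    bpcer = sum(1 for p in bona_fide_preds if p == 1) / len(bona_fide_preds)
    return apcer, bpcer, (apcer + bpcer) / 2


# Toy example: 4 attacks (1 missed) and 4 bona fide samples (2 rejected).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 0, 0, 1, 1]
print(acer(labels, preds))  # (0.25, 0.5, 0.375)
```

Lower ACER is better; AUC, by contrast, is threshold-free and higher is better.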

Abstract

Real-world face recognition systems are vulnerable to both physical presentation attacks (PAs) and digital forgery attacks (DFs). We aim to protect biometric data comprehensively with a unified physical-digital defense framework. Existing approaches primarily employ CLIP with regularization constraints to improve generalization across both tasks. However, these methods suffer from conflicting optimization directions between physical and digital attack detection when both share the same category prompt space. To overcome this limitation, we propose Spoofing-aware Prompt Learning for Unified Attack Detection (SPL-UAD), a framework that decouples the optimization branches for physical and digital attacks in the prompt space. Specifically, we construct a learnable parallel prompt branch enhanced with adaptive Spoofing Context Prompt Generation, enabling independent control of optimization for each attack type. Furthermore, we design a Cues-awareness Augmentation that leverages the dual-prompt mechanism to mine challenging samples from the data, significantly enhancing the model's robustness against unseen attack types. Extensive experiments on the large-scale UniAttackDataPlus dataset demonstrate that the proposed method achieves significant performance improvements in unified attack detection tasks.

Paper Structure

This paper contains 15 sections, 5 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Overview of the proposed SPL-UAD framework. (a) Spoofing-aware Prompt Learning. An image is tokenized into patch embeddings and processed by the frozen CLIP image encoder, while text tokens are fed into the frozen text encoder. We inject learnable prompts together with spoof-aware context at multiple transformer layers. A dual-branch design decouples optimization for physical and digital attacks, mitigating conflicting gradients and preserving attack-specific cues. Cross-modal similarities between the resulting visual and textual features are used for classification, and representations are further organized by K-Means to support context construction. (b) Spoofing Context Prompt Generation (SCPG). We cluster class-level embeddings to obtain centers and apply lightweight linear projections to yield textual and visual contexts that match the encoders' hidden sizes. Multi-Granularity Spoof-Aware descriptions enrich semantics for both real and spoof classes. The combined design provides informative pre-context, promotes stable text–image interactions, and enables cues-awareness augmentation to mine hard examples, ultimately improving robustness to both physical and digital attacks.
  • Figure 2: Samples from the UniAttackDataPlus dataset (liu2025benchmarking), covering both physical and digital attack families. Physical attacks include 2D prints, replay videos, cutouts, and diverse 3D masks such as transparent shields, resin, and plaster, collected under varied sensors and environments. Digital attacks span pixel-level manipulations and semantic-level edits, with ID-consistent pairing to real subjects. The breadth of capture conditions and the hierarchical taxonomy of attack types encourage models to focus on spoof-specific cues rather than incidental correlations, enabling unified evaluation across diverse threats.
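
The context-construction step described in the Figure 1(b) caption (cluster class-level embeddings, then project the centers to each encoder's hidden size) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are random stand-ins, the projection matrices are fixed random weights where the paper's would be learned, and the dimensions (512-d embeddings, 512-d text hidden size, 768-d visual hidden size), the number of clusters, and the names `kmeans` and `make_context` are all hypothetical.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of cluster centers."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct data points (fancy indexing copies).
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Assign each embedding to its nearest center (squared Euclidean).
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if its cluster is empty.
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers


def make_context(embeddings, k, w_text, w_visual):
    """Cluster embeddings, then linearly project the centers to each modality."""
    centers = kmeans(embeddings, k)
    text_ctx = centers @ w_text      # shape (k, text_hidden)
    visual_ctx = centers @ w_visual  # shape (k, visual_hidden)
    return text_ctx, visual_ctx


# Assumed sizes for illustration only; none are taken from the paper.
embed_dim, text_hidden, visual_hidden, num_centers = 512, 512, 768, 4
rng = np.random.default_rng(0)
class_embeddings = rng.normal(size=(200, embed_dim))  # stand-in for real features
w_text = rng.normal(scale=0.02, size=(embed_dim, text_hidden))      # learned in practice
w_visual = rng.normal(scale=0.02, size=(embed_dim, visual_hidden))  # learned in practice

text_ctx, visual_ctx = make_context(class_embeddings, num_centers, w_text, w_visual)
print(text_ctx.shape, visual_ctx.shape)
```

In a setup like this, the resulting context vectors would be prepended to the learnable prompts of the corresponding encoder; how and at which layers the paper injects them is described only at the level of the caption above.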