Domain Generalization for Face Anti-spoofing via Content-aware Composite Prompt Engineering
Jiabao Guo, Ajian Liu, Yunfeng Diao, Jin Zhang, Hui Ma, Bo Zhao, Richang Hong, Meng Wang
TL;DR
Domain Generalization in Face Anti-Spoofing is tackled by CCPE, which replaces semantics-poor class prompts with instance-aware prompts derived from an instruction-based LLM and a learnable Q-Former branch, coupled with a Cross-modal Guidance Module to fuse language and vision. This approach yields state-of-the-art generalization on cross-domain FAS benchmarks and is supported by ablations showing the value of each component. The method demonstrates that content-aware prompts and multimodal guidance can mitigate domain shifts without target-domain data. The work offers a practical, scalable pathway for robust FAS in real-world, diverse capture settings by leveraging rich semantic information from LLMs and adaptable visual prompts.
Abstract
The main challenge of Domain Generalization (DG) in Face Anti-Spoofing (FAS) is that domain-specific signals significantly interfere with subtle spoofing clues. Recently, some CLIP-based algorithms have been developed to alleviate this interference by adjusting the weights of visual classifiers. However, our analysis shows that this class-wise prompt engineering suffers from two shortcomings for DG FAS: (1) Facial category labels, such as real or spoof, carry no semantics for the CLIP model, making it difficult to learn accurate category descriptions. (2) A single form of prompt cannot portray the various types of spoofing. In this work, instead of class-wise prompts, we propose a novel Content-aware Composite Prompt Engineering (CCPE) that generates instance-wise composite prompts, including both fixed-template and learnable prompts. Specifically, CCPE constructs content-aware prompts from two branches: (1) The inherent content prompt explicitly benefits from the abundant knowledge transferred from an instruction-based Large Language Model (LLM). (2) The learnable content prompts implicitly extract the most informative visual content via a Q-Former. Moreover, we design a Cross-Modal Guidance Module (CGM) that dynamically adjusts unimodal features for fusion to achieve better-generalized FAS. Finally, the effectiveness of CCPE has been validated in multiple cross-domain experiments, where it achieves state-of-the-art (SOTA) results.
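The abstract gives no implementation details, so the following is only a toy sketch of the general idea: an instance-wise prompt built from fixed template tokens plus learnable queries that attend to one image's visual tokens, followed by a simple gated fusion of the text and image features. Every name here (`composite_prompt`, `cross_attend`, `gated_fusion`), the single-head attention without projections, and the sigmoid gate are illustrative assumptions, not the authors' CCPE or CGM design; random vectors stand in for real LLM and vision-encoder outputs.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attend(queries, visual_tokens):
    """Each query attends over the visual tokens (single head, no projections)."""
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in visual_tokens])
        out.append([sum(w * v[i] for w, v in zip(weights, visual_tokens))
                    for i in range(d)])
    return out

def composite_prompt(template_tokens, learnable_queries, visual_tokens):
    """Instance-wise prompt: fixed template tokens + image-conditioned queries."""
    return template_tokens + cross_attend(learnable_queries, visual_tokens)

def mean_pool(tokens):
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]

def gated_fusion(text_feat, image_feat):
    """Assumed stand-in for cross-modal guidance: a sigmoid gate mixes features."""
    gate = 1.0 / (1.0 + math.exp(-dot(text_feat, image_feat)))
    return [gate * t + (1.0 - gate) * v for t, v in zip(text_feat, image_feat)]

def rand_tokens(n, dim):
    return [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]

random.seed(0)
dim = 4
template_tokens = rand_tokens(3, dim)    # stands in for an embedded LLM-written description
learnable_queries = rand_tokens(2, dim)  # would be trained parameters in practice
visual_tokens = rand_tokens(5, dim)      # stands in for patch features of one face image

prompt = composite_prompt(template_tokens, learnable_queries, visual_tokens)
fused = gated_fusion(mean_pool(prompt), mean_pool(visual_tokens))
```

In this sketch the prompt differs per image because the query outputs depend on `visual_tokens`, which is the sense in which a prompt can be instance-wise rather than class-wise; a real system would replace the random vectors with trained encoders and learned projections.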
