Steering Vision-Language Pre-trained Models for Incremental Face Presentation Attack Detection

Haoze Li; Jie Zhang; Guoying Zhao; Stephen Lin; Shiguang Shan

TL;DR

This work tackles the challenge of deploying robust face presentation attack detection under rehearsal-free incremental learning, constrained by privacy regulations that prevent storing past data. It introduces SVLP-IL, a CLIP-based framework that steers vision-language pre-trained models via Multi-Aspect Prompting (MAP) and Selective Elastic Weight Consolidation (SEWC) to balance plasticity and stability without data replay. MAP provides domain-specific and universal cues through visual and textual prompts, while SEWC protects critical backbone weights by selectively consolidating past knowledge with a Bayesian-inspired Fisher-based penalty. Extensive experiments across nine PAD benchmarks demonstrate reduced forgetting and strong generalization to unseen domains, offering a practical, privacy-conscious approach to lifelong PAD deployment.

Abstract

Face Presentation Attack Detection (PAD) demands incremental learning (IL) to combat evolving spoofing tactics and domains. Privacy regulations, however, forbid retaining past data, necessitating rehearsal-free IL (RF-IL). Vision-Language Pre-trained (VLP) models, with their prompt-tunable cross-modal representations, enable efficient adaptation to new spoofing styles and domains. Capitalizing on this strength, we propose SVLP-IL, a VLP-based RF-IL framework that balances stability and plasticity via Multi-Aspect Prompting (MAP) and Selective Elastic Weight Consolidation (SEWC). MAP isolates domain dependencies, enhances distribution-shift sensitivity, and mitigates forgetting by jointly exploiting universal and domain-specific cues. SEWC selectively preserves critical weights from previous tasks, retaining essential knowledge while allowing flexibility for new adaptations. Comprehensive experiments across multiple PAD benchmarks show that SVLP-IL significantly reduces catastrophic forgetting and enhances performance on unseen domains. SVLP-IL offers a privacy-compliant, practical solution for robust lifelong PAD deployment in RF-IL settings.
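
To make the idea of selectively preserving critical weights more concrete, the following is a minimal sketch of an EWC-style quadratic penalty applied only to a subset of parameters. It is not the paper's SEWC implementation. The function name `selective_ewc_penalty`, the `keep_ratio` parameter, and the top-k-by-Fisher selection rule are all assumptions made for illustration, since the abstract does not specify how weights are selected.

```python
def selective_ewc_penalty(params, old_params, fisher, keep_ratio=0.5, lam=1.0):
    """EWC-style penalty restricted to the weights with the largest Fisher values.

    params, old_params, fisher: equal-length lists of floats (flattened weights,
    their values after the previous task, and diagonal Fisher estimates).
    keep_ratio and the top-k selection rule are hypothetical, for illustration only.
    """
    if not (len(params) == len(old_params) == len(fisher)):
        raise ValueError("params, old_params and fisher must have the same length")

    # Assumed selection rule: protect the top `keep_ratio` fraction of weights,
    # ranked by their Fisher importance estimate.
    k = max(1, int(round(keep_ratio * len(fisher))))
    ranked = sorted(range(len(fisher)), key=lambda i: fisher[i], reverse=True)
    selected = ranked[:k]

    # Standard EWC form over the selected weights only:
    # (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2
    return 0.5 * lam * sum(
        fisher[i] * (params[i] - old_params[i]) ** 2 for i in selected
    )


# Toy values: only the two weights with the largest Fisher estimates are penalized.
theta_new = [1.0, 2.0, 3.0, 4.0]
theta_old = [0.0, 2.0, 1.0, 4.5]
fisher_diag = [0.1, 5.0, 2.0, 0.01]
penalty = selective_ewc_penalty(theta_new, theta_old, fisher_diag, keep_ratio=0.5)
```

In this sketch the unselected weights incur no penalty, so they remain free to adapt to a new domain, while drift in the selected weights is penalized in proportion to their estimated importance.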
