Advancing Cross-Domain Generalizability in Face Anti-Spoofing: Insights, Design, and Metrics
Hyojin Kim, Jiyoon Lee, Yonghyun Jeong, Haneol Jang, YoungJoon Yoo
TL;DR
The paper tackles cross-domain generalization in face anti-spoofing (FAS) for video inputs, noting that frame-wise predictions can be unstable in real-world scenarios. It introduces video-wise aggregation and bias-variance metrics to quantify temporal robustness: $B = \frac{1}{N} \sum_{i=1}^N (Y_i - \hat{Y}_i)^2$ and $V = \frac{1}{N} \sum_{i=1}^N \sigma^2(P_i)$, with $\sigma^2(P_i) = \frac{1}{M_i} \sum_{j=1}^{M_i} (P_{ij}-\bar{P}_i)^2$. Here $N$ is the number of videos, $M_i$ the number of frames in video $i$, $P_{ij}$ the predicted probability for frame $j$ of video $i$, $\bar{P}_i$ the mean of those probabilities, $Y_i$ the label, and $\hat{Y}_i$ the video-wise prediction. The authors propose ECLIPS, an ensemble framework comprising base learners built on the CLIP visual encoder and a learnable decision fusion module, trained with Monte Carlo dropout to capture uncertainty and improve generalization across datasets such as OCIM, CelebA-Spoof, and SiW-Mv2. Key contributions are the bias-variance robustness metrics for FAS, evidence that scaling up the backbone alone is insufficient for generalization, and state-of-the-art HTER and AUC on multiple cross-domain benchmarks through ensemble design. The work advances practical video FAS by enabling uncertainty-aware training and robust, scalable deployment with smaller backbones.
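The bias and variance definitions above can be illustrated with a minimal Python sketch. It assumes the video-wise prediction $\hat{Y}_i$ is the mean of the frame-level probabilities, which the summary does not state explicitly; the function name `video_bias_variance` and the input layout are hypothetical, chosen only for illustration.

```python
from statistics import fmean


def video_bias_variance(frame_probs, labels):
    """Compute the bias B and variance V over a set of videos.

    frame_probs: list of lists; frame_probs[i][j] is P_ij, the predicted
                 probability for frame j of video i.
    labels:      list of ground-truth labels Y_i (0 or 1), one per video.

    ASSUMPTION: the video-wise prediction Y_hat_i is the mean frame
    probability of video i. The source may aggregate differently.
    """
    if len(frame_probs) != len(labels) or not labels:
        raise ValueError("need one non-empty label list matching the videos")

    squared_errors = []   # (Y_i - Y_hat_i)^2 per video
    frame_variances = []  # sigma^2(P_i) per video
    for probs, y in zip(frame_probs, labels):
        if not probs:
            raise ValueError("every video needs at least one frame")
        mean_p = fmean(probs)  # P_bar_i, also used as Y_hat_i here
        squared_errors.append((y - mean_p) ** 2)
        frame_variances.append(fmean([(p - mean_p) ** 2 for p in probs]))

    bias = fmean(squared_errors)       # B = (1/N) * sum_i (Y_i - Y_hat_i)^2
    variance = fmean(frame_variances)  # V = (1/N) * sum_i sigma^2(P_i)
    return bias, variance


if __name__ == "__main__":
    # Two toy videos with two frames each; labels 1 and 0.
    b, v = video_bias_variance([[0.9, 0.7], [0.2, 0.4]], [1, 0])
    print(f"bias={b:.3f} variance={v:.3f}")
```

Under this assumption, a low $V$ means the frame-level probabilities within each video agree with each other, while a low $B$ means the aggregated prediction is close to the label.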
Abstract
This paper presents a novel perspective for enhancing anti-spoofing performance in zero-shot data domain generalization. Unlike traditional image classification tasks, face anti-spoofing datasets display unique generalization characteristics, which call for new approaches to zero-shot data domain generalization. Going one step beyond previous frame-wise spoofing prediction, we introduce a metric calculation that aggregates frame-level probabilities into a video-wise prediction, addressing the gap between reported frame-wise accuracy and the instability observed in real-world use cases. This approach enables the quantification of bias and variance in model predictions, offering a more refined analysis of model generalization. Our investigation reveals that simply scaling up the model backbone does not inherently reduce this instability, leading us to propose an ensembled backbone method from a Bayesian perspective. The probabilistically ensembled backbone improves both spoofing accuracy and model robustness as measured by the proposed metrics. It also provides uncertainty estimates, which allow for enhanced sampling during training and contribute to model generalization across new datasets. We evaluate the proposed method on the OCIM benchmark and on the public CelebA-Spoof and SiW-Mv2 datasets. Our final model outperforms existing state-of-the-art methods across these datasets, with improvements in Bias, Variance, HTER, and AUC.
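For readers unfamiliar with HTER (half total error rate), it is conventionally the average of the false acceptance rate and the false rejection rate at a fixed decision threshold. The sketch below is a minimal illustration, not the authors' evaluation code: the label convention (1 = spoof, 0 = live), the default threshold of 0.5, and the function name `hter` are assumptions made here.

```python
def hter(scores, labels, threshold=0.5):
    """Half total error rate: (FAR + FRR) / 2.

    scores: video-level spoof probabilities.
    labels: ground truth, ASSUMED here to be 1 = spoof, 0 = live.
    threshold: ASSUMED decision threshold; a video is flagged as spoof
               when its score is >= threshold.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    spoof_scores = [s for s, y in zip(scores, labels) if y == 1]
    live_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not spoof_scores or not live_scores:
        raise ValueError("need at least one spoof and one live sample")

    # False acceptance: a spoof that is accepted as live.
    far = sum(s < threshold for s in spoof_scores) / len(spoof_scores)
    # False rejection: a live sample that is rejected as spoof.
    frr = sum(s >= threshold for s in live_scores) / len(live_scores)
    return (far + frr) / 2


if __name__ == "__main__":
    # Both spoofs are detected; one of the two live videos is rejected.
    print(hter([0.9, 0.8, 0.1, 0.6], [1, 1, 0, 0]))
```

In cross-domain FAS evaluations the threshold is usually selected on held-out data and not fixed at 0.5, so the default here should be read as a placeholder.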
