
PA-FAS: Towards Interpretable and Generalizable Multimodal Face Anti-Spoofing via Path-Augmented Reinforcement Learning

Yingjie Ma, Xun Lin, Yong Xu, Weicheng Xie, Zitong Yu

TL;DR

PA-FAS tackles multimodal face anti-spoofing under domain shift and limited annotations by decoupling reasoning from raw visual cues. It introduces Reasoning Path Augmentation, which uses Positive–Negative Random Path Sampling to expand the space of multimodal chain-of-thought (CoT) reasoning, and an Answer Shuffling mechanism during supervised fine-tuning (SFT) that prevents shortcut learning; reinforcement learning (RL) with Group Relative Policy Optimization (GRPO) follows. The approach achieves strong cross-dataset generalization and data efficiency, delivering state-of-the-art results with approximately 800 labeled samples and around 4×10^4 augmented samples. This work advances interpretable fusion and generalization in multimodal FAS and informs trustworthy deployment in real-world security systems.
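
The RL stage can be pictured with a minimal Python sketch of the group-relative advantage that GRPO uses in general: several completions are sampled for one prompt, each is scored, and each score is standardized by the mean and standard deviation of its group. The reward here (1 point for a well-formed output plus 1 point for the correct live/spoof label, loosely following the classification and format rewards named in the Figure 3 caption) is an assumption, as are the `<think>`/`<answer>` tags and the hypothetical helpers `reward`, `group_advantages`, and `effective_groups`; the paper's exact reward definitions may differ.

```python
import re
import statistics
from typing import List

# Assumed output format (R1-style tags); the paper's template may differ.
FORMAT_RE = re.compile(r"^<think>.+?</think>\s*<answer>(.+?)</answer>$", re.DOTALL)


def reward(completion: str, label: str) -> float:
    """Hypothetical reward: 1 for correct format plus 1 for the correct label."""
    match = FORMAT_RE.match(completion.strip())
    if match is None:
        return 0.0  # malformed output: no format reward and no parsable answer
    format_reward = 1.0
    cls_reward = 1.0 if match.group(1).strip().lower() == label.lower() else 0.0
    return format_reward + cls_reward


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: each reward standardized within its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def effective_groups(groups: List[List[float]]) -> int:
    """Count groups whose rewards differ, i.e. that yield nonzero advantages."""
    return sum(1 for g in groups if max(g) > min(g))


# One hypothetical group of three sampled completions for a spoof image.
group = [
    "<think>Depth map is flat; IR shows paper reflectance.</think><answer>spoof</answer>",
    "<think>Skin texture looks natural.</think><answer>live</answer>",
    "spoof",  # right label but wrong format
]
rewards = [reward(c, "spoof") for c in group]
print(rewards, group_advantages(rewards))
```

A group in which every completion earns the same reward gets zero advantage everywhere and contributes no policy-gradient signal. Counting the remaining groups, as `effective_groups` does, is one plausible (assumed) reading of the "effective sample size" plotted in Figure 2.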

Abstract

Face anti-spoofing (FAS) has recently advanced in multimodal fusion, cross-domain generalization, and interpretability. With large language models and reinforcement learning (RL), strategy-based training offers new opportunities to model these aspects jointly. However, multimodal reasoning is more complex than unimodal reasoning: it requires accurate feature representation and cross-modal verification while high-quality annotations are scarce, which makes direct application of RL sub-optimal. We identify two key limitations of supervised fine-tuning followed by RL (SFT+RL) for multimodal FAS: (1) limited multimodal reasoning paths restrict the use of complementary modalities and shrink the exploration space after SFT, weakening the effect of RL; and (2) the mismatch between single-task supervision and diverse reasoning paths causes reasoning confusion, in which models may exploit shortcuts by mapping images directly to answers while ignoring the intended reasoning. To address these limitations, we propose PA-FAS, which augments reasoning paths by constructing high-quality extended reasoning sequences from limited annotations, thereby enriching the available paths and relaxing exploration constraints. We further introduce an answer-shuffling mechanism during SFT that forces comprehensive multimodal analysis rather than reliance on superficial cues, encouraging deeper reasoning and mitigating shortcut learning. PA-FAS significantly improves multimodal reasoning accuracy and cross-domain generalization, and better unifies multimodal fusion, generalization, and interpretability for trustworthy FAS.
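
Answer shuffling can be illustrated with a minimal sketch, assuming each SFT sample is posed as a multiple-choice question whose options are reordered at random for every example. Under that assumption the correct answer never sits at a fixed letter, so the model cannot learn a direct image-to-letter mapping and must reason over the evidence. This is one plausible reading of the mechanism, not the paper's confirmed prompt format; the option strings and the helper `shuffle_answers` are hypothetical.

```python
import random
import string
from typing import List, Tuple


def shuffle_answers(options: List[str], correct: str, rng: random.Random) -> Tuple[str, str]:
    """Reorder options at random and return (options_text, correct_letter).

    Hypothetical helper: the correct answer's letter changes from sample to
    sample, so a fixed image-to-letter mapping cannot be memorized.
    """
    if correct not in options:
        raise ValueError("correct answer must be one of the options")
    order = list(options)
    rng.shuffle(order)
    letters = string.ascii_uppercase[: len(order)]
    text = "\n".join(f"{letter}. {option}" for letter, option in zip(letters, order))
    return text, letters[order.index(correct)]


# Hypothetical option set for one SFT sample.
OPTIONS = ["live face", "print attack", "replay attack", "3D mask attack"]
rng = random.Random(0)
text, answer = shuffle_answers(OPTIONS, "print attack", rng)
print(text)
print("Answer:", answer)
```

Only the option order changes, so the label semantics stay intact while the positional cue disappears.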

Paper Structure

This paper contains 11 sections, 5 equations, 5 figures, 5 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Accuracy of SFT and SFT+RL methods on different augmented datasets. With a fixed data size of 800 samples, datasets with a single reasoning path gain no accuracy in the RL stage that follows SFT and may even degrade. In contrast, datasets with diverse reasoning paths perform significantly better, achieving higher accuracy under both SFT and SFT+RL.
  • Figure 2: Cumulative effective sample size versus training steps in the RL and SFT+RL stages for models trained on 800 samples.
  • Figure 3: Schematic diagram of the PA-FAS framework. Raw data undergo (a) low-level and (b) high-level data annotation to obtain the corresponding CoT. Next, (c) Positive–Negative Random Path Sampling draws a specified number of reasoning paths from a human-constructed multimodal reasoning tree and integrates them into the CoT. In the (d) SFT+RL training paradigm, answers are randomly shuffled during the SFT stage to prevent the policy model from forming shortcuts, so that it learns diverse reasoning paths and rich multimodal domain-specific knowledge. In the RL stage, the policy model is optimized with classification and format rewards to achieve generalization.
  • Figure 4: Sunburst diagram of the fine-grained hierarchical taxonomy for FAS. Every category is directly mapped to a node in the reasoning tree.
  • Figure 5: AUC and perplexity versus the number of sampled reasoning paths $N$. As $N$ increases, perplexity rises sharply, causing AUC to peak and then decline slowly.
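
Positive–Negative Random Path Sampling, step (c) in Figure 3, can be sketched under stated assumptions: enumerate the root-to-leaf paths of a reasoning tree and draw $N$ of them, where positive paths end at a node matching the sample's annotation and negative paths end elsewhere and serve as alternatives that the CoT checks and rules out. The toy tree, the `neg_ratio` parameter, the CoT template, and the helpers `enumerate_paths`, `sample_paths`, and `to_cot` are all hypothetical; the paper's tree follows the taxonomy in Figure 4 and its exact sampling rule may differ.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Path = Tuple[str, ...]

# Hypothetical toy reasoning tree (nested dicts, leaves are empty dicts).
# The node names are illustrative, not the paper's taxonomy.
TOY_TREE: Dict[str, dict] = {
    "RGB": {"texture": {"moire": {}, "print_grain": {}}, "color": {"skin_tone": {}}},
    "Depth": {"geometry": {"flat_surface": {}, "facial_relief": {}}},
    "IR": {"reflectance": {"material_response": {}}},
}


def enumerate_paths(tree: Dict[str, dict], prefix: Path = ()) -> List[Path]:
    """Return every root-to-leaf path as a tuple of node names."""
    paths: List[Path] = []
    for node, children in tree.items():
        path = prefix + (node,)
        if children:
            paths.extend(enumerate_paths(children, path))
        else:
            paths.append(path)
    return paths


def sample_paths(
    tree: Dict[str, dict],
    positive_leaves: Set[str],
    n: int,
    neg_ratio: float = 0.5,  # assumed share of negatives, in [0, 1]
    rng: Optional[random.Random] = None,
) -> List[Tuple[Path, bool]]:
    """Draw up to n paths, tagged positive (True) or negative (False).

    Positives end at a leaf in `positive_leaves`; negatives end elsewhere.
    Fewer than n paths are returned if either pool is too small.
    """
    if rng is None:
        rng = random.Random()
    paths = enumerate_paths(tree)
    pos = [p for p in paths if p[-1] in positive_leaves]
    neg = [p for p in paths if p[-1] not in positive_leaves]
    n_neg = min(len(neg), round(n * neg_ratio))
    n_pos = min(len(pos), n - n_neg)
    chosen = [(p, True) for p in rng.sample(pos, n_pos)]
    chosen += [(p, False) for p in rng.sample(neg, n_neg)]
    rng.shuffle(chosen)  # random order, so position carries no label signal
    return chosen


def to_cot(chosen: List[Tuple[Path, bool]]) -> str:
    """Render sampled paths as CoT lines (hypothetical template)."""
    return "\n".join(
        f"{' > '.join(p)}: {'evidence found' if ok else 'ruled out'}" for p, ok in chosen
    )


sampled = sample_paths(TOY_TREE, {"moire", "flat_surface"}, n=4, rng=random.Random(0))
print(to_cot(sampled))
```

Because Figure 5 reports that perplexity rises sharply and AUC eventually declines as $N$ grows, $N$ would be tuned to a moderate value rather than maximized.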