A Multi-Modal Approach for Face Anti-Spoofing in Non-Calibrated Systems using Disparity Maps

Ariel Larey, Eyal Rond, Omer Achrack

TL;DR

This work tackles face anti-spoofing in non-calibrated, edge-enabled systems by deriving disparity-based proxy-depth maps from facial landmarks and fusing them with two infrared sensor modalities. It introduces a Disparity Model based on a MobileNetV2 backbone trained with evidential loss to produce calibrated probabilities from a 10-channel input that includes the two sensor crops and two disparity maps. The method achieves state-of-the-art results on RealSense ID data, with an Equal Error Rate (EER) of 1.71% for 2D attacks and 2.04% for a model ensemble that also covers 3D attacks, demonstrating edge-compatible anti-spoofing without extrinsic calibration. The approach enables scalable, privacy-conscious deployment on distributed devices by using proxy-depth information derived from facial geometry rather than costly stereo-depth cameras.

Abstract

Face recognition technologies are increasingly used in various applications, yet they are vulnerable to face spoofing attacks. These spoofing attacks often involve unique 3D structures, such as printed papers or mobile device screens. Although stereo-depth cameras can detect such attacks effectively, their high cost limits their widespread adoption. Conversely, two-sensor systems without extrinsic calibration offer a cost-effective alternative but are unable to calculate depth using stereo techniques. In this work, we propose a method to overcome this challenge by leveraging facial attributes to derive disparity information and estimate relative depth for anti-spoofing purposes in non-calibrated systems. We introduce a multi-modal anti-spoofing model, termed the Disparity Model, that incorporates the generated disparity maps as a third modality alongside the two original sensor modalities. We demonstrate the effectiveness of the Disparity Model in countering various spoof attacks using a comprehensive dataset collected from the Intel RealSense ID Solution F455. Our method outperformed existing methods in the literature, achieving an Equal Error Rate (EER) of 1.71% and a False Negative Rate (FNR) of 2.77% at a False Positive Rate (FPR) of 1%. These errors are lower by 2.45% and 7.94% than the errors of the best comparison method, respectively. Additionally, we introduce a model ensemble that addresses 3D spoof attacks as well, achieving an EER of 2.04% and an FNR of 3.83% at an FPR of 1%. Overall, our work provides a state-of-the-art solution for the challenging task of anti-spoofing in non-calibrated systems that lack depth information.
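
The two metrics reported in the abstract, EER and FNR at a fixed FPR, can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: label 1 marks the positive class, a sample is predicted positive when its score is at or above the threshold, and the EER is read off at the candidate threshold where FPR and FNR are closest. The function names are hypothetical and this is not the authors' evaluation code.

```python
def error_rates(scores, labels):
    """Return (threshold, FPR, FNR) for every candidate threshold.

    Assumption: labels are 1 (positive) or 0 (negative), and a sample is
    predicted positive when score >= threshold.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    # Each distinct score is a candidate threshold; +inf predicts nothing
    # as positive, which guarantees a point with FPR = 0.
    thresholds = sorted(set(scores)) + [float("inf")]
    curve = []
    for t in thresholds:
        fpr = sum(s >= t for s in negatives) / len(negatives)
        fnr = sum(s < t for s in positives) / len(positives)
        curve.append((t, fpr, fnr))
    return curve


def equal_error_rate(scores, labels):
    """Approximate the EER at the threshold where |FPR - FNR| is smallest."""
    _, fpr, fnr = min(error_rates(scores, labels),
                      key=lambda row: abs(row[1] - row[2]))
    return (fpr + fnr) / 2


def fnr_at_fpr(scores, labels, target_fpr=0.01):
    """Lowest FNR among thresholds whose FPR does not exceed target_fpr."""
    return min(fnr for _, fpr, fnr in error_rates(scores, labels)
               if fpr <= target_fpr)
```

On a held-out set, `equal_error_rate(scores, labels)` and `fnr_at_fpr(scores, labels, 0.01)` would correspond to the two kinds of figures quoted above, provided the class and score conventions match those of the evaluation being reproduced.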

Paper Structure

This paper contains 26 sections, 6 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Demonstration of the disparity map creation pipeline for live (top) and spoof (bottom) examples. First, image acquisition is performed by the device (a). Next, face detection is applied (b), followed by facial landmark extraction and sparse disparity calculation (c). Finally, disparity maps are produced by linear interpolation (d). Green represents higher disparity than red.
  • Figure 2: Full pipeline of the Disparity Model. Initially, faces are detected, and facial landmarks are extracted from the data of both sensors. These landmarks serve as key points for sparse disparity calculation along both the horizontal and vertical axes. This is followed by spatial interpolation, resulting in two disparity maps. Finally, the aligned right sensor crop, the left crop, and the disparity maps are concatenated and processed by a CNN to predict whether the input is spoof or live.
  • Figure 3: Examples of live samples in the collected data.
  • Figure 4: Examples of spoof samples in the collected data.
  • Figure 5: Examples of disparity maps produced from live samples.
  • ...and 4 more figures
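
The landmark-to-disparity-map steps described in the captions of Figures 1 and 2 can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: disparity is taken as left minus right landmark coordinates, the dense maps are defined on the left crop's pixel grid, pixels outside the convex hull of the landmarks are filled with 0, and the function names are hypothetical. Linear interpolation is done with SciPy's `griddata`.

```python
import numpy as np
from scipy.interpolate import griddata


def sparse_disparity(left_pts, right_pts):
    """Per-landmark disparity (dx, dy) between matching landmarks.

    Assumption: both inputs are (N, 2) arrays of (x, y) pixel coordinates
    for the same N landmarks, and disparity is left minus right.
    """
    left_pts = np.asarray(left_pts, dtype=float)
    right_pts = np.asarray(right_pts, dtype=float)
    return left_pts - right_pts


def dense_disparity_maps(left_pts, right_pts, height, width, fill_value=0.0):
    """Interpolate sparse disparities into two dense maps of shape (2, H, W).

    Channel 0 holds horizontal disparity, channel 1 vertical disparity.
    Pixels outside the landmarks' convex hull get fill_value (an assumption).
    """
    left_pts = np.asarray(left_pts, dtype=float)
    disp = sparse_disparity(left_pts, right_pts)
    grid_y, grid_x = np.mgrid[0:height, 0:width]
    maps = [
        griddata(left_pts, disp[:, axis], (grid_x, grid_y),
                 method="linear", fill_value=fill_value)
        for axis in (0, 1)
    ]
    return np.stack(maps, axis=0)


# Toy example: five landmarks on a 5x5 crop, with horizontal disparity
# growing linearly in x (dx = 2 + 0.5 * x) and constant vertical disparity.
left = np.array([[0, 0], [4, 0], [0, 4], [4, 4], [2, 2]], dtype=float)
offsets = np.stack([2 + 0.5 * left[:, 0], np.ones(len(left))], axis=1)
right = left - offsets
maps = dense_disparity_maps(left, right, height=5, width=5)
```

The resulting maps could then be concatenated with the two sensor crops along the channel axis (for example with `np.concatenate`) before being passed to a CNN, as Figure 2 describes.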