Grains of Saliency: Optimizing Saliency-based Training of Biometric Attack Detection Models

Colton R. Crum, Samuel Webster, Adam Czajka

TL;DR

This work addresses generalization limits in biometric presentation attack detection (PAD) and synthetic-face detection by studying how the granularity of human saliency information, and its source, affect model training. It systematically evaluates three granularity levels (Boundary of Interest, BOI; Area of Interest, AOI; Features of Interest, FOI) and four saliency sources (human subjects, models trained to mimic human subjects, domain-specific segmentation models, and no saliency) on iris-PAD and synthetic-face detection tasks using the CYBORG loss. AOI is often the best granularity for iris-PAD, while the best granularity for synthetic-face detection depends on the architecture. Notably, models trained to mimic human saliency frequently outperform direct human annotations, which makes saliency generation scalable. The results point to practical, lower-cost strategies that still yield strong generalization, and show that saliency from segmentation models cannot fully replace human-derived guidance in these biometric tasks.

Abstract

Incorporating human-perceptual intelligence into model training has been shown to increase the generalization capability of models in several difficult biometric tasks, such as presentation attack detection (PAD) and detection of synthetic samples. After the initial collection phase, human visual saliency (e.g., eye-tracking data, or handwritten annotations) can be integrated into model training through attention mechanisms, augmented training samples, or through human perception-related components of loss functions. Despite their successes, a vital, but seemingly neglected, aspect of any saliency-based training is the level of salience granularity (e.g., bounding boxes, single saliency maps, or saliency aggregated from multiple subjects) necessary to find a balance between reaping the full benefits of human saliency and the cost of its collection. In this paper, we explore several different levels of salience granularity and demonstrate that increased generalization capabilities of PAD and synthetic face detection can be achieved by using simple yet effective saliency post-processing techniques across several different CNNs.
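
To make the idea of salience granularity concrete, the following is a minimal sketch of how one graded saliency map could be post-processed into three progressively coarser targets. It is not the paper's implementation. The mapping used here (FOI as the graded map, AOI as a thresholded binary mask, BOI as a filled bounding box around that mask), the function names, and the `threshold` parameter are all assumptions made for illustration.

```python
from typing import List

Map = List[List[float]]  # row-major 2D saliency map, values assumed non-negative


def features_of_interest(saliency: Map) -> Map:
    """Assumed FOI: keep the graded values, rescaled so the peak equals 1."""
    peak = max(max(row) for row in saliency)
    if peak <= 0:
        return [[0.0 for _ in row] for row in saliency]
    return [[v / peak for v in row] for row in saliency]


def area_of_interest(saliency: Map, threshold: float = 0.5) -> Map:
    """Assumed AOI: binary mask of pixels at or above the threshold."""
    return [[1.0 if v >= threshold else 0.0 for v in row] for row in saliency]


def boundary_of_interest(saliency: Map, threshold: float = 0.5) -> Map:
    """Assumed BOI: filled axis-aligned bounding box around the AOI mask."""
    height, width = len(saliency), len(saliency[0])
    rows = [r for r, row in enumerate(saliency) if any(v >= threshold for v in row)]
    cols = [c for row in saliency for c, v in enumerate(row) if v >= threshold]
    if not rows:  # nothing salient: return an empty mask
        return [[0.0] * width for _ in range(height)]
    r0, r1, c0, c1 = min(rows), max(rows), min(cols), max(cols)
    return [
        [1.0 if r0 <= r <= r1 and c0 <= c <= c1 else 0.0 for c in range(width)]
        for r in range(height)
    ]


# Hypothetical 4x4 saliency map with two salient pixels on a diagonal.
example = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.8, 0.1, 0.0],
    [0.0, 0.2, 0.6, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
foi = features_of_interest(example)
aoi = area_of_interest(example)
boi = boundary_of_interest(example)
```

In this toy example the AOI mask marks only the two pixels above the threshold, while the BOI mask fills the whole 2x2 box that encloses them. This illustrates the trade-off the abstract describes: coarser targets need less annotation effort but carry less spatial detail.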

Paper Structure (17 sections, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Examples of salience granularity used in saliency-based training defined within this paper: Boundary of Interest (BOI), Area of Interest (AOI), Features of Interest (FOI), sourced from either human subjects or models that were trained to mimic the human subjects. "Seg" indicates segmentation masks sourced from domain-specific segmentation models; (i) iris presentation attack detection task and (ii) synthetic face detection task.
  • Figure 2: Mean ROC curves and bands representing standard deviations (along the True Positive Rate axis) for all backbones used in saliency-based training with varied configurations of saliency for iris-PAD (top row) and synthetic face detection (bottom row) tasks. For human subjects and models mimicking human subjects, the optimal granularity (BOI, AOI, or FOI) is selected. The curves indicate that generalization performance improves when human subjects are part of the saliency generation pipeline.
  • Figure 3: Same as in the iris-PAD results figure (source label fig:results-AUC-iris), except for synthetic face detection.