Grains of Saliency: Optimizing Saliency-based Training of Biometric Attack Detection Models
Colton R. Crum, Samuel Webster, Adam Czajka
TL;DR
This work tackles generalization limits in biometric presentation attack detection (PAD) and synthetic-face detection by studying how the granularity of human saliency information, and its source, affect model training. It systematically evaluates three granularity levels (BOI, AOI, and FOI) and four saliency sources (human, models mimicking humans, domain segmentation models, and none) across iris-PAD and synthetic-face tasks using CYBORG loss. Key findings show that AOI is often the optimal granularity for iris-PAD, while synthetic-face results depend on architecture; notably, models trained to mimic human saliency frequently outperform direct human annotations, enabling scalable saliency generation. The results point to practical, lower-cost strategies that still yield strong generalization, and highlight that segmentation-model saliency cannot fully replace human-derived guidance in these biometric tasks.
Abstract
Incorporating human-perceptual intelligence into model training has been shown to increase the generalization capability of models in several difficult biometric tasks, such as presentation attack detection (PAD) and detection of synthetic samples. After the initial collection phase, human visual saliency (e.g., eye-tracking data or handwritten annotations) can be integrated into model training through attention mechanisms, augmented training samples, or human perception-related components of loss functions. Despite the successes of these approaches, a vital but seemingly neglected aspect of any saliency-based training is the level of salience granularity (e.g., bounding boxes, single saliency maps, or saliency aggregated from multiple subjects) needed to balance the full benefits of human saliency against the cost of its collection. In this paper, we explore several levels of salience granularity and demonstrate that increased generalization capabilities of PAD and synthetic face detection can be achieved by using simple yet effective saliency post-processing techniques across several different CNNs.
