Cross-Spectral Attention for Unsupervised RGB-IR Face Verification and Person Re-identification
Kshitij Nikhal, Cedric Nimpa Fondje, Benjamin S. Riggan
TL;DR
This work addresses unsupervised cross-spectral matching between visible (RGB) and infrared (IR) imagery for face verification and person re-identification (ReID). It proposes a three-part framework: a Cross-Spectral Attention Network (CSAN), a Pseudo Triplet Loss with Offline Cross-Spectral Voting (PTL), and Pixel-Channel Sparsity (PCS), which together learn domain-invariant, discriminative representations without labeled data. Using intra-domain agglomerative clustering, cross-spectral voting, and sparsity regularization, the method is evaluated on RegDB and ARL-VTF and, in some cases, outperforms fully supervised methods. Because it needs no identity labels, the approach avoids the labeling cost that limits the scalability of supervised methods, and it may extend to related unsupervised cross-domain recognition problems.
Abstract
Cross-spectral biometrics, such as matching imagery of faces or persons from visible (RGB) and infrared (IR) bands, have rapidly advanced over the last decade due to increasing sensitivity, size, quality, and ubiquity of IR focal plane arrays and enhanced analytics beyond the visible spectrum. Current techniques for mitigating large spectral disparities between RGB and IR imagery often include learning a discriminative common subspace by exploiting precisely curated data acquired from multiple spectra. Although there are challenges with determining robust architectures for extracting common information, a critical limitation for supervised methods is poor scalability in terms of acquiring labeled data. Therefore, we propose a novel unsupervised cross-spectral framework that combines (1) a new pseudo triplet loss with cross-spectral voting, (2) a new cross-spectral attention network leveraging multiple subspaces, and (3) structured sparsity to perform more discriminative cross-spectral clustering. We extensively compare our proposed RGB-IR biometric learning framework (and its individual components) with recent and previous state-of-the-art models on two challenging benchmark datasets: DEVCOM Army Research Laboratory Visible-Thermal Face Dataset (ARL-VTF) and RegDB person re-identification dataset, and, in some cases, achieve performance superior to completely supervised methods.
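To make the idea of a triplet loss driven by pseudo-labels concrete, here is a minimal sketch in plain Python. It is not the authors' PTL. It is a generic batch-hard triplet loss, and it assumes that cluster ids have already been produced per spectrum and aligned across spectra by some voting step. The function names, the margin value of 0.3, and the toy embeddings are all hypothetical choices made for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pseudo_triplet_loss(rgb_embs, rgb_labels, ir_embs, ir_labels, margin=0.3):
    """Batch-hard triplet loss across two spectra.

    Labels are pseudo-labels (cluster ids), not ground-truth identities.
    Assumption: ids are already aligned, so id k in RGB and id k in IR
    are believed to be the same identity.
    """
    losses = []
    for anchor, anchor_id in zip(rgb_embs, rgb_labels):
        # Distances from the RGB anchor to IR samples with the same pseudo-label.
        pos = [l2_distance(anchor, e) for e, i in zip(ir_embs, ir_labels) if i == anchor_id]
        # Distances from the RGB anchor to IR samples with a different pseudo-label.
        neg = [l2_distance(anchor, e) for e, i in zip(ir_embs, ir_labels) if i != anchor_id]
        if not pos or not neg:
            continue  # no valid triplet can be formed for this anchor
        # Hardest positive is the farthest; hardest negative is the closest.
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    # Average over anchors that formed a triplet; 0.0 if none did.
    return sum(losses) / len(losses) if losses else 0.0


# Toy 2-D embeddings: two identities, one sample per spectrum.
rgb = [[0.0, 0.0], [10.0, 10.0]]
ir = [[0.0, 1.0], [10.0, 9.0]]

loss_consistent = pseudo_triplet_loss(rgb, [0, 1], ir, [0, 1])
loss_swapped = pseudo_triplet_loss(rgb, [0, 1], ir, [1, 0])
```

With consistent pseudo-labels, every positive is already closer than every negative by more than the margin, so the loss is zero. Swapping the IR labels makes the loss large, which shows how much such a loss depends on the quality of the cross-spectral label assignment.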
