
Multi-Domain Biometric Recognition using Body Embeddings

Anirudh Nanduri, Siyuan Huang, Rama Chellappa

TL;DR

The paper tackles cross-spectral biometric recognition across VIS and infrared bands (SWIR, MWIR, LWIR) by focusing on body embeddings. It proposes a Cross-Spectral Semantic Body Identification framework based on a vision transformer that fuses global and local semantic features to achieve robust template-level matching across spectra, trained with a joint identity and triplet loss. Empirical results show body embeddings outperform face embeddings in MWIR/LWIR, and a VIS-pretrained ViT with simple finetuning attains state-of-the-art mAP on LLCM, while domain-aware sampling and local features enhance cross-domain consistency. The work demonstrates that reusing VIS-trained models with careful architectural design can effectively bridge large modality gaps in multi-domain infrared recognition, offering practical benefits for low-light surveillance and related applications.
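The TL;DR credits domain-aware sampling with improving cross-domain consistency but does not spell out the sampler. Here is a minimal sketch, assuming a P×K-style scheme in which every identity in a batch contributes the same number of VIS and IR images, so that each anchor has cross-spectral positives. `domain_aware_batch`, the `(sample_id, subject_id, domain)` tuple layout, and the per-domain count are hypothetical illustrations, not the authors' implementation.

```python
import random
from collections import defaultdict


def domain_aware_batch(samples, num_ids, per_domain, rng=None):
    """Hypothetical sketch: build one batch with both VIS and IR samples per identity.

    samples: list of (sample_id, subject_id, domain) tuples, where domain is
    "VIS" or an IR band such as "SWIR", "MWIR" or "LWIR" (all IR bands are pooled).
    Returns sample_ids grouped per subject: per_domain VIS samples followed by
    per_domain IR samples, for each of num_ids distinct subjects.
    """
    rng = rng or random.Random()
    vis = defaultdict(list)
    ir = defaultdict(list)
    for sample_id, subject_id, domain in samples:
        (vis if domain == "VIS" else ir)[subject_id].append(sample_id)

    # Only subjects with images in both VIS and IR can yield cross-domain positives.
    eligible = sorted(set(vis) & set(ir))
    if len(eligible) < num_ids:
        raise ValueError("not enough subjects with both VIS and IR samples")

    batch = []
    for subject_id in rng.sample(eligible, num_ids):
        for pool in (vis[subject_id], ir[subject_id]):
            if len(pool) >= per_domain:
                batch.extend(rng.sample(pool, per_domain))
            else:
                # Fall back to sampling with replacement for sparsely imaged subjects.
                batch.extend(rng.choices(pool, k=per_domain))
    return batch
```

Pooling all IR bands is one design choice. A variant could instead rotate through SWIR, MWIR and LWIR so that each band is represented in every batch.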

Abstract

Biometric recognition becomes increasingly challenging as we move away from the visible spectrum to infrared imagery, where domain discrepancies significantly impact identification performance. In this paper, we show that body embeddings perform better than face embeddings for cross-spectral person identification in the medium-wave infrared (MWIR) and long-wave infrared (LWIR) domains. Due to the lack of multi-domain datasets, previous research on cross-spectral body identification, also known as Visible-Infrared Person Re-Identification (VI-ReID), has primarily focused on individual infrared bands, such as near-infrared (NIR) or LWIR, separately. We address the multi-domain body recognition problem using the IARPA Janus Benchmark Multi-Domain Face (IJB-MDF) dataset, which enables matching of short-wave infrared (SWIR), MWIR, and LWIR images against RGB (VIS) images. We leverage a vision transformer architecture to establish benchmark results on the IJB-MDF dataset and, through extensive experiments, provide valuable insights into the interrelation of infrared domains, the adaptability of VIS-pretrained models, the role of local semantic features in body embeddings, and effective training strategies for small datasets. Additionally, we show that finetuning a body model, pretrained exclusively on VIS data, with a simple combination of cross-entropy and triplet losses achieves state-of-the-art mAP scores on the LLCM dataset.
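The abstract's finetuning objective combines cross-entropy (identity) and triplet losses. The sketch below is framework-free Python. It assumes an unweighted sum, batch-hard triplet mining with Euclidean distance, and a margin of 0.3. These are common VI-ReID choices, not details confirmed by this summary, and the function names are hypothetical.

```python
import math


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy; logits is a list of per-sample class-score lists."""
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)  # subtract the max score for numerical stability
        log_sum = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum - row[y]
    return total / len(labels)


def batch_hard_triplet(embeddings, labels, margin=0.3):
    """Batch-hard triplet loss with Euclidean distance (an assumed variant).

    For each anchor, take its farthest positive and nearest negative in the batch.
    """
    n = len(embeddings)
    dist = [[math.dist(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]
    losses = []
    for i in range(n):
        pos = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        neg = [dist[i][j] for j in range(n) if labels[j] != labels[i]]
        if not pos or not neg:
            continue  # this anchor has no valid triplet in the batch
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0


def joint_loss(logits, embeddings, labels, margin=0.3, tri_weight=1.0):
    """Hypothetical joint objective: identity cross-entropy plus weighted triplet loss."""
    return cross_entropy(logits, labels) + tri_weight * batch_hard_triplet(embeddings, labels, margin)
```

In a real training loop the same computation would run on framework tensors. The cross-entropy term acts on the classifier logits, and the triplet term acts on the embedding used for retrieval.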

Paper Structure

This paper contains 14 sections, 9 equations, 3 figures, 7 tables.

Figures (3)

  • Figure 1: Sample face images from the IJB-MDF dataset in the VIS, SWIR, MWIR and LWIR domains, along with their cosine similarity score heatmap. Features are extracted using a VIS-pretrained face model.
  • Figure 2: Sample body images from the IJB-MDF dataset in the VIS, SWIR, MWIR and LWIR domains, along with their cosine similarity score heatmap. Features are extracted using a VIS-pretrained body model.
  • Figure 3: Cross-Spectral Body Identification Pipeline: the VIS and IR inputs are passed through a ViT architecture to generate global and local semantic features, which are fused to produce the final feature. All fused features belonging to a single template are then aggregated into the template feature. The query template feature is then matched against all gallery templates to estimate the subject ID.
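The Figure 3 pipeline (fuse, aggregate per template, match) can be sketched in plain Python. The captions do not specify the fusion or aggregation rules, so this sketch makes three assumptions for illustration: fusion concatenates the L2-normalized global feature with the normalized mean of the local features, the template feature is the normalized mean of its fused features, and identification picks the gallery template with the highest cosine similarity. Every function name is hypothetical. `similarity_matrix` computes the kind of pairwise cosine scores that the Figure 1 and 2 heatmaps visualize.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(global_feat, local_feats):
    """Hypothetical fusion: concatenate the normalized global feature with the
    normalized mean of the local semantic features, then renormalize."""
    dim = len(local_feats[0])
    local_mean = [sum(f[d] for f in local_feats) / len(local_feats) for d in range(dim)]
    return l2_normalize(l2_normalize(global_feat) + l2_normalize(local_mean))


def template_feature(fused_feats):
    """Average the fused features of all images in one template, then normalize."""
    dim = len(fused_feats[0])
    return l2_normalize([sum(f[d] for f in fused_feats) / len(fused_feats) for d in range(dim)])


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def identify(query_template, gallery):
    """Return the subject id whose gallery template is most cosine-similar to the query."""
    return max(gallery, key=lambda sid: cosine(query_template, gallery[sid]))


def similarity_matrix(feats):
    """Pairwise cosine similarities between feature vectors (heatmap input)."""
    return [[cosine(a, b) for b in feats] for a in feats]
```

Because template features are mean-pooled before matching, a single poor IR frame has less influence on the query than it would under per-image matching.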