From Modalities to Styles: Rethinking the Domain Gap in Heterogeneous Face Recognition

Anjith George, Sebastien Marcel

TL;DR

This work frames cross-modality face recognition as a style-variation problem and introduces Conditional Adaptive Instance Modulation (CAIM) to adapt target-modality features within a frozen pre-trained FR backbone. By inserting CAIM blocks between early layers and gating them to the target modality, the approach aligns embeddings across modalities via contrastive learning, without retraining the entire FR model. Extensive experiments across six datasets (including VIS-Thermal, VIS-NIR, and sketch-to-photo tasks) show state-of-the-art performance in five of six benchmarks, with modest computational overhead. The method is architecture-agnostic with respect to the FR backbone and is accompanied by public code and reproducible protocols to foster adoption and extension.
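The contrastive alignment mentioned above can be illustrated with a standard pairwise margin-based contrastive loss. This is a minimal sketch under stated assumptions, not the authors' implementation: this section does not give the exact loss, distance metric, or margin, so the Euclidean distance, the `margin` value, and the names `contrastive_loss`, `vis`, `thermal_same`, and `thermal_other` are all hypothetical. The embeddings are toy values rather than outputs of a real FR backbone.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_source, emb_target, same_identity, margin=2.0):
    """Pairwise margin-based contrastive loss (illustrative form).

    same_identity=True  pulls the cross-modal pair together.
    same_identity=False pushes the pair apart until it is at least
    `margin` away. `margin` is an assumed value, not taken from the paper.
    """
    d = euclidean_distance(emb_source, emb_target)
    if same_identity:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


# Toy 3-D embeddings (illustrative values only).
vis = [1.0, 0.0, 0.0]            # source-modality embedding
thermal_same = [1.0, 0.0, 0.0]   # target-modality embedding, same identity
thermal_other = [0.0, 1.0, 0.0]  # target-modality embedding, other identity

loss_same = contrastive_loss(vis, thermal_same, True)
loss_other = contrastive_loss(vis, thermal_other, False)
```

In a setup like the one described, only the parameters of the inserted modulation blocks would receive gradients from such a loss, since the FR backbone stays frozen.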

Abstract

Heterogeneous Face Recognition (HFR) focuses on matching faces from different domains, for instance, thermal to visible images, making Face Recognition (FR) systems more versatile for challenging scenarios. However, the gap between these domains and the limited availability of large-scale datasets in the target HFR modalities make it challenging to develop robust HFR models from scratch. In our work, we view different modalities as distinct styles and propose a method to modulate feature maps of the target modality to address the domain gap. We present a new Conditional Adaptive Instance Modulation (CAIM) module that seamlessly fits into existing FR networks, turning them into HFR-ready systems. The CAIM block modulates intermediate feature maps, efficiently adapting to the style of the source modality and bridging the domain gap. Our method enables end-to-end training using a small set of paired samples. We extensively evaluate the proposed approach on various challenging HFR benchmarks, showing that it outperforms state-of-the-art methods. The source code and protocols for reproducing the findings will be made publicly available.

Paper Structure

This paper contains 32 sections, 9 equations, 4 figures, and 12 tables.

Figures (4)

  • Figure 1: Facial images of the same individual acquired using distinct imaging modalities (images taken from the MCXFace dataset [george2022prepended]). The task in HFR is to enable cross-domain matching despite the domain gap.
  • Figure 2: Schematic diagram of the proposed framework: Layer 1 to Layer N represent the frozen blocks of layers from a pretrained Face Recognition (FR) model. The CAIM module is inserted between the initial few blocks.
  • Figure 3: Architecture of the Conditional Adaptive Instance Modulation (CAIM) block. The global gate signal activates the block. When the gate signal is zero, the block is deactivated and, because of the residual path, the entire module acts as an identity block.
  • Figure 4: Sample images from source and target modalities from six different HFR datasets. Images are from MCXFace [george2022prepended], Tufts Face [panetta2018comprehensive], SCFace [grgic2011scface], Polathermal [hu2016polarimetric], CASIA NIR-VIS 2.0 [li2013casia], and the CUHK Face Sketch FERET Database (CUFSF) [zhang2011coupled], respectively.
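The gating behaviour described in the Figure 3 caption can be sketched as a residual block whose modulation branch is multiplied by the gate. This is a hedged illustration, not the authors' CAIM architecture: it assumes the form out = x + gate * (scale * IN(x) + shift), where IN is per-channel instance normalization, and it uses fixed per-channel `scales` and `shifts` as stand-ins for whatever parameters the real block learns or predicts. The function names and toy values are hypothetical.

```python
import math


def instance_normalize(channel, eps=1e-5):
    """Normalize one channel (flat list of activations) to zero mean, unit variance."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return [(v - mean) / math.sqrt(var + eps) for v in channel]


def gated_modulation_block(feature_map, scales, shifts, gate):
    """Residual block: out = x + gate * (scale * IN(x) + shift), per channel.

    feature_map: list of channels, each a flat list of floats.
    scales, shifts: one assumed modulation value per channel.
    gate: 1.0 for target-modality inputs, 0.0 for source-modality inputs.
    """
    output = []
    for channel, scale, shift in zip(feature_map, scales, shifts):
        normalized = instance_normalize(channel)
        modulated = [scale * v + shift for v in normalized]
        # Residual path: with gate == 0.0 the input passes through unchanged.
        output.append([x + gate * m for x, m in zip(channel, modulated)])
    return output


# Toy feature map with 2 channels of 4 activations each (illustrative values).
features = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 1.5, 1.5]]
scales = [0.1, 0.2]
shifts = [0.0, -0.1]

source_out = gated_modulation_block(features, scales, shifts, gate=0.0)
target_out = gated_modulation_block(features, scales, shifts, gate=1.0)
```

With `gate=0.0` the output equals the input, so source-modality images see the unmodified frozen backbone; with `gate=1.0` the target-modality features are shifted by the modulation branch.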