Modality Agnostic Heterogeneous Face Recognition with Switch Style Modulators
Anjith George, Sebastien Marcel
TL;DR
The paper addresses heterogeneous (cross-modal) face recognition (HFR) by removing the need to train a separate model for each source-target modality pair. It introduces Switch Style Modulation Blocks (SSMB), which use Mixture-of-Experts-style routing to select a domain-expert modulator for each input and modulate its feature maps; inserted into a pre-trained face recognition (FR) backbone, they yield modality-agnostic embeddings. A joint training scheme combining a cosine contrastive loss, teacher-student identity supervision, and a load-balancing loss enables stable end-to-end learning, and no explicit target modality labels are required at inference. Experiments on MCXFace and standard HFR benchmarks show state-of-the-art or competitive performance across multiple modalities, with a single model handling inference for diverse sensing channels. The authors state that the protocols and source code will be made publicly available to support wider adoption of modality-agnostic HFR in real-world, sensor-diverse environments.
Abstract
Heterogeneous Face Recognition (HFR) systems aim to extend face recognition to challenging cross-modal authentication scenarios. However, the significant domain gap between the source and target modalities makes cross-domain matching difficult. Existing literature primarily focuses on developing HFR approaches for specific pairs of face modalities, necessitating the explicit training of a model for each source-target combination. In this work, we introduce a novel framework for training a modality-agnostic HFR method that can handle multiple modalities during inference without explicit knowledge of the target modality labels. We achieve this with a computationally efficient automatic routing mechanism called Switch Style Modulation Blocks (SSMB), which trains multiple domain-expert modulators that adaptively transform the feature maps to reduce the domain gap. The proposed SSMB can be trained end-to-end and seamlessly integrated into pre-trained face recognition models, transforming them into modality-agnostic HFR models. We have performed extensive evaluations on HFR benchmark datasets to demonstrate its effectiveness. The source code and protocols will be made publicly available.
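
The routing idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The class name `SwitchStyleModulator`, the top-1 router over globally pooled features, the per-channel scale-and-shift form of each expert, and the gating by router probability are all assumptions made for illustration. The load-balancing term follows the formulation used in Switch Transformers, which the paper may or may not adopt in this exact form.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class SwitchStyleModulator:
    """Hypothetical top-1 routed style modulator (illustrative only)."""

    def __init__(self, channels, num_experts, seed=0):
        rng = np.random.default_rng(seed)
        self.num_experts = num_experts
        # Router: pooled channel statistics -> one logit per expert (assumed design).
        self.w_router = rng.normal(0.0, 0.1, size=(channels, num_experts))
        # Each expert holds a per-channel scale and shift (assumed form of a
        # "style modulator"); initialised to the identity modulation.
        self.gamma = np.ones((num_experts, channels))
        self.beta = np.zeros((num_experts, channels))

    def __call__(self, x):
        # x has shape (batch, channels, height, width).
        pooled = x.mean(axis=(2, 3))                  # (B, C)
        probs = softmax(pooled @ self.w_router)       # (B, E)
        expert = probs.argmax(axis=1)                 # (B,) top-1 routing

        # Instance-normalise each feature map, then apply the chosen expert's style.
        mu = x.mean(axis=(2, 3), keepdims=True)
        sigma = x.std(axis=(2, 3), keepdims=True) + 1e-5
        x_norm = (x - mu) / sigma
        g = self.gamma[expert][:, :, None, None]
        b = self.beta[expert][:, :, None, None]

        # Scale by the router probability of the chosen expert, as in
        # switch-style routing, so the router would receive gradient in training.
        gate = probs[np.arange(x.shape[0]), expert][:, None, None, None]
        return gate * (g * x_norm + b), probs, expert


def load_balancing_loss(probs, expert, num_experts):
    """Switch-Transformer-style auxiliary loss: E * sum_i f_i * P_i."""
    f = np.bincount(expert, minlength=num_experts) / len(expert)  # routed fraction
    p = probs.mean(axis=0)                                        # mean router prob
    return num_experts * float(np.sum(f * p))


# Usage: route a batch of random feature maps through three hypothetical experts.
features = np.random.default_rng(1).normal(size=(4, 8, 5, 5))
ssmb = SwitchStyleModulator(channels=8, num_experts=3)
modulated, router_probs, chosen = ssmb(features)
balance = load_balancing_loss(router_probs, chosen, num_experts=3)
```

In this sketch no modality label is passed in: the expert is chosen from the input features alone, which mirrors the label-free inference the abstract describes. The auxiliary loss equals 1 when routing is perfectly uniform and grows as routing concentrates on fewer experts.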
