Source-Free Cross-Modal Knowledge Transfer by Unleashing the Potential of Task-Irrelevant Data

Jinjing Zhu, Yucheng Chen, Lin Wang

TL;DR

This work tackles source-free cross-modal knowledge transfer, where the source data are unavailable but a pretrained source model and paired task-irrelevant (TI) data are accessible. It introduces two complementary modules: TGMB, which translates target-modality data into source-like RGB images guided by the paired TI data and the source model, and TGKT, which transfers knowledge from the source model to the target model using KL-divergence, TI-guided feature alignment, and self-supervised pseudo-labels. The approach leverages mutual information guidance and TI data to better estimate the source distribution and to facilitate cross-modal transfer for unlabeled task-relevant (TR) target data, achieving state-of-the-art results on SUN RGB-D, DIML RGB-D, and RGB-NIR. The framework offers a practical option for privacy-preserving cross-modal learning when no source-domain data are available and the target data are unlabeled, and it highlights how TI data can support knowledge transfer across modalities.

Abstract

Source-free cross-modal knowledge transfer is a crucial yet challenging task, which aims to transfer knowledge from one source modality (e.g., RGB) to the target modality (e.g., depth or infrared) with no access to the task-relevant (TR) source data due to memory and privacy concerns. A recent attempt leverages the paired task-irrelevant (TI) data and directly matches the features from them to eliminate the modality gap. However, it ignores a pivotal clue that the paired TI data could be utilized to effectively estimate the source data distribution and better facilitate knowledge transfer to the target modality. To this end, we propose a novel yet concise framework to unlock the potential of paired TI data for enhancing source-free cross-modal knowledge transfer. Our work is buttressed by two key technical components. Firstly, to better estimate the source data distribution, we introduce a Task-irrelevant data-Guided Modality Bridging (TGMB) module. It translates the target modality data (e.g., infrared) into the source-like RGB images based on paired TI data and the guidance of the available source model to alleviate two key gaps: 1) inter-modality gap between the paired TI data; 2) intra-modality gap between TI and TR target data. We then propose a Task-irrelevant data-Guided Knowledge Transfer (TGKT) module that transfers knowledge from the source model to the target model by leveraging the paired TI data. Notably, due to the unavailability of labels for the TR target data and its less reliable prediction from the source model, our TGKT model incorporates a self-supervised pseudo-labeling approach to enable the target model to learn from its predictions. Extensive experiments show that our method achieves state-of-the-art performance on three datasets (RGB-to-depth and RGB-to-infrared).
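
The TL;DR names three ingredients for TGKT: a KL-divergence term, TI-guided feature alignment, and self-supervised pseudo-labels. The following is a minimal NumPy sketch of how such terms could be combined into one objective; it is not the authors' implementation. The function names, the mean-squared form of the alignment term, the argmax pseudo-labeling rule, and the weights `w_kl`, `w_feat`, and `w_pl` are all assumptions made for illustration.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    # Mean KL(p || q) over a batch of probability vectors.
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def feature_alignment(f_src, f_tgt):
    # Assumed form: mean squared difference between paired TI features.
    return float(np.mean((f_src - f_tgt) ** 2))

def pseudo_label_ce(tgt_logits, eps=1e-12):
    # Assumed form: cross-entropy of the target model against its own
    # argmax predictions, used as hard pseudo-labels.
    probs = softmax(tgt_logits)
    labels = probs.argmax(axis=-1)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + eps)))

def tgkt_style_loss(src_logits, tgt_logits, ti_src_feat, ti_tgt_feat,
                    w_kl=1.0, w_feat=1.0, w_pl=1.0):
    # src_logits: source-model outputs on translated (source-like) images.
    # tgt_logits: target-model outputs on the corresponding target-modality data.
    # ti_src_feat / ti_tgt_feat: features of paired TI data from each model.
    # The weights are hypothetical hyperparameters, not values from the paper.
    kl = kl_divergence(softmax(src_logits), softmax(tgt_logits))
    fa = feature_alignment(ti_src_feat, ti_tgt_feat)
    pl = pseudo_label_ce(tgt_logits)
    total = w_kl * kl + w_feat * fa + w_pl * pl
    return total, {"kl": kl, "feat": fa, "pl": pl}
```

In this sketch the KL and alignment terms vanish when the two models agree exactly, so only the pseudo-label term remains; the paper's actual loss definitions and weighting may differ.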

Paper Structure (29 sections, 11 equations, 4 figures, 9 tables, 2 algorithms)

Figures (4)

  • Figure 1: Our observations. Paired TI data facilitates translating TR target data into TR source-like RGB images and transferring knowledge for the task of interest.
  • Figure 2: Overall framework of our proposed method. TGMB: Task-irrelevant data-Guided Modality Bridging; TGKT: Task-irrelevant data-Guided Knowledge Transfer.
  • Figure 3: Reconstructed TR images via VQ-GAN on the K-v1 $\to$ K-v1 transfer task.
  • Figure 4: Generated TR source-like images via TGMB on the K-v1 $\to$ K-v1 transfer task.