Asymmetric Cross-Modal Knowledge Distillation: Bridging Modalities with Weak Semantic Consistency

Riling Wei; Kelu Yao; Chuanguang Yang; Jin Wang; Zhuoyan Gao; Chao Li

TL;DR

This work introduces Asymmetric Cross-modal Knowledge Distillation (ACKD), which transfers knowledge between modalities with weak semantic overlap and so addresses the reliance of traditional Symmetric Cross-modal Knowledge Distillation (SCKD) on paired data. The authors propose SemBridge, a plug-in framework with two modules: Student-Friendly Matching (SFM), which reduces transport cost through dynamic teacher selection, and Semantic-aware Knowledge Alignment (SKA), which optimizes intra- and cross-modal transport using an optimal transport planner and CORAL-based alignment. They formalize transport costs with the Wasserstein distance and intra-modal planning, implement retrieval galleries and a self-supervised semantic matcher, and report that SemBridge achieves state-of-the-art performance across six model architectures on three remote-sensing datasets, while also improving existing SCKD methods in the ACKD setting. The benchmark dataset and method target scalable cross-modal remote sensing applications; training speed remains a noted limitation of the SFM component.
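For readers unfamiliar with CORAL, the alignment term mentioned above can be illustrated with the standard CORAL loss: the squared Frobenius distance between the feature covariance matrices of two domains, scaled by 1/(4d^2). The sketch below is a minimal pure-Python illustration of that general formula, not the paper's implementation; the function names `covariance` and `coral_loss` and the toy feature lists are assumptions made for this example, and how SemBridge combines this term with its transport planner is not specified here.

```python
def covariance(features):
    """Unbiased (d x d) covariance matrix of n feature vectors of length d."""
    n = len(features)
    d = len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def coral_loss(source, target):
    """Standard CORAL loss: ||C_s - C_t||_F^2 / (4 * d^2).

    `source` and `target` are hypothetical feature batches (lists of
    equal-length vectors), e.g. teacher and student embeddings.
    """
    d = len(source[0])
    cov_s = covariance(source)
    cov_t = covariance(target)
    sq_frobenius = sum(
        (cov_s[i][j] - cov_t[i][j]) ** 2 for i in range(d) for j in range(d)
    )
    return sq_frobenius / (4 * d * d)


# Toy batches (illustrative values only): same variances, opposite correlation.
teacher_feats = [[0.0, 0.0], [2.0, 2.0]]
student_feats = [[0.0, 0.0], [2.0, -2.0]]
loss = coral_loss(teacher_feats, student_feats)
```

The loss is zero when both batches share the same second-order statistics, which is why it can act as an alignment signal even when individual samples are not paired.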

Abstract

Cross-modal Knowledge Distillation has demonstrated promising performance on paired modalities with strong semantic connections, a setting referred to as Symmetric Cross-modal Knowledge Distillation (SCKD). However, SCKD is severely constrained in real-world scenarios by the limited availability of paired modalities. To this end, we investigate a general and effective knowledge learning concept under weak semantic consistency, dubbed Asymmetric Cross-modal Knowledge Distillation (ACKD), which aims to bridge modalities with limited semantic overlap. The shift from strong to weak semantic consistency improves flexibility but increases the cost of knowledge transmission, which we rigorously verify using optimal transport theory. To mitigate this issue, we further propose a framework, SemBridge, that integrates a Student-Friendly Matching module and a Semantic-aware Knowledge Alignment module. The former leverages self-supervised learning to acquire semantic-based knowledge and provides personalized instruction for each student sample by dynamically selecting the relevant teacher samples. The latter seeks the optimal transport path by employing Lagrangian optimization. To facilitate research, we curate a benchmark dataset derived from two modalities, namely Multi-Spectral (MS) and asymmetric RGB images, tailored for remote sensing scene classification. Comprehensive experiments show that our framework achieves state-of-the-art performance compared with 7 existing approaches on 6 different model architectures across various datasets.
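The idea of dynamically selecting relevant teacher samples for each student sample can be pictured as a nearest-neighbour lookup in a shared embedding space. The following is a minimal sketch under stated assumptions: it ranks a gallery of teacher embeddings by cosine similarity to one student embedding and keeps the top k. The names `cosine_similarity` and `select_teachers`, the choice of cosine similarity, the value of `k`, and the toy vectors are all hypothetical; the abstract does not specify the actual matching rule used by the Student-Friendly Matching module.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_teachers(student_embedding, teacher_gallery, k=2):
    """Return indices of the k gallery entries most similar to the student.

    Assumption: student and teacher embeddings already live in a comparable
    semantic space (for example, produced by some self-supervised encoder).
    """
    scored = [
        (cosine_similarity(student_embedding, teacher), index)
        for index, teacher in enumerate(teacher_gallery)
    ]
    # Highest similarity first; Python's sort is stable, so ties keep gallery order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [index for _, index in scored[:k]]


# Toy gallery of teacher embeddings (illustrative values only).
gallery = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
student = [1.0, 0.1]
chosen = select_teachers(student, gallery, k=2)
```

Because the selection is recomputed per student sample, each student can be matched to different teacher samples without requiring one-to-one paired data.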
