M$^2$CD: A Unified MultiModal Framework for Optical-SAR Change Detection with Mixture of Experts and Self-Distillation
Ziyuan Liu, Jiawei Zhang, Wenyu Wang, Yuantao Gu
TL;DR
This paper addresses optical-SAR change detection by bridging the modality gap that hampers conventional weight-sharing Siamese backbones. It introduces M$^2$CD, a unified framework compatible with both CNN and Transformer backbones. M$^2$CD combines modality-specific Mixture of Experts (MoE) modules with an Optical-to-SAR guided path (O2SP) and self-distillation to align features across modalities. The key contributions are the MoE module for modality-aware representation learning and the O2SP-based training guidance, which reduces cross-modal discrepancy without adding inference cost. Extensive experiments on the CAU-Flood dataset show state-of-the-art performance for the MiT-b1 variant, and ablations confirm the effectiveness of both MoE and O2SP. The approach offers robust, efficient multimodal CD for disaster response and other cross-modal remote sensing scenarios.
Abstract
Most existing change detection (CD) methods focus on optical images captured at different times, and deep learning (DL) has achieved remarkable success in this domain. However, in extreme scenarios such as disaster response, synthetic aperture radar (SAR) is better suited to providing post-event data because of its active imaging capability. This introduces new challenges for CD methods, as existing weight-sharing Siamese networks struggle to learn the cross-modal data distribution between optical and SAR images. To address this challenge, we propose M$^2$CD, a unified MultiModal CD framework. We integrate Mixture of Experts (MoE) modules into the backbone to handle each modality explicitly, enhancing the model's ability to learn multimodal data distributions. We further propose an Optical-to-SAR guided path (O2SP) and apply self-distillation during training to reduce the feature-space discrepancy between modalities, further easing the model's learning burden. We design multiple variants of M$^2$CD based on both CNN and Transformer backbones. Extensive experiments validate the effectiveness of the proposed framework, with the MiT-b1 version of M$^2$CD outperforming all state-of-the-art (SOTA) methods on optical-SAR CD tasks.
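The interplay between modality-specific experts and a training-only Optical-to-SAR path can be illustrated with a toy sketch. It is not the authors' implementation, and several details are assumptions: each "expert" is a single linear layer plus ReLU, routing is a hard switch on a modality tag (`"optical"` or `"sar"`), the O2SP is modeled as sending the optical input through the SAR expert, and self-distillation is a mean-squared error between the optical-path and O2SP features. The names `ModalityMoE`, `training_step`, and `inference_step` are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply an (out x in) weight matrix by an input feature vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def mse(a: Vector, b: Vector) -> float:
    """Mean-squared error, used here as an assumed self-distillation loss."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


class ModalityMoE:
    """Hypothetical modality-routed MoE layer with one linear expert per modality.

    Assumption: routing is a hard switch on the modality tag; the paper's
    actual gating mechanism may differ.
    """

    def __init__(self, experts: Dict[str, Matrix]):
        self.experts = experts

    def __call__(self, x: Vector, modality: str) -> Vector:
        # Select the expert for this modality instead of sharing one set of
        # weights across both inputs, as a plain Siamese backbone would.
        return relu(matvec(self.experts[modality], x))


def training_step(
    moe: ModalityMoE, x_opt: Vector, x_sar: Vector
) -> Tuple[Vector, Vector, float]:
    """Return (optical features, SAR features, self-distillation loss)."""
    f_opt = moe(x_opt, "optical")  # pre-event optical branch
    f_sar = moe(x_sar, "sar")      # post-event SAR branch
    # Hypothetical O2SP: the same optical input is also routed through the SAR expert.
    f_o2s = moe(x_opt, "sar")
    # Self-distillation between the two views of the optical image. Which side
    # acts as the teacher is an assumption; in an autodiff framework one side
    # would typically be detached from the gradient.
    distill = mse(f_opt, f_o2s)
    return f_opt, f_sar, distill


def inference_step(moe: ModalityMoE, x_opt: Vector, x_sar: Vector) -> Tuple[Vector, Vector]:
    # O2SP is used only during training, so inference runs just the two branches.
    return moe(x_opt, "optical"), moe(x_sar, "sar")


if __name__ == "__main__":
    moe = ModalityMoE({
        "optical": [[1.0, 0.0], [0.0, 1.0]],
        "sar": [[0.5, 0.0], [0.0, 2.0]],
    })
    print(training_step(moe, [2.0, 1.0], [1.0, 3.0]))
```

With the toy weights above, the optical branch yields `[2.0, 1.0]`, the SAR branch `[0.5, 6.0]`, and the O2SP view of the optical image `[1.0, 2.0]`, giving a distillation loss of `1.0`. Because `inference_step` never calls the O2SP route, this sketch mirrors the stated property that the guidance path adds no inference cost.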
