Rethinking Efficient Mixture-of-Experts for Remote Sensing Modality-Missing Classification
Qinghao Gao, Jiahui Qu, Yunsong Li, Wenqian Dong
TL;DR
This work tackles multimodal remote sensing classification under modality missing. It reframes missing data as a multi-task adaptation problem by introducing MaMOL, a missing-aware Mixture-of-Loras with dynamic task-oriented routing and static modality-shared routing on top of a frozen backbone. Empirical results across Houston2013, Augsburg, and Trento demonstrate superior robustness to missing modalities and favorable transfer to natural images, with ablations confirming the benefits of dynamic, static, and modality-specific experts. The method offers a lightweight, scalable approach that generalizes to additional modalities and cross-domain tasks, addressing real-world incompleteness in multimodal sensing.
Abstract
Multimodal classification in remote sensing often suffers from missing modalities caused by environmental interference, sensor failures, or atmospheric effects, which severely degrade classification performance. Existing two-stage adaptation methods are computationally expensive and assume complete multimodal data during training, limiting their generalization to real-world incompleteness. To overcome these issues, we propose a Missing-aware Mixture-of-Loras (MaMOL) framework that reformulates modality missing as a multi-task learning problem. MaMOL introduces a dual-routing mechanism: a task-oriented dynamic router that adaptively activates experts for different missing patterns, and a modality-specific-shared static router that maintains stable cross-modal knowledge sharing. Unlike prior methods that train separate networks for each missing configuration, MaMOL achieves parameter-efficient adaptation via lightweight expert updates and shared expert reuse. Experiments on multiple remote sensing benchmarks demonstrate superior robustness and generalization under varying missing rates, with minimal computational overhead. Moreover, transfer experiments on natural image datasets validate its scalability and cross-domain applicability, highlighting MaMOL as a general and efficient solution for incomplete multimodal learning.
