Table of Contents
Fetching ...

Rethinking Efficient Mixture-of-Experts for Remote Sensing Modality-Missing Classification

Qinghao Gao, Jiahui Qu, Yunsong Li, Wenqian Dong

TL;DR

This work tackles multimodal remote sensing classification under modality missing. It reframes missing data as a multi-task adaptation problem by introducing MaMOL, a missing-aware Mixture-of-Loras with dynamic task-oriented routing and static modality-shared routing on top of a frozen backbone. Empirical results across Houston2013, Augsburg, and Trento demonstrate superior robustness to missing modalities and favorable transfer to natural images, with ablations confirming the benefits of dynamic, static, and modality-specific experts. The method offers a lightweight, scalable approach that generalizes to additional modalities and cross-domain tasks, addressing real-world incompleteness in multimodal sensing.

Abstract

Multimodal classification in remote sensing often suffers from missing modalities caused by environmental interference, sensor failures, or atmospheric effects, which severely degrade classification performance. Existing two-stage adaptation methods are computationally expensive and assume complete multimodal data during training, limiting their generalization to real-world incompleteness. To overcome these issues, we propose a Missing-aware Mixture-of-Loras (MaMOL) framework that reformulates modality missing as a multi-task learning problem. MaMOL introduces a dual-routing mechanism: a task-oriented dynamic router that adaptively activates experts for different missing patterns, and a modality-specific-shared static router that maintains stable cross-modal knowledge sharing. Unlike prior methods that train separate networks for each missing configuration, MaMOL achieves parameter-efficient adaptation via lightweight expert updates and shared expert reuse. Experiments on multiple remote sensing benchmarks demonstrate superior robustness and generalization under varying missing rates, with minimal computational overhead. Moreover, transfer experiments on natural image datasets validate its scalability and cross-domain applicability, highlighting MaMOL as a general and efficient solution for incomplete multimodal learning.

Rethinking Efficient Mixture-of-Experts for Remote Sensing Modality-Missing Classification

TL;DR

This work tackles multimodal remote sensing classification under modality missing. It reframes missing data as a multi-task adaptation problem by introducing MaMOL, a missing-aware Mixture-of-Loras with dynamic task-oriented routing and static modality-shared routing on top of a frozen backbone. Empirical results across Houston2013, Augsburg, and Trento demonstrate superior robustness to missing modalities and favorable transfer to natural images, with ablations confirming the benefits of dynamic, static, and modality-specific experts. The method offers a lightweight, scalable approach that generalizes to additional modalities and cross-domain tasks, addressing real-world incompleteness in multimodal sensing.

Abstract

Multimodal classification in remote sensing often suffers from missing modalities caused by environmental interference, sensor failures, or atmospheric effects, which severely degrade classification performance. Existing two-stage adaptation methods are computationally expensive and assume complete multimodal data during training, limiting their generalization to real-world incompleteness. To overcome these issues, we propose a Missing-aware Mixture-of-Loras (MaMOL) framework that reformulates modality missing as a multi-task learning problem. MaMOL introduces a dual-routing mechanism: a task-oriented dynamic router that adaptively activates experts for different missing patterns, and a modality-specific-shared static router that maintains stable cross-modal knowledge sharing. Unlike prior methods that train separate networks for each missing configuration, MaMOL achieves parameter-efficient adaptation via lightweight expert updates and shared expert reuse. Experiments on multiple remote sensing benchmarks demonstrate superior robustness and generalization under varying missing rates, with minimal computational overhead. Moreover, transfer experiments on natural image datasets validate its scalability and cross-domain applicability, highlighting MaMOL as a general and efficient solution for incomplete multimodal learning.

Paper Structure

This paper contains 28 sections, 8 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Comparison between (a) previous two-stage adaptation methods and (b) our proposed unified framework. (a) Previous methods: A pretrained multimodal net requires separate adaptation phases (#1, #2, …) for each missing-modality case, which leads to multiple specialized networks and high computational overhead. (b) Proposed method: A single Missingmodal Net is trained with generalized inputs that already include diverse missing-modality cases, enabling efficient and unified modeling without additional adaptation.
  • Figure 2: Overall architecture of the proposed framework. Multimodal data with potential missing modalities are embedded into modality-specific representations. These embeddings are then fed into a Transformer backbone enhanced with a Missing-aware Mixture-of-Loras (MaMOL) module. The design includes shared experts, pattern experts, and modality-specialized experts, enabling both general knowledge transfer and modality-aware adaptation. Finally, the aggregated features are concatenated and passed into the classification head.
  • Figure 3: Comparison of different Mixture-of-Experts (MoE) architectures. (a) Adapt-Based MoE architecture. (b) Task-Driven MoE architecture. (c) Missing-aware Mixture-of-Loras (MaMOL) architecture.
  • Figure 4: Ablations on the generalizability of MaMOL across different testing scenarios under various missing rates on the Houston2013 dataset. (a) Models trained and tested on missing-both cases with different missing rates. (b) Models trained on missing-both or missing-image cases and evaluated on missing-image conditions. (c) Models trained on missing-both or missing-text cases and evaluated on missing-text conditions.