Multimodal Mixture-of-Experts for ISAC in Low-Altitude Wireless Networks
Kai Zhang, Wentao Yu, Hengtao He, Shenghui Song, Jun Zhang, Khaled B. Letaief
TL;DR
This work addresses robust, low-latency integrated sensing and communication (ISAC) in low-altitude wireless networks (LAWNs) with a multimodal mixture-of-experts (MoE) framework in which a lightweight gating network adaptively weights modality-specific experts. A sparse variant activates only the top-N experts and uses straight-through gradient routing, reducing energy and computation on UAVs while maintaining performance. Across three representative ISAC tasks (sensing-aided beam prediction, sensing-aided path loss prediction, and communication-aided UAV trajectory tracking), the MoE models consistently outperform static fusion and monolithic baselines and show improved training sample efficiency. The results demonstrate that adaptive, modality-aware fusion is crucial for reliable perception and connectivity in dynamic aerial environments, which supports practical deployment in LAWNs.
Abstract
Integrated sensing and communication (ISAC) is a key enabler for low-altitude wireless networks (LAWNs), providing simultaneous environmental perception and data transmission in complex aerial scenarios. By combining heterogeneous sensing modalities such as visual, radar, lidar, and positional information, multimodal ISAC can improve both situational awareness and robustness of LAWNs. However, most existing multimodal fusion approaches use static fusion strategies that treat all modalities equally and cannot adapt to channel heterogeneity or time-varying modality reliability in dynamic low-altitude environments. To address this fundamental limitation, we propose a mixture-of-experts (MoE) framework for multimodal ISAC in LAWNs. Each modality is processed by a dedicated expert network, and a lightweight gating module adaptively assigns fusion weights according to the instantaneous informativeness and reliability of each modality. To improve scalability under the stringent energy constraints of aerial platforms, we further develop a sparse MoE variant that selectively activates only a subset of experts, thereby reducing computation overhead while preserving the benefits of adaptive fusion. Comprehensive simulations on three typical ISAC tasks in LAWNs demonstrate that the proposed frameworks consistently outperform conventional multimodal fusion baselines in terms of learning performance and training sample efficiency.
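The gated fusion described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (softmax, select_top_n, moe_fuse), the toy experts, and the fixed gate scores are all hypothetical, and the softmax renormalization over the selected experts is an assumption, since the abstract does not specify how the weights are normalized. In a real system the gate scores would come from a learned gating network, and the straight-through gradient routing used for training is not shown because this sketch covers only the forward pass.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_top_n(gate_scores, top_n):
    """Return the indices of the top_n highest gate scores, in index order."""
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    return sorted(ranked[:top_n])


def moe_fuse(experts, inputs, gate_scores, top_n=None):
    """Fuse per-modality expert outputs using gate weights.

    top_n=None evaluates every expert (dense fusion). Otherwise only the
    top_n experts by gate score are evaluated (sparse fusion), and the
    weights are renormalized over that subset (an assumption of this sketch).
    """
    n = len(experts)
    active = list(range(n)) if top_n is None else select_top_n(gate_scores, top_n)
    active_weights = softmax([gate_scores[i] for i in active])

    weights = [0.0] * n
    fused = None
    for i, w in zip(active, active_weights):
        weights[i] = w
        out = experts[i](inputs[i])  # inactive experts are never called
        if fused is None:
            fused = [w * v for v in out]
        else:
            fused = [f + w * v for f, v in zip(fused, out)]
    return fused, weights


# Hypothetical stand-ins for three modality-specific experts
# (e.g. vision, radar, position), each mapping a scalar to a 2-D feature.
experts = [
    lambda x: [2.0 * x, 0.0],
    lambda x: [0.0, x + 1.0],
    lambda x: [x, x],
]
inputs = [0.5, 1.0, 3.0]
gate_scores = [2.0, 0.5, -1.0]  # fixed here; a gating network would produce these

dense_out, dense_w = moe_fuse(experts, inputs, gate_scores)
sparse_out, sparse_w = moe_fuse(experts, inputs, gate_scores, top_n=2)
```

With top_n=2 the third expert receives zero weight and is never evaluated, which is where the computational saving of the sparse variant would come from. Setting top_n to the number of experts recovers the dense result.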
