Mixture-of-Experts for Distributed Edge Computing with Channel-Aware Gating Function
Qiuchen Song, Shusen Jing, Shuai Zhang, Songyang Zhang, Chuan Huang
TL;DR
This work tackles the challenge of deploying mixture-of-experts in distributed wireless edge computing by introducing a channel-aware gating mechanism that jointly considers feature-expert alignment and downlink channel quality. A two-stage training procedure first optimizes a standard MoE under perfect channels and then introduces simulated wireless channels with regularization to prevent gating collapse, enabling robust routing under realistic conditions. Experiments across LeNet-5, ResNet-18, and ViT on CIFAR-10/100 show that channel-aware gating improves accuracy over naive gating in both analog and digital transmissions, with digital performance more sensitive to channel coding. The results highlight the practical potential of integrating channel state information into MoE routing for efficient, robust distributed inference in wireless edge networks, and point to future extensions in representation tokenization and broader AI tasks.
Abstract
In a distributed mixture-of-experts (MoE) system, a server collaborates with multiple specialized expert clients to perform inference. The server extracts features from input data and dynamically selects experts based on their areas of specialization to produce the final output. Although MoE models are widely valued for their flexibility and performance benefits, adapting distributed MoEs to operate effectively in wireless networks remains unexplored. In this work, we introduce a novel channel-aware gating function for wireless distributed MoE, which incorporates channel conditions into the MoE gating mechanism. To train the channel-aware gating, we simulate various signal-to-noise ratios (SNRs) for each expert's communication channel and add noise to the features distributed to the experts based on these SNRs. The gating function then uses both features and SNRs to optimize expert selection. Unlike conventional MoE models, which consider only the alignment of features with the specializations of experts, our approach also accounts for the impact of channel conditions on expert performance. Experimental results demonstrate that the proposed channel-aware gating scheme outperforms traditional MoE models.
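The abstract describes the training setup only at a high level, so the following is a minimal illustrative sketch and not the authors' implementation. It assumes additive white Gaussian noise scaled to a target SNR, a simple linear gate whose logits are the sum of a feature term and a per-expert SNR term, and top-k expert selection. The names `add_awgn`, `channel_aware_gate`, `w_feat`, and `w_snr` are hypothetical and do not come from the paper.

```python
import numpy as np


def add_awgn(features, snr_db, rng):
    """Add white Gaussian noise so the noisy features have roughly the given SNR (dB).

    Assumption: the channel is modeled as plain AWGN on the feature vector.
    """
    signal_power = np.mean(features ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=features.shape)
    return features + noise


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def channel_aware_gate(features, snrs_db, w_feat, w_snr, top_k=2):
    """Score each expert from the features AND its channel SNR, then keep the top-k.

    Assumption: an additive linear form, logits = features @ w_feat + w_snr * snrs_db.
    Returns the selected expert indices (best first) and their normalized weights.
    """
    logits = features @ w_feat + w_snr * snrs_db
    top = np.argsort(logits)[-top_k:][::-1]
    weights = softmax(logits[top])
    return top, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_experts, dim = 4, 16

    features = rng.normal(size=dim)                  # server-side extracted features
    snrs_db = np.array([20.0, 0.0, 10.0, -5.0])      # simulated per-expert channel SNRs
    w_feat = 0.1 * rng.normal(size=(dim, num_experts))
    w_snr = np.full(num_experts, 0.1)

    experts, weights = channel_aware_gate(features, snrs_db, w_feat, w_snr)

    # Each selected expert receives the features through its own noisy channel.
    received = {int(i): add_awgn(features, snrs_db[i], rng) for i in experts}
    print("selected experts:", experts, "weights:", weights)
```

In this sketch, an expert whose features align well can still be passed over when its channel is poor, because the SNR term lowers its logit. In an actual system, the gate parameters would be learned jointly with the experts rather than fixed as they are here.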
