Mixture-of-Experts for Distributed Edge Computing with Channel-Aware Gating Function

Qiuchen Song, Shusen Jing, Shuai Zhang, Songyang Zhang, Chuan Huang

TL;DR

This work addresses the deployment of mixture-of-experts (MoE) models in distributed wireless edge computing by introducing a channel-aware gating mechanism that jointly considers feature-expert alignment and downlink channel quality. A two-stage training procedure first optimizes a standard MoE under perfect channels, then introduces simulated wireless channels together with a regularization term that prevents gating collapse, so that routing stays robust under realistic conditions. Experiments with LeNet-5, ResNet-18, and ViT on CIFAR-10/100 show that channel-aware gating improves accuracy over naive gating in both analog and digital transmission, with digital performance being more sensitive to channel coding. The results indicate the practical potential of integrating channel state information into MoE routing for efficient, robust distributed inference in wireless edge networks, and point to future extensions in representation tokenization and broader AI tasks.

Abstract

In a distributed mixture-of-experts (MoE) system, a server collaborates with multiple specialized expert clients to perform inference. The server extracts features from input data and dynamically selects experts based on their areas of specialization to produce the final output. Although MoE models are widely valued for their flexibility and performance benefits, adapting distributed MoEs to operate effectively in wireless networks remains unexplored. In this work, we introduce a novel channel-aware gating function for wireless distributed MoE, which incorporates channel conditions into the MoE gating mechanism. To train the channel-aware gating, we simulate various signal-to-noise ratios (SNRs) for each expert's communication channel and add noise to the features distributed to the experts based on these SNRs. The gating function then uses both features and SNRs to optimize expert selection. Unlike conventional MoE models, which consider only the alignment of features with the specializations of experts, our approach additionally considers the impact of channel conditions on expert performance. Experimental results demonstrate that the proposed channel-aware gating scheme outperforms traditional MoE models.
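
The training idea described in the abstract (simulate a per-expert SNR, add matching noise to the dispatched features, and let the gate see both features and SNRs) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the additive white Gaussian noise model, the additive way SNRs enter the gate logits, and the names `add_awgn`, `channel_aware_gate`, `W_feat`, and `W_snr` are all assumptions made for illustration.

```python
import numpy as np

def add_awgn(features, snr_db, rng):
    """Add white Gaussian noise so the noisy features have the given SNR (dB)."""
    signal_power = np.mean(features ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=features.shape)
    return features + noise

def channel_aware_gate(features, snrs_db, W_feat, W_snr):
    """Softmax over experts using both features and per-expert SNRs.

    Assumption: SNRs enter the logits additively through a per-expert
    weight W_snr; the paper's actual gating form may differ.
    """
    logits = features @ W_feat + snrs_db * W_snr
    logits = logits - logits.max()  # subtract max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

rng = np.random.default_rng(0)
num_experts, dim = 4, 16
features = rng.normal(size=dim)                  # stand-in for backbone output
snrs_db = np.array([20.0, 10.0, 0.0, -5.0])      # simulated per-expert SNRs
W_feat = rng.normal(scale=0.1, size=(dim, num_experts))  # hypothetical weights
W_snr = np.full(num_experts, 0.1)                         # hypothetical weights

gate = channel_aware_gate(features, snrs_db, W_feat, W_snr)
noisy = [add_awgn(features, s, rng) for s in snrs_db]  # what each expert receives
```

With a positive `W_snr`, an expert behind a poor channel receives a lower gate weight than it would from feature alignment alone, which is the qualitative behavior the abstract describes.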

Paper Structure

This paper contains 14 sections, 12 equations, 6 figures, 2 tables, 1 algorithm.

Figures (6)

  • Figure 1: Structure of MoE-based Edge Computing: The system is deployed across a base station (BS) server and edge devices in a wireless communication environment. The backbone network and gating network run at the BS, while the expert networks are distributed across the edge devices.
  • Figure 2: Data Workflow in the Channel-Aware MoE: In the channel-aware MoE, input data is initially processed by the backbone network into latent embeddings, which are then transmitted to the corresponding expert devices via wireless links, as directed by the gating network. The expert networks provide feedback on wireless channel conditions to the gating network, enhancing the robustness of both latent representation dispatching and expert output integration within the gating network.
  • Figure 3: Classification accuracy of LeNet-5 on the CIFAR-10 dataset
  • Figure 4: Classification accuracy of ResNet-18 on the CIFAR-10 dataset
  • Figure 5: Classification accuracy of ViT on the CIFAR-100 dataset
  • ...and 1 more figure
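
The data workflow in the Figure 2 caption (latent embeddings dispatched over wireless links as directed by the gate, then expert outputs integrated) might be sketched as follows. This is a rough illustration only: top-k selection, the Gaussian noise channel, linear stand-in experts, and the names `top_k_dispatch` and `moe_forward` are assumptions that do not come from the source.

```python
import numpy as np

def top_k_dispatch(gate_weights, k):
    """Pick the k highest-weighted experts and renormalize their weights."""
    idx = np.argsort(gate_weights)[::-1][:k]
    w = gate_weights[idx]
    return idx, w / w.sum()

def moe_forward(embedding, gate_weights, snrs_db, experts, k, rng):
    """Send the embedding to the selected experts over noisy links and
    combine their outputs with the renormalized gate weights."""
    idx, w = top_k_dispatch(gate_weights, k)
    signal_power = np.mean(embedding ** 2)
    output = 0.0
    for i, w_i in zip(idx, w):
        # Assumed channel model: additive Gaussian noise at expert i's SNR.
        noise_power = signal_power / (10.0 ** (snrs_db[i] / 10.0))
        received = embedding + rng.normal(
            0.0, np.sqrt(noise_power), size=embedding.shape
        )
        output = output + w_i * experts[i](received)
    return output

rng = np.random.default_rng(0)
dim, num_classes, num_experts = 8, 3, 4
# Hypothetical experts: one random linear classifier per edge device.
mats = [rng.normal(size=(dim, num_classes)) for _ in range(num_experts)]
experts = [lambda x, M=M: x @ M for M in mats]

embedding = rng.normal(size=dim)                 # stand-in for backbone output
gate_weights = np.array([0.1, 0.5, 0.3, 0.1])    # stand-in for gating output
snrs_db = np.array([5.0, 20.0, 15.0, 0.0])       # channel feedback from experts
logits = moe_forward(embedding, gate_weights, snrs_db, experts, k=2, rng=rng)
```

Here only experts 1 and 2 receive the embedding, so the two experts with the weakest gate weights incur no transmission at all.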