Optimal Expert Selection for Distributed Mixture-of-Experts at the Wireless Edge
Shengling Qin, Hai Wu, Hongyang Du, Kaibin Huang
TL;DR
This work tackles energy-efficient expert coordination for distributed Mixture-of-Experts at the wireless edge by formulating two intertwined optimization problems: expert selection (P1) and joint expert and subcarrier allocation (P2). It introduces Dynamic Expert Selection (DES), a tree-search framework that uses a linear-relaxation bound to balance task relevance against channel conditions, and Joint Expert and Subcarrier Allocation (JESA), a block coordinate descent approach that exploits a unique problem structure to achieve asymptotic optimality as the number of subcarriers increases. Empirical results on multi-domain tasks with Llama-3-based experts show that DES delivers high accuracy with significantly reduced energy, while JESA achieves substantial energy savings (often around 50% or more) with modest accuracy trade-offs, outperforming Top-k and homogeneous schemes. Collectively, the framework advances integrated AI and communications at the edge, offering tunable trade-offs via layer-wise importance and enabling scalable, energy-conscious edge inference for future 6G networks.
Abstract
The emergence of distributed Mixture-of-Experts (DMoE) systems, which deploy expert models at edge nodes, offers a pathway to achieving connected intelligence in sixth-generation (6G) mobile networks and edge artificial intelligence (AI). However, current DMoE systems lack an effective expert selection algorithm that jointly accounts for the task-expert relevance and channel diversity inherent in these systems. Traditional AI and communication systems focus on either AI performance or channel conditions alone, so applying their methods directly leads to high communication overhead or low performance. To address this, we propose a DMoE protocol to schedule expert inference and inter-expert transmission. This protocol identifies expert selection and subcarrier allocation as key optimization problems. We formulate an expert selection problem that incorporates both AI performance and channel conditions, and further extend it to a Joint Expert and Subcarrier Allocation (JESA) problem for comprehensive AI and channel management within the DMoE framework. For the NP-hard expert selection problem, we introduce the Dynamic Expert Selection (DES) algorithm, which uses a linear relaxation as a bounding criterion to significantly reduce search complexity. For the JESA problem, we discover a unique structural property that ensures asymptotic optimality in most scenarios. We propose an iterative algorithm that treats subcarrier allocation as a subproblem and integrates it with the DES algorithm. The proposed framework manages the tradeoff between task relevance and channel conditions through a tunable importance factor, enabling flexible adaptation to diverse scenarios. Numerical experiments validate the dual benefits of the proposed expert selection algorithm: high performance and significantly reduced cost.
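To make the idea of a tree search pruned by a linear-relaxation bound concrete, the following is a minimal sketch and not the paper's DES algorithm. The abstract does not state the exact objective or constraints, so the sketch assumes a simple stand-in problem: pick a subset of experts that minimizes total energy while the sum of their gate (relevance) scores meets a threshold. The names `select_experts`, `lp_lower_bound`, `scores`, `energies`, and `threshold` are all hypothetical.

```python
from typing import List, Optional, Tuple

EPS = 1e-12  # tolerance for floating-point comparisons


def lp_lower_bound(items: List[Tuple[float, float]], need: float) -> Optional[float]:
    """Cost of the LP-relaxed (fractional) cover of `need` relevance.

    `items` holds (energy, score) pairs sorted by energy/score ascending,
    so greedy fractional filling is optimal for the relaxation.
    Returns None when even taking every item cannot reach `need`.
    """
    cost = 0.0
    for energy, score in items:
        if need <= EPS:
            break
        take = min(1.0, need / score)  # fraction of this expert used
        cost += take * energy
        need -= take * score
    return cost if need <= EPS else None


def select_experts(scores: List[float], energies: List[float],
                   threshold: float) -> Tuple[List[int], float]:
    """Assumed problem: min total energy s.t. sum of selected scores >= threshold.

    Scores are assumed strictly positive. Returns (expert indices, energy);
    ([], inf) means no subset reaches the threshold.
    """
    order = sorted(range(len(scores)), key=lambda i: energies[i] / scores[i])
    items = [(energies[i], scores[i]) for i in order]
    best_cost = float("inf")
    best_set: List[int] = []

    def search(pos: int, need: float, cost: float, chosen: List[int]) -> None:
        nonlocal best_cost, best_set
        if need <= EPS:  # relevance requirement met: candidate solution
            if cost < best_cost:
                best_cost, best_set = cost, list(chosen)
            return
        bound = lp_lower_bound(items[pos:], need)
        if bound is None or cost + bound >= best_cost:
            return  # infeasible, or cannot beat the incumbent: prune
        energy, score = items[pos]
        chosen.append(order[pos])  # branch 1: include this expert
        search(pos + 1, need - score, cost + energy, chosen)
        chosen.pop()               # branch 2: exclude this expert
        search(pos + 1, need, cost, chosen)

    search(0, threshold, 0.0, [])
    return sorted(best_set), best_cost


if __name__ == "__main__":
    # Expert 0 is the most relevant but sits behind a poor (costly) channel.
    print(select_experts([0.5, 0.3, 0.2], [4.0, 1.0, 1.0], threshold=0.45))
```

In this toy instance a Top-1 rule would pick expert 0 at energy 4.0, whereas the search returns experts 1 and 2, which together meet the relevance threshold at energy 2.0. This illustrates, under the assumed formulation only, why weighing relevance against channel cost can reduce energy.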
