Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation

Chen Dun; Mirian Hipolito Garcia; Guoqing Zheng; Ahmed Hassan Awadallah; Anastasios Kyrillidis; Robert Sim

TL;DR

This work tackles prompt tuning under task and data heterogeneity by proposing Mixture of Prompts (MoPs) with a smart gating function that dynamically combines multiple prompt experts based on the domain of the input data. Prompts are injected at intermediate layers, and a gating network, a lightweight MLP over average-layer embeddings, selects the relevant experts, enabling just-in-time skill composition while reducing training interference. MoPs show strong empirical gains in both centralized and federated settings, including substantial perplexity reductions (up to $\sim70\%$ federated, $\sim30\%$ centralized) and robustness to model compression (pruning and quantization). The approach also matches or surpasses state-of-the-art PEFT methods such as LoRA and (IA)$^3$ under various compression regimes, illustrating its practical value for scalable, privacy-aware LLM deployment across diverse tasks and data sources.
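The gating idea summarized above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the names (`gate`, `mix_prompts`), the dimensions, the ReLU hidden layer, the softmax over experts, and the choice to scale each expert's prompt vectors by its gate weight before concatenation are all assumptions made for illustration. The only elements taken from the summary are that an MLP reads averaged intermediate-layer embeddings and produces a per-expert weighting.

```python
import math
import random

def mean_pool(hidden):
    """Average token embeddings (T x d) into a single d-dimensional vector."""
    n = len(hidden)
    return [sum(col) / n for col in zip(*hidden)]

def linear(W, b, x):
    """Dense layer: W has one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def gate(hidden, W1, b1, W2, b2):
    """Hypothetical gating MLP: pooled embedding -> ReLU layer -> expert weights."""
    pooled = mean_pool(hidden)
    h = [max(0.0, v) for v in linear(W1, b1, pooled)]
    return softmax(linear(W2, b2, h))

def mix_prompts(prompts, weights):
    """Assumed mixing rule: scale each expert's prompt vectors, then concatenate."""
    mixed = []
    for expert_prompt, w in zip(prompts, weights):
        mixed.extend([[w * v for v in vec] for vec in expert_prompt])
    return mixed

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Toy sizes (illustrative only): embedding dim, hidden dim, experts,
# prompt tokens per expert, input tokens.
rng = random.Random(0)
d, hdim, K, m, T = 8, 16, 4, 2, 5
hidden = rand_matrix(T, d, rng)            # stand-in for intermediate-layer states
W1, b1 = rand_matrix(hdim, d, rng), [0.0] * hdim
W2, b2 = rand_matrix(K, hdim, rng), [0.0] * K
prompts = [rand_matrix(m, d, rng) for _ in range(K)]

weights = gate(hidden, W1, b1, W2, b2)     # one weight per prompt expert
mixed = mix_prompts(prompts, weights)      # (K * m) x d prompt block to inject
```

In this sketch the gate weights sum to one, so an input that strongly matches one expert suppresses the contribution of the others, which is one plausible way to limit interference between prompts trained on different tasks.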

Abstract

Large Language Models (LLMs) can solve a variety of tasks, such as text summarization and mathematical questions, out of the box, but they are often trained with a single task in mind. Due to high computational costs, the current trend is to use prompt instruction tuning to better adjust monolithic, pretrained LLMs for new -- but often individual -- downstream tasks. Thus, how one would expand prompt tuning to handle -- concomitantly -- heterogeneous tasks and data distributions is a wide-open question. To address this gap, we suggest the use of \emph{Mixture of Prompts}, or MoPs, associated with smart gating functionality: the latter -- whose design is one of the contributions of this paper -- can identify relevant skills embedded in different groups of prompts and dynamically assign combined experts (i.e., collections of prompts), based on the target task. Additionally, MoPs are empirically agnostic to any model compression technique applied -- for efficiency reasons -- as well as to the instruction data source and task composition. In practice, MoPs can simultaneously mitigate prompt training "interference" in multi-task, multi-source scenarios (e.g., task and data heterogeneity across sources), as well as possible implications from model approximations. As a highlight, MoPs reduce final perplexity by $\sim20\%$ to $\sim70\%$, compared to baselines, in the federated scenario, and by $\sim 3\%$ to $\sim30\%$ in the centralized scenario.

Paper Structure (15 sections, 4 equations, 11 figures, 9 tables)

Introduction
Related Works
Background
MoPs with a smart gating function
Mixture of Experts (MoPs) Design
Experiments
Conclusions
A. Federated skew distribution
B. Centralized Training - Gating function Analysis
C. Federated Training - Gating function Analysis
D. Quantization Results
E. Evaluating MoPs performance against PEFT methods
F. Pushing soft-prompts performance through MoPs "hyperparameters"
G. Using the Phi-2 model as an alternative LLM basis
H. Hyperparameter settings for results presented in main text

Figures (11)

Figure 1: Multi-Source Multi-task Training
Figure 2: Mixture of prompts with a smart gating function on compressed LLMs overview.
Figure 3: Layer Injection impact on Llama-7B for different unstructured pruning ratios (Dolly-15k) in the centralized setup. Injection on $L_{\text{int}}=10$ outperforms the baseline.
Figure 4: Prompt Injection impact on Llama-7B for different unstructured pruning ratios (Dolly-15k - centralized setup).
Figure 5: Averaged prompt weights for test dataset using 3:4 (75%) structured pruning Llama-7B. This verifies how MoPs learn experts with a specialized skill set.
...and 6 more figures
