Buffer Overflow in Mixture of Experts

Jamie Hayes; Ilia Shumailov; Itay Yona

TL;DR

The paper identifies a security risk in Mixture of Experts (MoE) models deployed on shared hardware: finite per-expert buffers combined with batch-order routing create cross-batch dependencies, so one user's queries can change how another user's tokens are routed. It formalizes a threat model and introduces a random-search attack that tunes the adversary-controlled batch entries to minimize a loss capturing their influence on a target token. A proof-of-concept attack on Mixtral-8x7B shows that the next-token prediction can be flipped. The work also reports that adversarial batch data transfers to related prompts, analyzes how batch position and buffer capacity affect attack success, and proposes mitigations such as randomized batch order and larger capacity slack, which reduce the risk but do not guarantee security. These findings motivate further research into robust routing, gradient-based attacks, and training-time defenses such as load balancing for shared-hardware inference. Future directions include measuring worst-case routing sensitivity, exploring alternative routing schemes, and extending the analysis to instruction-tuned models.
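The cross-batch dependency can be illustrated with a minimal sketch. It assumes a simplified top-1 router in which each token carries an ordered list of preferred experts, each expert has a fixed number of buffer slots, and tokens are served in batch order. The function name `route_top1_with_capacity`, the preference lists, and the capacity value are all hypothetical and are not taken from the paper or from Mixtral.

```python
from typing import List, Optional


def route_top1_with_capacity(
    preferences: List[List[int]], num_experts: int, capacity: int
) -> List[Optional[int]]:
    """Assign each token to its most-preferred expert that still has room.

    Tokens are processed in batch order. `preferences[i]` lists token i's
    experts from most to least preferred (a stand-in for sorted gate scores).
    A token whose listed experts are all full is dropped (None).
    """
    load = [0] * num_experts
    assignment: List[Optional[int]] = []
    for prefs in preferences:
        chosen = None
        for expert in prefs:
            if load[expert] < capacity:
                load[expert] += 1
                chosen = expert
                break
        assignment.append(chosen)
    return assignment


# The target token (last in the batch) prefers expert 0, then expert 1.
target = [0, 1]

# Benign batch: the other tokens prefer expert 1, so expert 0 stays free.
benign_batch = [[1, 0], [1, 0], target]
benign_assignment = route_top1_with_capacity(benign_batch, num_experts=2, capacity=2)

# Adversarial batch: two attacker tokens ahead of the target fill expert 0.
adversarial_batch = [[0, 1], [0, 1], target]
adversarial_assignment = route_top1_with_capacity(
    adversarial_batch, num_experts=2, capacity=2
)

print("target expert (benign):     ", benign_assignment[-1])
print("target expert (adversarial):", adversarial_assignment[-1])
```

In this toy setup the target token's input is identical in both batches, yet it is served by a different expert once the attacker's tokens occupy its preferred buffer. Whether an overflowing token is rerouted or dropped depends on the routing strategy; the sketch reroutes when another expert has room and drops otherwise.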

Abstract

Mixture of Experts (MoE) has become a key ingredient for scaling large foundation models while keeping inference costs steady. We show that expert routing strategies that have cross-batch dependencies are vulnerable to attacks. Malicious queries can be sent to a model and can affect a model's output on other benign queries if they are grouped in the same batch. We demonstrate this via a proof-of-concept attack in a toy experimental setting.
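As a rough illustration of how a random-search attack of this kind might be structured, the sketch below searches over attacker-controlled tokens in a toy model. Everything in it is an assumption made for illustration: the modulo-based gate, the per-expert probabilities in `P_CORRECT`, the vocabulary size, and the accept-if-not-worse rule. It is not the paper's algorithm and does not involve Mixtral-8x7B.

```python
import random
from typing import List, Tuple

NUM_EXPERTS = 3
CAPACITY = 2        # buffer slots per expert (hypothetical value)
VOCAB_SIZE = 30     # hypothetical toy vocabulary
TARGET_TOKEN = 0    # prefers expert 0, then 1, then 2
# Hypothetical probability of the correct next token when the target is
# processed by each expert; expert 0 is the best expert for this token.
P_CORRECT = [0.9, 0.6, 0.2]


def preferences(token: int) -> List[int]:
    """Toy gate: token t prefers expert t % E, then the following experts."""
    return [(token + j) % NUM_EXPERTS for j in range(NUM_EXPERTS)]


def target_expert(adv_tokens: List[int]) -> int:
    """Route the adversarial tokens first (batch order), then the target.

    With 4 adversarial tokens plus the target there are 5 tokens for
    3 * 2 = 6 slots, so every token finds an expert with free capacity.
    """
    load = [0] * NUM_EXPERTS
    assigned = -1
    for token in adv_tokens + [TARGET_TOKEN]:
        assigned = next(e for e in preferences(token) if load[e] < CAPACITY)
        load[assigned] += 1
    return assigned


def loss(adv_tokens: List[int]) -> float:
    """Attacker's loss: probability that the target still gets the correct token."""
    return P_CORRECT[target_expert(adv_tokens)]


def random_search(
    num_adv: int = 4, steps: int = 200, seed: int = 0
) -> Tuple[List[int], List[float]]:
    """Mutate one adversarial token per step; keep the change if loss does not rise."""
    rng = random.Random(seed)
    adv = [1] * num_adv  # starting point that leaves expert 0 free for the target
    best = loss(adv)
    history = [best]
    for _ in range(steps):
        candidate = list(adv)
        candidate[rng.randrange(num_adv)] = rng.randrange(VOCAB_SIZE)
        candidate_loss = loss(candidate)
        if candidate_loss <= best:
            adv, best = candidate, candidate_loss
        history.append(best)
    return adv, history


adv, history = random_search()
print("initial loss:", history[0], "final loss:", history[-1], "tokens:", adv)
```

Accepting equal-loss candidates lets the search drift across plateaus, which matters here because filling a single buffer slot does not change the loss on its own. The search needs only loss values, not gradients, which is why it applies even though routing decisions are discrete.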

Paper Structure (12 sections, 3 equations, 3 figures, 2 tables, 2 algorithms)

Overview
Technical details
Mixture of Experts
Routing strategies
Threat model and attack method
Attack demonstration
Anecdotal evidence of transferability to different prompts
Does the position in batch matter?
Attack sensitivity to buffer capacity limit
Example of a denial-of-expert attack
Mitigations
Discussion

Figures (3)

Figure 1: Overall attack flow. The adversary pushes their data into the shared batch, which already contains user data. As tokens get distributed across different experts, adversarial data fills the buffers of the experts that the user's tokens would prefer, so the user's data is dropped or routed to experts that produce suboptimal outputs.
Figure 2: Probability of correct token, "2", and (most likely) incorrect token, "1", throughout the random search attack. By the end of the attack the output token with largest probability is the incorrect token, "1".
Figure 3: Attack that constructs adversarial inputs that block the preferred expert of the majority of tokens from the target $x^*$.
