Buffer Overflow in Mixture of Experts

Jamie Hayes, Ilia Shumailov, Itay Yona

TL;DR

The paper identifies a security risk in Mixture of Experts (MoE) models served on shared hardware: finite per-expert buffers combined with batch-order routing make one query's output depend on the other queries in its batch. It formalizes a threat model and introduces a random-search attack that minimizes a loss measuring the influence of the adversary-controlled batch entries on a target token. A proof-of-concept attack on Mixtral-8x7B flips the next-token prediction. The work shows that adversarial batch data transfers to related prompts and analyzes how batch position and buffer capacity affect attack success. Mitigations such as randomized batch order and larger capacity slack reduce the risk but do not guarantee security. These findings motivate further research into robust routing, gradient-based attacks, and training-time defenses such as load balancing for shared-hardware inference. Future directions include measuring worst-case routing sensitivity, exploring alternative routing schemes, and extending the analysis to instruction-tuned models.
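The buffer-overflow mechanism can be illustrated with a minimal sketch. It assumes top-1 routing, a fixed per-expert capacity, and tokens that are simply dropped once their preferred expert is full; the function name `route_with_capacity` and the toy token lists are hypothetical and are not taken from the paper or from Mixtral's implementation.

```python
from collections import defaultdict


def route_with_capacity(preferred, capacity):
    """Assign tokens to experts in batch order.

    `preferred` lists each token's preferred expert id. A token whose
    preferred expert has already received `capacity` tokens is dropped,
    which is recorded as None (an assumed overflow policy).
    """
    load = defaultdict(int)
    assignment = []
    for expert in preferred:
        if load[expert] < capacity:
            load[expert] += 1
            assignment.append(expert)
        else:
            assignment.append(None)
    return assignment


# Hypothetical toy batch: both user tokens prefer expert 0.
user = [0, 0]
adversary = [0, 0, 0]  # adversarial tokens that also target expert 0

alone = route_with_capacity(user, capacity=3)
shared = route_with_capacity(adversary + user, capacity=3)

print(alone)        # [0, 0]
print(shared[-2:])  # [None, None]
```

When the user's tokens are routed alone they reach their preferred expert. When the adversary's tokens sit earlier in the same batch they exhaust the buffer first, so the user's tokens overflow. Placing the user's tokens first would reverse the outcome, which is why batch position matters in this kind of routing.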

Abstract

Mixture of Experts (MoE) has become a key ingredient for scaling large foundation models while keeping inference costs steady. We show that expert routing strategies that have cross-batch dependencies are vulnerable to attacks. Malicious queries can be sent to a model and can affect a model's output on other benign queries if they are grouped in the same batch. We demonstrate this via a proof-of-concept attack in a toy experimental setting.

Paper Structure

This paper contains 12 sections, 3 equations, 3 figures, 2 tables, 2 algorithms.

Figures (3)

  • Figure 1: Overall attack flow. The adversary pushes their data into the shared batch, which already contains user data. As tokens are distributed across the experts, adversarial data fills the expert buffers that the user's tokens would prefer, so the user's tokens are dropped or routed to experts that produce suboptimal outputs.
  • Figure 2: Probability of correct token, "2", and (most likely) incorrect token, "1", throughout the random search attack. By the end of the attack the output token with largest probability is the incorrect token, "1".
  • Figure 3: Attack that constructs adversarial inputs that block the preferred expert of the majority of tokens from the target $x^*$.
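The idea behind Figures 2 and 3 can be sketched as a random search on a toy router. Everything here is an assumption made for illustration: the router `preferred_expert` is a fixed modulo rule rather than a learned gate, the constants are arbitrary, and the loss counts how many target tokens still reach their preferred expert, which is a simplified stand-in for the paper's loss on output-token probabilities.

```python
import random

# Hypothetical toy settings, not values from the paper.
N_EXPERTS = 4
CAPACITY = 2
VOCAB = 16


def preferred_expert(token):
    # Stand-in for a learned router (assumption): fixed modulo rule.
    return token % N_EXPERTS


def served_target_tokens(adversarial, target):
    """Count target tokens that reach their preferred expert when the
    adversarial tokens are routed first under a fixed capacity."""
    load = [0] * N_EXPERTS
    served = 0
    for i, tok in enumerate(adversarial + target):
        expert = preferred_expert(tok)
        if load[expert] < CAPACITY:
            load[expert] += 1
            if i >= len(adversarial):
                served += 1
    return served


def random_search(target, n_adv, steps, seed=0):
    """Mutate one adversarial token per step; keep the candidate if the
    loss (served target tokens) does not increase."""
    rng = random.Random(seed)
    best = [rng.randrange(VOCAB) for _ in range(n_adv)]
    best_loss = served_target_tokens(best, target)
    history = [best_loss]
    for _ in range(steps):
        cand = list(best)
        cand[rng.randrange(n_adv)] = rng.randrange(VOCAB)
        loss = served_target_tokens(cand, target)
        if loss <= best_loss:
            best, best_loss = cand, loss
        history.append(best_loss)
    return best, best_loss, history


# Three target tokens prefer expert 1 and one prefers expert 2.
target = [1, 5, 9, 2]
adv, loss, history = random_search(target, n_adv=4, steps=200)
print(served_target_tokens([], target), loss)
```

Without adversarial tokens, three of the four target tokens are served (the third token preferring expert 1 already overflows). An adversarial prefix such as `[1, 1, 2, 2]` fills both preferred buffers and leaves none served. The search only accepts candidates that do not raise the loss, so the recorded history is non-increasing.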