Buffer Overflow in Mixture of Experts
Jamie Hayes, Ilia Shumailov, Itay Yona
TL;DR
The paper identifies a security risk in Mixture of Experts (MoE) models deployed on shared hardware: finite per-expert buffers combined with batch-order routing create inter-batch dependencies, so queries grouped in the same batch can affect one another's outputs. It formalizes a threat model and introduces a random-search attack that minimizes a loss governing the influence of the adversary-controlled batch entries on a target token, demonstrating a proof-of-concept attack on Mixtral-8x7B in which the next-token prediction can be flipped. The work shows that adversarial batch data transfers to related prompts and analyzes how batch position and buffer capacity affect attack success. It also offers mitigations, such as randomized batch order and larger capacity slack, that reduce the risk but do not guarantee security. These findings highlight practical security risks in MoE deployments and motivate further research into robust routing, gradient-based attacks, and training-time defenses such as load balancing to safeguard shared-hardware inference. Future directions include measuring worst-case routing sensitivity, exploring alternative routing schemes, and extending the analysis to instruction-tuned models.
Abstract
Mixture of Experts (MoE) has become a key ingredient for scaling large foundation models while keeping inference costs steady. We show that expert routing strategies that have cross-batch dependencies are vulnerable to attacks. Malicious queries sent to a model can affect its output on other, benign queries if they are grouped in the same batch. We demonstrate this via a proof-of-concept attack in a toy experimental setting.
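The cross-batch dependency can be illustrated with a minimal sketch of capacity-limited routing. This is not the paper's implementation: the function name `route_with_capacity`, the top-1 routing, the toy expert preferences, and the choice to drop overflowing tokens (rather than, say, send them to a next-best expert) are all assumptions made for illustration.

```python
from typing import List, Optional


def route_with_capacity(
    preferred_expert: List[int], num_experts: int, capacity: int
) -> List[Optional[int]]:
    """Assign each token to its preferred expert, visiting tokens in batch order.

    Hypothetical simplification: a token whose preferred expert's buffer is
    already full is dropped (None). Real systems may handle overflow differently.
    """
    load = [0] * num_experts  # tokens currently held in each expert's buffer
    assignment: List[Optional[int]] = []
    for expert in preferred_expert:
        if load[expert] < capacity:
            load[expert] += 1
            assignment.append(expert)
        else:
            assignment.append(None)  # buffer overflow: token loses its expert
    return assignment


# Benign batch: the victim token (last position) prefers expert 0 and gets it.
benign = route_with_capacity([1, 2, 3, 0], num_experts=4, capacity=2)

# Adversarial batch: two earlier tokens also prefer expert 0 and fill its
# buffer, so the victim token no longer reaches its preferred expert.
attacked = route_with_capacity([0, 0, 3, 0], num_experts=4, capacity=2)

print("victim routed to (benign):  ", benign[-1])
print("victim routed to (attacked):", attacked[-1])
```

In this toy setting the victim token's own input is identical in both batches; only the other entries in the batch differ. The same sketch also suggests why batch position and buffer capacity matter: moving the victim to the front of the batch, or raising `capacity`, lets it keep its preferred expert.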
