
Expert Selections In MoE Models Reveal (Almost) As Much As Text

Amir Nuriyev, Gabriel Kulp

TL;DR

This work identifies a privacy risk in mixture-of-experts (MoE) language models: per-token expert selections form a leakage channel from which input text can be reconstructed. It introduces two decoders, a 3-layer MLP and a transformer-based sequence decoder, that map routing traces to token sequences. The sequence decoder achieves 91.2% top-1 accuracy on 10M tokens after training on 100M tokens, and the MLP achieves 63.1% top-1 accuracy. The results connect MoE routing to the embedding-inversion literature and highlight practical leakage scenarios, including distributed inference and side channels; adding noise only partially mitigates reconstruction. The paper argues that expert selections should be treated as being as sensitive as the underlying text, and advocates mitigations such as noise, workload balancing, and isolation to limit routing-trace exposure in MoE deployments.

Abstract

We present a text-reconstruction attack on mixture-of-experts (MoE) language models that recovers tokens from expert selections alone. In MoE models, each token is routed to a subset of expert subnetworks; we show these routing decisions leak substantially more information than previously understood. Prior work using logistic regression achieves limited reconstruction; we show that a 3-layer MLP improves this to 63.1% top-1 accuracy, and that a transformer-based sequence decoder recovers 91.2% of tokens top-1 (94.8% top-10) on 32-token sequences from OpenWebText after training on 100M tokens. These results connect MoE routing to the broader literature on embedding inversion. We outline practical leakage scenarios (e.g., distributed inference and side channels) and show that adding noise reduces but does not eliminate reconstruction. Our findings suggest that expert selections in MoE deployments should be treated as sensitive as the underlying text.
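
To give a rough sense of why expert selections can carry this much information, the sketch below flattens one token's routing trace into a multi-hot feature vector (one plausible input encoding for a decoder such as the MLP) and computes a simple upper bound on the bits that trace can carry. The layer count, expert count, top-k value, and the helper names `multi_hot` and `max_bits_per_token` are all hypothetical and are not taken from the paper.

```python
import math

def multi_hot(trace, num_experts):
    """Flatten one token's per-layer expert selections into a 0/1 vector.

    trace: list with one entry per MoE layer, each a list of the expert
    indices selected for this token in that layer.
    """
    features = []
    for selected in trace:
        row = [0] * num_experts
        for expert in selected:
            row[expert] = 1
        features.extend(row)
    return features

def max_bits_per_token(num_layers, num_experts, top_k):
    """Upper bound (in bits) on the information in one routing trace.

    Counts the distinct top-k subsets per layer and assumes layers are
    independent and uniformly used, which real routers are not.
    """
    return num_layers * math.log2(math.comb(num_experts, top_k))

# Hypothetical toy configuration (an assumption, not the paper's model).
NUM_LAYERS, NUM_EXPERTS, TOP_K = 4, 8, 2

# One token's routing trace: the experts chosen in each of the 4 layers.
trace = [[0, 3], [1, 7], [2, 5], [4, 6]]

x = multi_hot(trace, NUM_EXPERTS)
bits = max_bits_per_token(NUM_LAYERS, NUM_EXPERTS, TOP_K)

# Bits needed to name one token from a hypothetical 50,000-entry vocabulary.
vocab_bits = math.log2(50_000)
```

Even in this toy setting the bound exceeds the bits needed to identify a single token from a 50,000-entry vocabulary. The true information content is lower, since selections are neither uniform nor independent across layers, so the bound only indicates capacity, not what a decoder will actually recover.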

Paper Structure (29 sections, 6 equations, 6 figures)


Figures (6)

  • Figure 1: Accuracy of decoding tokens from expert selections on OpenWebText.
  • Figure 2: Accuracy vs. training-set size for the sequence decoder on OpenWebText.
  • Figure 3: Accuracy vs. token frequency on a 2M slice of OpenWebText.
  • Figure 4: Estimated per-layer entropy of expert selections.
  • Figure 5: Estimated mutual information between layers' expert selections.
  • ...and 1 more figure