Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach

Xu Zhang, Kaidi Xu, Ziqing Hu, Ren Wang

TL;DR

This work addresses adversarial robustness in Mixture of Experts (MoE) by revealing that expert networks are more vulnerable than routers and introducing a targeted robust training method (RT-ER) that strengthens the second-top expert via a KL-based regularization term. Building on this, the authors propose a dual-model framework that blends a standard MoE and a robust MoE with a smoothing parameter $\alpha$, and derive certified robustness bounds for both the single MoE and the dual-model. To maximize both robustness and accuracy, they introduce JTDMoE, a bi-level joint training strategy that aligns the standard and robust MoEs. Empirical results on CIFAR-10 and TinyImageNet using ResNet18 and ViT-small demonstrate substantial improvements in robust accuracy with limited loss in standard accuracy, and the code is publicly available.

Abstract

Mixture of Experts (MoE) models have shown remarkable success in leveraging specialized expert networks for complex machine learning tasks. However, their susceptibility to adversarial attacks presents a critical challenge for deployment in robust applications. This paper addresses the question of how to incorporate robustness into MoEs while maintaining high natural accuracy. We begin by analyzing the vulnerability of MoE components, finding that expert networks are notably more susceptible to adversarial attacks than the router. Based on this insight, we propose a targeted robust training technique that integrates a novel loss function to enhance the adversarial robustness of MoE, requiring only the robustification of one additional expert without compromising training or inference efficiency. Building on this, we introduce a dual-model strategy that linearly combines a standard MoE model with our robustified MoE model using a smoothing parameter. This approach allows for flexible control over the robustness-accuracy trade-off. We further provide theoretical foundations by deriving certified robustness bounds for both the single MoE and the dual-model. To push the boundaries of robustness and accuracy, we propose a novel joint training strategy, JTDMoE, for the dual-model. This joint training enhances both robustness and accuracy beyond what is achievable with separate models. Experimental results on the CIFAR-10 and TinyImageNet datasets using ResNet18 and Vision Transformer (ViT) architectures demonstrate the effectiveness of our proposed methods. The code is publicly available at https://github.com/TIML-Group/Robust-MoE-Dual-Model.
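The linear combination behind the dual-model can be illustrated with a small sketch. It assumes each MoE returns a class-probability vector and that the blend takes the form $(1-\alpha) F_S({\mathbf{x}}) + \alpha F_R({\mathbf{x}})$, which matches the statement that $\alpha = 1$ relies exclusively on the robust MoE. The function name `dual_model_output` and the example probabilities are hypothetical and are not taken from the authors' repository.

```python
import numpy as np


def dual_model_output(p_standard, p_robust, alpha):
    """Blend two class-probability arrays as (1 - alpha) * standard + alpha * robust.

    Assumption: both inputs are probability vectors over the same classes.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    p_standard = np.asarray(p_standard, dtype=float)
    p_robust = np.asarray(p_robust, dtype=float)
    return (1.0 - alpha) * p_standard + alpha * p_robust


# Hypothetical outputs of the standard and robust MoE for one input, three classes.
p_s = np.array([0.7, 0.2, 0.1])
p_r = np.array([0.3, 0.6, 0.1])

# Sweep the smoothing parameter; larger alpha gives the robust MoE more weight.
for alpha in (0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
    blended = dual_model_output(p_s, p_r, alpha)
    print(f"alpha={alpha:.1f} blended={blended} predicted_class={int(np.argmax(blended))}")
```

Because the blend is a convex combination, the output stays a valid probability vector for any $\alpha$ in $[0, 1]$. The sketch covers inference-time mixing only and does not touch RT-ER or the JTDMoE joint training.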

Paper Structure

This paper contains 35 sections, 2 theorems, 23 equations, 7 figures, 10 tables, 1 algorithm.

Key Result

Theorem 5.4

Under Assumption \ref{assumption_lipschitz}, let $M_{R_i} \leq 1$ be an upper bound on $f_{R_i}^{(y)}({\mathbf{x}})$ for any input ${\mathbf{x}} \in {\mathbb{R}}^d$. Then the robustness bound $\epsilon$ for $F_R({\mathbf{x}})$ is:

Figures (7)

  • Figure 1: Illustration of our methods for enhancing the robustness of a single MoE and our joint training strategy for the dual-model. Right: Our single MoE robustification method enhances the robustness of a single MoE $F_R$ by introducing an additional term to reinforce the robustness of the second-top expert $f_{2}$ beyond standard adversarial training. Left: The dual-model is a linear combination of a standard MoE $F_S$ and a robust MoE $F_R$. The jointly-trained dual-model (JTDMoE) improves robustness while maintaining high standard accuracy using a bi-level alternating training approach.
  • Figure 2: Performance evaluation of AT MoE and RT-ER MoE models with ResNet18 on the CIFAR-10 test dataset. We report standard accuracy (SA) and robust accuracy (RA) under a 50-step PGD attack, using models trained with a 10-step PGD attack. Our results indicate that RT-ER achieves consistently higher RA and demonstrates greater stability than AT MoE. For a comparable analysis using ViT-small, please refer to Appendix \ref{appendix_singleMoE}.
  • Figure 3: Performance evaluation of the Dual-Model using pre-trained MoE models. We assess the performance of the Dual-Model, which combines a standard MoE (ST) and a robust MoE (RT-ER) from Table \ref{table_experimental_results}. The weighting parameter $\alpha$ is incremented from 0.5 to 1.0 in steps of 0.1; at $\alpha = 1$, the Dual-Model relies exclusively on the robust MoE. All other configurations are consistent with those detailed in Figure \ref{fig_at_moe}.
  • Figure 4: Performance evaluation of AT MoE and RT-ER MoE models with ViT-small on the TinyImageNet test dataset. We report standard accuracy (SA) and robust accuracy (RA) under a 50-step PGD attack, using models trained with a 10-step PGD attack. Our results indicate that RT-ER achieves higher RA and demonstrates greater stability compared to AT MoE.
  • Figure 5: Performance comparison of TRADES, AdvMoE, and RT-ER with ResNet18 on the CIFAR-10 test dataset. SA and RA are evaluated under a 50-step PGD attack, using models trained with a 10-step PGD attack. RT-ER achieves higher RA and exhibits greater stability compared to TRADES and AdvMoE. Numerical analysis comparisons are presented in Table \ref{table_trades_advmoe}.
  • ...and 2 more figures

Theorems & Definitions (4)

  • Definition 5.1
  • Definition 5.2
  • Theorem 5.4
  • Theorem 5.5