Enhancing the "Immunity" of Mixture-of-Experts Networks for Adversarial Defense

Qiao Han; Yong Huang; Xinling Guo; Yiteng Zhai; Yu Qin; Yao Yang

TL;DR

Immunity presents a robust adversarial defense by augmenting Mixture-of-Experts with Random Switch Gates and Grad-CAM-guided regularizers. It introduces two novel losses, a heatmap-based mutual information objective and a position-stability regularizer, to diversify and stabilize expert representations, while enabling random gating during inference to hinder targeted attacks. Through extensive CIFAR-10/100 experiments, Immunity demonstrates improved resilience against FGSM, BIM, MIM, and PGD attacks, particularly under adversarial training, and provides interpretable heatmaps that reveal specialized expert focus. The work highlights the benefits of ensemble diversity, causality-focused learning, and heatmap-based regularization for practical, robust AI systems.
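As a rough illustration of the random gating described above, the sketch below assumes, following the abstract, that the gate parameters are learned once and then randomly permuted across experts at evaluation time. The class `RandomSwitchGateMoE`, the softmax gating over a single logit vector, and the toy experts are hypothetical choices made for illustration; they are not the authors' architecture.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()


class RandomSwitchGateMoE:
    """Hypothetical MoE with a 'random switch' gate (illustrative, not the authors' code).

    The gate logits are fixed after training; at evaluation time they are randomly
    permuted across experts, so repeated queries see differently weighted ensembles.
    """

    def __init__(self, experts, gate_logits, seed=None):
        self.experts = list(experts)  # callables mapping an input x to class scores
        self.gate_logits = np.asarray(gate_logits, dtype=float)
        assert len(self.experts) == len(self.gate_logits)
        self.rng = np.random.default_rng(seed)

    def forward(self, x, training=False):
        logits = self.gate_logits
        if not training:
            # Evaluation-time randomness: shuffle which expert receives which gate weight.
            logits = self.rng.permutation(logits)
        weights = softmax(logits)  # shape (n_experts,)
        scores = np.stack([expert(x) for expert in self.experts])  # (n_experts, n_classes)
        return weights @ scores  # weighted ensemble of the expert scores
```

With one-hot toy experts (expert `i` always returns the `i`-th basis vector), the training-mode output equals the softmax of the gate logits, while each evaluation-mode call returns a reshuffled copy of those weights. This varying weighting is the intuition behind using random gates to hinder targeted attacks.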

Abstract

Recent studies have revealed that Deep Neural Networks (DNNs) are vulnerable to adversarial examples, which can easily fool them into making incorrect predictions. To mitigate this weakness, we propose a novel adversarial defense method called "Immunity" (Innovative MoE with MUtual information & positioN stabilITY), based on a modified Mixture-of-Experts (MoE) architecture. Our enhancements to the standard MoE are two-fold: 1) integrating Random Switch Gates (RSGs), which yield diverse network structures through random permutation of the RSG parameters at evaluation time, even though the RSGs are fixed after a single training run; 2) devising Mutual Information (MI)-based and Position Stability-based loss functions that capitalize on Grad-CAM's explanatory power to increase the diversity and the causality of the expert networks. Notably, because our MI-based loss operates directly on the heatmaps, it should, in theory, degrade classification performance less than other losses of the same type. Extensive evaluation on CIFAR-10 and CIFAR-100 shows that the proposed approach improves adversarial robustness against a wide range of attacks, including FGSM, BIM, MIM, and PGD.
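The heatmap-based MI objective can be illustrated with a small sketch. The abstract does not give the loss formula, so this assumes that each expert's heatmap is a standard Grad-CAM map and that diversity is encouraged by penalizing the mutual information between pairs of expert heatmaps, estimated here with a simple histogram. The function names `grad_cam`, `heatmap_mutual_information`, and `mi_diversity_loss`, and the 16-bin estimator, are hypothetical and are not the authors' implementation.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Standard Grad-CAM map for one conv layer.

    activations: (K, H, W) feature maps A^k; gradients: (K, H, W) d(score)/dA^k.
    Returns ReLU(sum_k alpha_k A^k) scaled to [0, 1], where alpha_k is the
    spatially averaged gradient of channel k.
    """
    alphas = gradients.mean(axis=(1, 2))  # channel importance weights, shape (K,)
    cam = np.maximum(np.tensordot(alphas, activations, axes=1), 0.0)  # shape (H, W)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def heatmap_mutual_information(h1, h2, bins=16):
    """Plug-in (histogram) estimate of MI between two [0, 1]-valued heatmaps, in nats."""
    joint, _, _ = np.histogram2d(h1.ravel(), h2.ravel(), bins=bins, range=[[0, 1], [0, 1]])
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of h1, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of h2, shape (1, bins)
    nz = pxy > 0  # skip empty cells, since 0 * log 0 = 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))


def mi_diversity_loss(heatmaps, bins=16):
    """Hypothetical diversity term: mean pairwise MI between expert heatmaps.

    Lower values mean the experts attend to less redundant regions.
    Requires at least two heatmaps.
    """
    n = len(heatmaps)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    total = sum(heatmap_mutual_information(heatmaps[i], heatmaps[j], bins) for i, j in pairs)
    return total / len(pairs)
```

A histogram estimator is not differentiable, so training with a term like this would need a differentiable MI estimator or surrogate; the sketch only shows the quantity being penalized. Identical heatmaps give a high value (the entropy of the binned map), while unrelated heatmaps give a value near zero.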

