Mixing Classifiers to Alleviate the Accuracy-Robustness Trade-Off

Yatong Bai; Brendon G. Anderson; Somayeh Sojoudi

TL;DR

The paper addresses the accuracy-robustness trade-off in deep classifiers for safety-critical control by mixing, without any additional training, a high-accuracy standard model $g(\cdot)$ with a robust model $h(\cdot)$. The mixed classifier $h^{\alpha}(\cdot)$ is a convex combination of the two models' output probabilities. The authors derive certified robust radii for $h^{\alpha}(\cdot)$ under Lipschitz and randomized smoothing assumptions, and show that it inherits the robustness of $h(\cdot)$ when $\alpha\in[1/2,1]$ and $h(\cdot)$ is certifiably robust with margin $(1-\alpha)/\alpha$. Empirically, the method improves the trade-off on CIFAR-10 while preserving substantial clean accuracy; the results indicate that the robust base classifier's confidence in its correct predictions under attack is a key driver of the gains. Because both base models are pre-trained, the approach can pair existing standard models with advances in robust training, offering a practical route to robust, high-performance control systems in safety-critical settings.
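To make the margin condition concrete, the following minimal sketch tabulates the required margin $(1-\alpha)/\alpha$ over the admissible range of $\alpha$. The function name `required_margin` is hypothetical and chosen only for illustration; it is not taken from the paper.

```python
def required_margin(alpha: float) -> float:
    """Margin (1 - alpha) / alpha that the robust model h must provide.

    Assumption: alpha is the mixing weight from the summary above and
    must lie in [1/2, 1] for the robustness statement to apply.
    """
    if not 0.5 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [1/2, 1]")
    return (1.0 - alpha) / alpha


# The required margin shrinks from 1 (at alpha = 1/2) to 0 (at alpha = 1).
for alpha in (0.5, 0.75, 1.0):
    print(f"alpha = {alpha:.2f} -> required margin = {required_margin(alpha):.4f}")
```

At $\alpha = 1/2$ the required margin is 1, which a probability vector can only meet by placing all of its mass on one class. This is one way to read the finding that the confidence of $h(\cdot)$ matters.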

Abstract

Deep neural classifiers have recently found tremendous success in data-driven control systems. However, existing models suffer from a trade-off between accuracy and adversarial robustness. This limitation must be overcome in the control of safety-critical systems that require both high performance and rigorous robustness guarantees. In this work, we develop classifiers that simultaneously inherit high robustness from robust models and high accuracy from standard models. Specifically, we propose a theoretically motivated formulation that mixes the output probabilities of a standard neural network and a robust neural network. Both base classifiers are pre-trained, and thus our method does not require additional training. Our numerical experiments verify that the mixed classifier noticeably improves the accuracy-robustness trade-off and identify the confidence property of the robust base classifier as the key leverage of this more benign trade-off. Our theoretical results prove that under mild assumptions, when the robustness of the robust base model is certifiable, no alteration or attack within a closed-form $\ell_p$ radius on an input can result in the misclassification of the mixed classifier.
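The mixing step described in the abstract can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: it assumes the mixture takes the form $(1-\alpha)\,g(x) + \alpha\,h(x)$ on softmax outputs, and the logits and helper names (`softmax`, `mix_probabilities`, `predict`) are hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability vector (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_probabilities(g_probs, h_probs, alpha):
    """Convex combination (1 - alpha) * g + alpha * h of two probability vectors.

    Assumption: alpha weights the robust model h, so alpha = 1 recovers h
    and alpha = 0 recovers the standard model g.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(g_probs) != len(h_probs):
        raise ValueError("both models must output the same number of classes")
    return [(1.0 - alpha) * g + alpha * h for g, h in zip(g_probs, h_probs)]


def predict(probs):
    """Return the index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical outputs of two pre-trained models on one input.
g_probs = softmax([4.0, 1.0, 0.0])  # standard model favors class 0
h_probs = softmax([0.0, 3.0, 0.5])  # robust model favors class 1
mixed = mix_probabilities(g_probs, h_probs, alpha=0.5)
print(predict(mixed), [round(p, 3) for p in mixed])
```

Because the mixture only reads the two models' outputs, no gradient step or retraining is involved, which matches the training-free claim above.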

Paper Structure (14 sections, 4 theorems, 15 equations, 4 figures, 2 tables)

Introduction
Background and related works
Notations
Related Adversarial Attacks and Defenses
Locally Biased Smoothing
Using a Robust Neural Network as the Smoothing Oracle
Theoretical Certified Robust Radius
Numerical Experiments
$\alpha$'s Influence on Mixed Classifier Robustness
The Relationship between $h^{\alpha} (\cdot)$'s Robustness and $h (\cdot)$'s Confidence
Visualization of the Certified Robust Radii
Conclusions
Additional Empirical Support for $R_i(x)=1$
Proof of Theorem 2

Key Result

Lemma 1

Let $x \in {\mathbb{R}}^d$ and $r \ge 0$. If $\alpha \in [\frac{1}{2}, 1]$ and $h (\cdot)$ is certifiably robust at $x$ with margin $\frac{1-\alpha}{\alpha}$ and radius $r$, then the mixed classifier $h^{\alpha}(\cdot)$ is robust in the sense that $\mathop{\mathrm{arg\,max}}\limits_{i} \ldots$
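The role of the margin $\frac{1-\alpha}{\alpha}$ can be checked with a short calculation. The sketch below rests on assumptions that this summary does not spell out: that the mixed classifier is the convex combination $h^{\alpha}_i(x) = (1-\alpha)\, g_i(x) + \alpha\, h_i(x)$, that each $g_i(\cdot)$ takes values in $[0,1]$, and that "certifiably robust at $x$ with margin $m$ and radius $r$" means $h_y(x+\delta) \ge h_i(x+\delta) + m$ for the predicted class $y$, every $i \ne y$, and every $\delta$ with $\|\delta\|_p \le r$.

```latex
% Assumed setting: h^alpha_i = (1 - alpha) g_i + alpha h_i, with g_i in [0, 1],
% alpha in [1/2, 1], and h robust at x with margin (1 - alpha)/alpha, radius r.
% For any class i != y and any perturbation with ||delta||_p <= r:
\begin{align*}
h^{\alpha}_y(x+\delta) - h^{\alpha}_i(x+\delta)
  &= \alpha \bigl( h_y(x+\delta) - h_i(x+\delta) \bigr)
   + (1-\alpha) \bigl( g_y(x+\delta) - g_i(x+\delta) \bigr) \\
  % margin of h bounds the first term; g_y - g_i >= -1 bounds the second
  &\ge \alpha \cdot \frac{1-\alpha}{\alpha} + (1-\alpha) \cdot (-1) \\
  &= 0 .
\end{align*}
```

Under these assumptions, the margin of $h(\cdot)$ exactly offsets the worst case $g_y - g_i = -1$, so within radius $r$ no other class can score strictly higher than $y$ in the mixed classifier, however $g(\cdot)$ behaves.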

Figures (4)

Figure 1: Comparing the "attacked accuracy -- clean accuracy" curves for various options for $R_i (x)$.
Figure 2: The accuracy of the mixed classifier $h^{\alpha} (\cdot)$ at various $\alpha$ values. "STD attack", "ROB attack", and "MIX attack" refer to the PGD$_{20}$ attack generated using the gradient of $g (\cdot)$, $h (\cdot)$, and $h^{\alpha} (\cdot)$ respectively, with $\epsilon$ set to $\frac{8}{255}$.
Figure 3: Comparing the certified accuracy-robustness trade-off of RS models and our mixed classifier using both Lipschitz-based (Lip-based) certificates and RS-based certificates (Theorems 1 and 2, respectively). The clean accuracy is the same between $h_\text{baseline} (\cdot)$ and $h^{\alpha} (\cdot)$ in each subfigure, and the empty circles represent discontinuity in the certified accuracy at radius $0$.
Figure 4: Comparing the options for $R_i (x)$ with alternative selections of base classifiers.

Theorems & Definitions (9)

Definition 1
Lemma 1
Proof
Definition 2
Theorem 1
Proof
Theorem 2
Theorem 5: Restated
Proof
