Adversarial Risk Bounds via Function Transformation
Justin Khim, Po-Ling Loh
TL;DR
This work presents a principled framework for bounding adversarial risk: each predictor is replaced by a transformed function whose standard risk upper-bounds the adversarial risk of the original, so that standard generalization bounds apply. The supremum transform handles linear classifiers exactly, yielding explicit bounds that separate optimization error, data geometry, and perturbation strength, while the tree transform extends this idea to neural networks, enabling finite-sample risk bounds via Rademacher complexity. The authors extend the theory to multiclass classification and regression, and propose optimization schemes to minimize these adversarial-risk bounds in practical settings. The results illuminate how adversarial perturbations affect generalization and connect robustness to distributional robustness, with practical algorithms for robust linear and neural-network training. Overall, the paper provides a versatile toolkit for analyzing and improving adversarial robustness under standard learning-theoretic guarantees.
Abstract
We derive bounds for a notion of adversarial risk, designed to characterize the robustness of linear and neural network classifiers to adversarial perturbations. Specifically, we introduce a new class of function transformations with the property that the risk of the transformed functions upper-bounds the adversarial risk of the original functions. This reduces the problem of deriving bounds on the adversarial risk to the problem of deriving risk bounds using standard learning-theoretic techniques. We then derive bounds on the Rademacher complexities of the transformed function classes, obtaining error rates on the same order as the generalization error of the original function classes. We also discuss extensions of our theory to multiclass classification and regression. Finally, we provide two algorithms for optimizing the adversarial risk bounds in the linear case, and discuss connections to regularization and distributional robustness.
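
To make the idea of a function transformation concrete, the sketch below illustrates a standard identity for linear classifiers, which is the kind of exactness the supremum transform relies on. It is not taken from the paper: the hinge loss, the l-infinity perturbation set, and all function names are assumptions chosen for illustration. For f(x) = w·x + b and labels y in {-1, +1}, the worst-case hinge loss over perturbations with ||delta||_inf <= eps equals the ordinary hinge loss evaluated at the shifted margin y(w·x + b) - eps·||w||_1, so the adversarial risk of f is the standard risk of a transformed function.

```python
import numpy as np


def hinge(margin):
    """Hinge loss as a function of the signed margin y * f(x)."""
    return np.maximum(0.0, 1.0 - margin)


def adversarial_hinge_closed_form(w, b, X, y, eps):
    """Worst-case hinge loss under l_inf perturbations of radius eps.

    Assumed setting (not from the source): the supremum over the
    perturbation set reduces the margin by eps * ||w||_1, the dual norm
    of the l_inf constraint.
    """
    margins = y * (X @ w + b)
    return hinge(margins - eps * np.sum(np.abs(w)))


def adversarial_hinge_by_attack(w, b, X, y, eps):
    """Same quantity, computed by applying the worst-case perturbation."""
    # The maximizing perturbation pushes every coordinate against the label.
    delta = -eps * y[:, None] * np.sign(w)[None, :]
    margins = y * ((X + delta) @ w + b)
    return hinge(margins)


# Synthetic data, purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
w = rng.normal(size=5)
b = 0.3
y = np.where(rng.random(50) < 0.5, -1.0, 1.0)
eps = 0.1

closed = adversarial_hinge_closed_form(w, b, X, y, eps)
attack = adversarial_hinge_by_attack(w, b, X, y, eps)

# The two computations agree up to floating-point error.
print(np.allclose(closed, attack))
```

Averaging `closed` over the sample gives an empirical adversarial risk that is itself an ordinary empirical risk of the transformed function, which is why standard Rademacher-complexity arguments can then be applied. For neural networks no such closed form is available in general, which is where an upper-bounding transform, rather than an exact one, becomes necessary.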
