Efficient Optimization Algorithms for Linear Adversarial Training

Antônio H. Ribeiro, Thomas B. Schön, Dave Zachariah, Francis Bach

TL;DR

This work provides scalable solvers for adversarial training of linear models by exploiting problem structure. For classification, it introduces a smooth augmented-variable reformulation solved with projected gradient methods; for regression, it uses an iteratively reweighted ridge approach. Both are complemented by practical enhancements such as momentum, line search, and variance reduction. The methods come with convergence guarantees and show significant speedups over general-purpose solvers like CVXPY while maintaining competitive predictive performance and adversarial robustness on diverse data. The results suggest that linear adversarial training can be a practical alternative in high-dimensional settings, with potential extensions to multiclass and nonlinear regimes. The accompanying codebase supports reproducibility and further exploration in large-scale applications such as genetics and high-throughput phenotyping.

Abstract

Adversarial training can be used to learn models that are robust against perturbations. For linear models, it can be formulated as a convex optimization problem. Compared to methods proposed in the context of deep learning, leveraging the optimization structure allows significantly faster convergence rates. Still, the use of generic convex solvers can be inefficient for large-scale problems. Here, we propose tailored optimization algorithms for the adversarial training of linear models, which render large-scale regression and classification problems more tractable. For regression problems, we propose a family of solvers based on iterative ridge regression and, for classification, a family of solvers based on projected gradient descent. The methods are based on extended variable reformulations of the original problem. We illustrate their efficiency in numerical examples.
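The convexity mentioned in the abstract rests on the inner maximization over the perturbation having a closed form for linear models. The following is a minimal sketch for squared error under an $\ell_\infty$-bounded attack, where the worst-case loss is $(|y - \boldsymbol{x}^\top\boldsymbol{\beta}| + \delta\|\boldsymbol{\beta}\|_1)^2$. The function names and the toy data are illustrative assumptions, not taken from the paper's codebase.

```python
import numpy as np

def adversarial_sq_loss(x, y, beta, delta):
    """Worst-case squared error under an l_inf-bounded input perturbation.

    Closed form: (|y - x @ beta| + delta * ||beta||_1)^2, where the l_1 norm
    is the dual of the l_inf norm that bounds the attack.
    """
    residual = y - x @ beta
    return (abs(residual) + delta * np.linalg.norm(beta, 1)) ** 2

def worst_case_perturbation(x, y, beta, delta):
    """An l_inf perturbation of size delta that attains the closed form."""
    residual = y - x @ beta
    sign = 1.0 if residual >= 0 else -1.0
    # Push every coordinate against beta so the residual grows in magnitude.
    return -delta * sign * np.sign(beta)

# Toy data (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=5)
beta = rng.normal(size=5)
y, delta = 0.3, 0.1

dx = worst_case_perturbation(x, y, beta, delta)
attacked = (y - (x + dx) @ beta) ** 2          # loss at the explicit attack
closed_form = adversarial_sq_loss(x, y, beta, delta)
```

Here `attacked` and `closed_form` agree up to floating-point error, and no other perturbation inside the $\ell_\infty$ ball of radius `delta` gives a larger loss. Because the closed form is a convex function of `beta`, the training problem stays convex.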

Paper Structure

This paper contains 41 sections, 9 theorems, 59 equations, 13 figures, 4 tables, 8 algorithms.

Key Result

proposition 1

Let $\rho > 0$ and $\ell(y, \hat{y}) = h(y\hat{y})$ for $h$ non-increasing, 1-smooth and convex. The optimization of eq:advtrain is equivalent to … Moreover, the function $\mathcal{R}$ is $L$-smooth and jointly convex, and $L \le \frac{1}{2} \lambda_{\max}(\frac{1}{n}\sum_i \boldsymbol{x}_i\boldsymbol{x}_i^\top)$ if $\rho^2 \le \lambda_{\max}(\frac{1}{n}\sum_i \boldsymbol{x}_i\boldsymbol{x}_i^\top)$ …
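The displayed reformulation is not reproduced in this text, so the sketch below assumes an epigraph form: minimize $\frac{1}{n}\sum_i h(y_i \boldsymbol{x}_i^\top\boldsymbol{\beta} - \rho t)$ subject to $\|\boldsymbol{\beta}\|_* \le t$. Since $h$ is non-increasing, the optimal $t$ equals $\|\boldsymbol{\beta}\|_*$, which recovers the worst-case loss. The example uses the logistic loss and an $\ell_2$ attack, so the constraint set is a second-order cone with a known projection. All function names, the step size choice, and the data are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def logistic(z):
    # h(z) = log(1 + exp(-z)): non-increasing, convex, and (1/4)-smooth.
    return np.logaddexp(0.0, -z)

def logistic_grad(z):
    # h'(z) = -1 / (1 + exp(z)), written with tanh for numerical stability.
    return -0.5 * (1.0 - np.tanh(0.5 * z))

def project_soc(beta, t):
    """Euclidean projection of (beta, t) onto {(b, s): ||b||_2 <= s}."""
    norm = np.linalg.norm(beta)
    if norm <= t:                      # already feasible
        return beta, t
    if norm <= -t:                     # in the polar cone: project to origin
        return np.zeros_like(beta), 0.0
    scale = 0.5 * (norm + t)           # otherwise land on the cone boundary
    return scale * beta / norm, scale

def projected_gd(X, y, rho, n_iter=200):
    """Projected gradient descent on the assumed epigraph reformulation."""
    n, d = X.shape
    # Augmented features: Z[i] @ (beta, t) = y_i * x_i @ beta - rho * t.
    Z = np.hstack([y[:, None] * X, -rho * np.ones((n, 1))])
    # Smoothness constant of the augmented objective (h'' <= 1/4).
    L = 0.25 * np.linalg.eigvalsh(Z.T @ Z / n)[-1]
    w = np.zeros(d + 1)
    history = []
    for _ in range(n_iter):
        history.append(logistic(Z @ w).mean())
        grad = Z.T @ logistic_grad(Z @ w) / n
        w = w - grad / L               # gradient step with step size 1/L
        beta, t = project_soc(w[:-1], w[-1])
        w = np.append(beta, t)
    history.append(logistic(Z @ w).mean())
    return w[:-1], w[-1], history

# Toy binary classification data with labels in {-1, +1} (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.where(X @ rng.normal(size=5) >= 0, 1.0, -1.0)
beta_hat, t_hat, history = projected_gd(X, y, rho=0.1)
```

With step size $1/L$ the objective values in `history` are non-increasing, and every iterate satisfies the cone constraint. For an $\ell_\infty$ attack the dual norm is $\ell_1$ and the projection step would differ; that case is not covered by this sketch.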

Figures (13)

  • Figure 1: Adversarial training in linear regression. Left: we compare linear adversarial training ($\ell_\infty$-norm bounded attacks) using the default adversarial radius (see ribeiro_regularization_2023 and the description in sec:default-value) and Lasso with parameters selected through cross-validation. Right: the running times on the pruned MAGIC dataset for adversarial training with our tailored solver, with CVXPY, and for cross-validated Lasso, optimized as in friedman_pathwise_2007. The error bars in (a) show the interquartile range obtained using bootstrap. In (b), we plot the median of 5 repetitions.
  • Figure 2: Convergence of Projected GD. Sub-optimality vs number of iterations on the MNIST dataset. Top: we show the results for different step sizes. Middle: we show the effect of momentum (with line search implemented for both methods). Bottom: we compare GD, SGD and SAGA. Results for $\ell_\infty$-adv. training with $\delta=0.01$. Here one epoch is a full pass through the dataset.
  • Figure 3: Comparison with FGSM. Sub-optimality vs iterations. On the top, we compare our methods with the gradient descent implementation of FGSM. On the bottom, with the stochastic implementation of FGSM. Results for $\ell_\infty$-adversarial training ($\delta=0.01$).
  • Figure 4: Conjugate Gradient. Top: execution time of 100 iterations on the Abalone dataset. Bottom: convergence (sub-optimality vs iteration).
  • Figure 5: Test performance. Top: Spiked eigenvalue models: performance vs fraction of relevant eigenvalue components. Bottom: Sparse parameter model: performance vs fraction of non-zero parameters. $R$-squared is the coefficient of determination (higher is better).
  • ...and 8 more figures

Theorems & Definitions (13)

  • proposition 1
  • proof: Proof of thm:advtrain-classif-closeform
  • proposition 2: Projection step
  • proposition 3
  • proof: Proof of thm:advtraining-closeform
  • proposition 4: $\eta$-trick
  • theorem 1: ribeiro_regularization_2023, Section 8
  • proposition 5: Zero solution of adversarial training
  • theorem 2
  • proposition 6
  • ...and 3 more
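
The $\eta$-trick named in proposition 4 is not stated in this listing. In its generic form it is the identity $|a| = \min_{\eta > 0} \frac{1}{2}(a^2/\eta + \eta)$, attained at $\eta = |a|$, which turns an $\ell_1$ penalty into a sequence of ridge problems. The sketch below applies that generic identity to a plain Lasso objective. It illustrates the iterative ridge idea only and is not the paper's adversarial-training solver; the function names, the smoothing constant `eps`, and the data are assumptions.

```python
import numpy as np

def smoothed_lasso_objective(X, y, beta, lam, eps):
    # 1/(2n) ||y - X beta||^2 + lam * sum_j sqrt(beta_j^2 + eps)
    n = X.shape[0]
    fit = 0.5 * np.sum((y - X @ beta) ** 2) / n
    return fit + lam * np.sum(np.sqrt(beta**2 + eps))

def reweighted_ridge_lasso(X, y, lam, n_iter=50, eps=1e-8):
    """Alternate a weighted ridge solve in beta with a closed-form eta update."""
    n, d = X.shape
    gram = X.T @ X / n
    xty = X.T @ y / n
    eta = np.ones(d)
    history = []
    for _ in range(n_iter):
        # Ridge step: minimize fit + (lam / 2) * sum_j beta_j^2 / eta_j.
        beta = np.linalg.solve(gram + lam * np.diag(1.0 / eta), xty)
        # Eta step: minimizer of (beta_j^2 + eps) / eta_j + eta_j.
        eta = np.sqrt(beta**2 + eps)
        history.append(smoothed_lasso_objective(X, y, beta, lam, eps))
    return beta, history

# Toy sparse regression problem (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
beta_true = np.zeros(10)
beta_true[:3] = [2.0, -1.0, 0.5]
y = X @ beta_true + 0.1 * rng.normal(size=100)
beta_hat, history = reweighted_ridge_lasso(X, y, lam=0.1)
```

Each pass is an exact minimization in one block of variables, so the smoothed objective recorded in `history` does not increase from one iteration to the next. The constant `eps` keeps the weights `1 / eta` finite when a coefficient shrinks to zero.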