Table of Contents
Fetching ...

Scaling the Convex Barrier with Sparse Dual Algorithms

Alessandro De Palma, Harkirat Singh Behl, Rudy Bunel, Philip H. S. Torr, M. Pawan Kumar

TL;DR

This work tackles scalable neural network verification by moving beyond the common Planet relaxation to a tighter relaxation that captures cross-layer interactions via exponential constraint sets. It develops two sparse dual solvers, Active Set and Saddle Point, to solve the dual of this tighter relaxation efficiently, enabling strong lower bounds with GPU-parallelizable computations. A Big-M initialization provides a practical starting point, while a cutting-plane approach can tighten the bounds further when needed. The authors demonstrate substantial speedups and improved verification performance in both incomplete and complete settings, including branch-and-bound scenarios, and provide a scalable framework with practical implementations. Overall, the paper shows that combining tight relaxations with sparse, memory-efficient dual optimization markedly enhances neural network verification capabilities with real-world impact for robustness and safety-critical analyses.

Abstract

Tight and efficient neural network bounding is crucial to the scaling of neural network verification systems. Many efficient bounding algorithms have been presented recently, but they are often too loose to verify more challenging properties. This is due to the weakness of the employed relaxation, which is usually a linear program of size linear in the number of neurons. While a tighter linear relaxation for piecewise-linear activations exists, it comes at the cost of exponentially many constraints and currently lacks an efficient customized solver. We alleviate this deficiency by presenting two novel dual algorithms: one operates a subgradient method on a small active set of dual variables, the other exploits the sparsity of Frank-Wolfe type optimizers to incur only a linear memory cost. Both methods recover the strengths of the new relaxation: tightness and a linear separation oracle. At the same time, they share the benefits of previous dual approaches for weaker relaxations: massive parallelism, GPU implementation, low cost per iteration and valid bounds at any time. As a consequence, we can obtain better bounds than off-the-shelf solvers in only a fraction of their running time, attaining significant formal verification speed-ups.

Scaling the Convex Barrier with Sparse Dual Algorithms

TL;DR

This work tackles scalable neural network verification by moving beyond the common Planet relaxation to a tighter relaxation that captures cross-layer interactions via exponential constraint sets. It develops two sparse dual solvers, Active Set and Saddle Point, to solve the dual of this tighter relaxation efficiently, enabling strong lower bounds with GPU-parallelizable computations. A Big-M initialization provides a practical starting point, while a cutting-plane approach can tighten the bounds further when needed. The authors demonstrate substantial speedups and improved verification performance in both incomplete and complete settings, including branch-and-bound scenarios, and provide a scalable framework with practical implementations. Overall, the paper shows that combining tight relaxations with sparse, memory-efficient dual optimization markedly enhances neural network verification capabilities with real-world impact for robustness and safety-critical analyses.

Abstract

Tight and efficient neural network bounding is crucial to the scaling of neural network verification systems. Many efficient bounding algorithms have been presented recently, but they are often too loose to verify more challenging properties. This is due to the weakness of the employed relaxation, which is usually a linear program of size linear in the number of neurons. While a tighter linear relaxation for piecewise-linear activations exists, it comes at the cost of exponentially many constraints and currently lacks an efficient customized solver. We alleviate this deficiency by presenting two novel dual algorithms: one operates a subgradient method on a small active set of dual variables, the other exploits the sparsity of Frank-Wolfe type optimizers to incur only a linear memory cost. Both methods recover the strengths of the new relaxation: tightness and a linear separation oracle. At the same time, they share the benefits of previous dual approaches for weaker relaxations: massive parallelism, GPU implementation, low cost per iteration and valid bounds at any time. As a consequence, we can obtain better bounds than off-the-shelf solvers in only a fraction of their running time, attaining significant formal verification speed-ups.

Paper Structure

This paper contains 53 sections, 3 theorems, 52 equations, 17 figures, 4 tables, 3 algorithms.

Key Result

Proposition 1

$\bm{\beta}_{k,I^*_k}$ as defined in equation eq:selection represents the Lagrangian multipliers associated to the most violated constraints from $\mathcal{A}_{\mathcal{E}, k}$ at $\left(\mathbf{x}^*, \mathbf{z}^*\right) \in \mathop{\mathrm{\arg\!\min}}\limits_{\mathbf{x}, \mathbf{z}} \mathcal{L}_{\

Figures (17)

  • Figure 1: Upper bounds to the adversarial vulnerability for the SGD-trained network from BunelDP20. Box plots: distribution of runtime in seconds. Violin plots: difference with the bounds obtained by Gurobi with a cut from $\mathcal{A}_k$ per neuron; higher is better, the width at a given value represents the proportion of problems for which this is the result. On average, both Active Set and Saddle Point achieve bounds tighter than Gurobi 1 cut with a smaller runtime.
  • Figure 2: Pointwise comparison for a subset of the methods on the data presented in Figure \ref{['fig:sgd8']}. Darker colour shades mean higher point density (on a logarithmic scale). The oblique dotted line corresponds to the equality.
  • Figure 3: Pointwise comparison between our proposed solvers on the data presented in Figure \ref{['fig:sgd8']}. Darker colour shades mean higher point density (on a logarithmic scale). The oblique dotted line corresponds to the equality.
  • Figure 4: Upper bounds to the adversarial vulnerability of a fully connected network with two hidden layers of width $7000$. Box plots: distribution of runtime in seconds. Violin plots: difference with the bounds obtained by Gurobi with a cut from $\mathcal{A}_k$ per neuron; higher is better. When run for enough iterations, Saddle Point achieves bounds tighter than both Gurobi 1 cut and Active Set, whose tightness is constrained by memory, in less time.
  • Figure 5: Cactus plots on properties from Lu2020Neural, displaying the percentage of solved properties as a function of runtime. Baselines are represented by dotted lines.
  • ...and 12 more figures

Theorems & Definitions (3)

  • Proposition 1
  • Proposition 2
  • Proposition 3