Robust width: A lightweight and certifiable adversarial defense

Jonathan Peck; Bart Goossens

TL;DR

The paper tackles adversarial vulnerability in deep networks by introducing a lightweight, training-free defense built on the robust width property (RWP) from compressed sensing. It constructs a plug-and-play purification pipeline using a random sensing operator and a CS-based denoiser, yielding probabilistic robustness guarantees for approximately sparse data and enabling certifiable robustness without adversarial training. The approach achieves strong empirical robustness on ImageNet, outperforming state-of-the-art black-box defenses at large perturbation budgets and closely matching the state of the art in the white-box setting, while keeping standard accuracy largely intact. The method is backed by theoretical guarantees, practical certification bounds, and publicly available code, making it a scalable alternative for resource-constrained settings and datasets lacking large annotated corpora.
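The following is a minimal sketch of such a purification pipeline. It rests on assumptions that are not taken from the paper:

- The sensing operator is a random pixel mask, keeping each pixel with probability `keep_prob`.
- The image is assumed sparse in an orthonormal 2-D DCT basis.
- The CS-based denoiser is plain ISTA on an ℓ1-regularized least-squares problem.

The names `purify`, `defended_predict` and `toy_classifier`, and all parameter values, are hypothetical. The authors' own implementation is at the repository URL given in the abstract.

```python
from typing import Callable, Optional

import numpy as np
from scipy.fft import dctn, idctn


def soft_threshold(v: np.ndarray, lam: float) -> np.ndarray:
    """Proximal operator of lam * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def purify(x: np.ndarray, keep_prob: float = 0.5, lam: float = 0.01,
           n_iter: int = 100, rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Hypothetical RWP-style purification: random subsampling + l1 decoding.

    Sensing operator: A(z) = mask * idct(z), where z holds 2-D DCT coefficients.
    Decoder: ISTA for min_z 0.5 * ||A(z) - y||_2^2 + lam * ||z||_1.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = (rng.random(x.shape) < keep_prob).astype(float)  # random sensing operator
    y = mask * x                                             # observed pixels only
    z = np.zeros(x.shape)                                    # DCT coefficients
    for _ in range(n_iter):
        residual = mask * idctn(z, norm="ortho") - y         # A(z) - y
        grad = dctn(mask * residual, norm="ortho")           # A^T (A(z) - y)
        # Step size 1 is valid: an orthonormal DCT and a 0/1 mask give ||A||_2 <= 1.
        z = soft_threshold(z - grad, lam)
    return np.clip(idctn(z, norm="ortho"), 0.0, 1.0)


def defended_predict(x: np.ndarray, classifier: Callable[[np.ndarray], int], **kwargs) -> int:
    """Plug-and-play defense: purify the input, then call the unchanged classifier."""
    return classifier(purify(x, **kwargs))


def toy_classifier(img: np.ndarray) -> int:
    """Hypothetical stand-in for a pretrained model."""
    return int(img.mean() > 0.5)


rng = np.random.default_rng(0)
image = rng.random((32, 32))  # toy grayscale "image" in [0, 1]
label = defended_predict(image, toy_classifier, rng=rng)
print("defended prediction:", label)
```

The classifier is only ever called on `purify(x)`, so any pretrained model can be dropped in without retraining. The random mask also makes the defended prediction stochastic.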

Abstract

Deep neural networks are vulnerable to so-called adversarial examples: inputs which are intentionally constructed to cause the model to make incorrect predictions or classifications. Adversarial examples are often visually indistinguishable from natural data samples, making them hard to detect. As such, they pose significant threats to the reliability of deep learning systems. In this work, we study an adversarial defense based on the robust width property (RWP), which was recently introduced for compressed sensing. We show that a specific input purification scheme based on the RWP gives theoretical robustness guarantees for images that are approximately sparse. The defense is easy to implement and can be applied to any existing model without additional training or finetuning. We empirically validate the defense on ImageNet against $L^\infty$ perturbations at perturbation budgets ranging from $4/255$ to $32/255$. In the black-box setting, our method significantly outperforms the state-of-the-art, especially for large perturbations. In the white-box setting, depending on the choice of base classifier, we closely match the state of the art in robust ImageNet classification while avoiding the need for additional data, larger models or expensive adversarial training routines. Our code is available at https://github.com/peck94/robust-width-defense.
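To make the threat model concrete: an $L^\infty$ budget $\varepsilon$ lets every pixel, on a $[0,1]$ scale, move by at most $\varepsilon$. The sketch below applies a one-step sign-gradient (FGSM-style) perturbation to a hypothetical random linear scorer `w @ x`. It uses a few budgets in the range studied; these particular values are chosen here for illustration. It only illustrates the constraint and is not one of the attacks from the paper's evaluation.

```python
import numpy as np


def fgsm_linf(x: np.ndarray, grad: np.ndarray, eps: float) -> np.ndarray:
    """One-step L-infinity attack: shift each pixel by eps along the sign of the
    gradient, then clip to the valid image range [0, 1]."""
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)


rng = np.random.default_rng(0)
x = rng.random(3 * 224 * 224)     # flattened ImageNet-sized input in [0, 1]
w = rng.standard_normal(x.shape)  # hypothetical linear scorer: score(x) = w @ x

results = {}
for eps in (4 / 255, 8 / 255, 16 / 255, 32 / 255):
    # The gradient of w @ x is w; stepping along -w lowers the score.
    x_adv = fgsm_linf(x, -w, eps)
    max_change = np.abs(x_adv - x).max()
    score_drop = w @ x - w @ x_adv
    results[eps] = (max_change, score_drop)
    print(f"eps = {eps * 255:.0f}/255: max |delta| = {max_change:.4f}, score drop = {score_drop:.1f}")
```

Clipping to $[0,1]$ can only shrink a pixel's change, so the realized perturbation never exceeds the budget. Larger budgets give the attacker strictly more room.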

Paper Structure (21 sections, 9 theorems, 64 equations, 4 figures, 7 tables, 2 algorithms)

Introduction
Preliminaries
Adversarial robustness
Compressed sensing
The robust width property
Related work
RWP-based adversarial defense
Adversarial denoiser
Construction of a RWP-based adversarial defense
Probabilistic adversarial defense method
Certification
Example: linear classification of sparse vectors
Experiments
Threat models
Hyperparameter tuning
...and 6 more sections

Key Result

Theorem 2.6

Let $(\mathcal{H}, \mathcal{A}, \left\|\cdot\right\|_\sharp)$ be a CS space with bound $L$ and let $\Phi: \mathcal{H} \to \mathcal{H}'$ be a linear operator satisfying the $(\rho, \alpha)$-RWP. Let $x^\natural \in \mathcal{H}$ and $e \in \mathcal{H}'$ with $\left\|e\right\|_2 \leq \varepsilon$ for … provided that $\rho \leq \frac{1}{4L}$.
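The sketch below illustrates the kind of stable recovery Theorem 2.6 is about, in the classical setting. Several assumptions are not taken from the theorem:

- $\mathcal{H} = \mathbb{R}^n$, $\left\|\cdot\right\|_\sharp = \left\|\cdot\right\|_1$, and $\mathcal{A}$ is the set of $k$-sparse vectors.
- $\Phi$ is a Gaussian matrix. It is not checked here to satisfy the RWP.
- The usual decoder, ℓ1 minimization subject to a noise constraint, is swapped for its Lagrangian (LASSO) form. This is solved with FISTA, using an ad hoc weight `lam`.

The signal is only approximately sparse, and the noise satisfies $\left\|e\right\|_2 = \varepsilon$. The constants of the theorem are not reproduced.

```python
import numpy as np


def fista(Phi: np.ndarray, y: np.ndarray, lam: float, n_iter: int = 3000) -> np.ndarray:
    """FISTA for min_x 0.5 * ||Phi x - y||_2^2 + lam * ||x||_1."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1])
    z, t = x.copy(), 1.0
    for _ in range(n_iter):
        g = z - step * (Phi.T @ (Phi @ z - y))                           # gradient step
        x_new = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)   # soft threshold
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)                  # momentum
        x, t = x_new, t_new
    return x


rng = np.random.default_rng(0)
n, m, k = 200, 80, 8
Phi = rng.standard_normal((m, n)) / np.sqrt(m)  # random Gaussian sensing operator
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x_true += 1e-3 * rng.standard_normal(n)         # small tail: only approximately sparse

errors = {}
for eps in (0.0, 0.05, 0.2):
    e = rng.standard_normal(m)
    e *= eps / np.linalg.norm(e)                # noise with ||e||_2 = eps
    x_hat = fista(Phi, Phi @ x_true + e, lam=0.01 + 0.5 * eps)
    errors[eps] = np.linalg.norm(x_hat - x_true)
    print(f"eps = {eps:.2f}: recovery error = {errors[eps]:.3f}")
```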

Figures (4)

Figure 1: Examples of adversarial perturbations in image recognition.
Figure 2: Diagram of the sparsifying front-end proposed by marzi2018sparsity.
Figure 3: A hypothetical scenario where randomized smoothing (RS) can incorrectly change the predicted label of a given sample $x$. The solid black line is the decision boundary, and the dashed circle around $x$ represents a high-probability region from which RS samples, such as a 99% interval. If the base classifier originally classified $x$ correctly, RS increases robustness around $x$ but lowers accuracy, since the smoothed classifier now assigns $x$ the wrong label.
Figure 4: Hyperparameter search results for the different ImageNet classifiers. Trials which obtained less than 10% standard or robust accuracy have been omitted. The Pareto front is highlighted in black, and some hyperparameter configurations along the front are given for illustration.
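Figure 3 concerns randomized smoothing, which classifies $x$ by a majority vote of the base classifier over Gaussian perturbations $x + \delta$, $\delta \sim \mathcal{N}(0, \sigma^2 I)$. The following toy analogue of that scenario uses a hypothetical 2-D base classifier whose class-1 region is a thin slab through $x$. The base classifier labels $x$ as class 1, assumed correct, but most of the noise distribution falls in class 0, so the smoothed prediction flips. This is only a Monte Carlo estimate of the smoothed prediction. It omits the statistical test that certified RS procedures add.

```python
from collections import Counter
from typing import Callable

import numpy as np


def smoothed_predict(classifier: Callable[[np.ndarray], int], x: np.ndarray,
                     sigma: float, n_samples: int, rng: np.random.Generator):
    """Monte Carlo estimate of g(x) = argmax_c P[classifier(x + delta) = c],
    delta ~ N(0, sigma^2 I). Returns the majority label and its vote share."""
    votes = Counter(
        classifier(x + sigma * rng.standard_normal(x.shape)) for _ in range(n_samples)
    )
    label, count = votes.most_common(1)[0]
    return label, count / n_samples


def base_classifier(z: np.ndarray) -> int:
    """Hypothetical base model: class 1 only inside the thin slab |z[0]| < 0.1."""
    return int(abs(z[0]) < 0.1)


rng = np.random.default_rng(0)
x = np.array([0.0, 0.0])
print("base prediction:", base_classifier(x))  # 1
label, share = smoothed_predict(base_classifier, x, sigma=1.0, n_samples=1000, rng=rng)
print("smoothed prediction:", label, "vote share:", share)
```

With $\sigma = 1$, only about 8% of the noise mass lies inside the slab. The majority vote is therefore class 0, even though $x$ itself sits in class 1.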

Theorems & Definitions (25)

Definition 2.1: Robustness of a classifier on a set
Definition 2.2: Frame
Definition 2.3: CS space
Definition 2.4: Restricted isometry property
Definition 2.5: Robust width property
Theorem 2.6: cahill2021robust
Definition 2.7: Sparsity defect
Definition 4.1: Adversarial denoiser
Definition 4.2: Deterministic adversarial defense method
Definition 4.3: Robustness gain of a deterministic adversarial defense method
...and 15 more
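The definitions above are listed by title only. Assume the sparsity defect follows the usual compressed sensing convention, namely the $\left\|\cdot\right\|_\sharp$-distance from $x$ to the model set $\mathcal{A}$. Then it has a closed form in the classical case where $\mathcal{A}$ is the set of $k$-sparse vectors and $\left\|\cdot\right\|_\sharp = \left\|\cdot\right\|_1$. The best approximation keeps the $k$ largest-magnitude entries, so the defect is the ℓ1 mass of the rest. The function name `sparsity_defect_l1` is hypothetical.

```python
import numpy as np


def sparsity_defect_l1(x: np.ndarray, k: int) -> float:
    """min over k-sparse a of ||x - a||_1.

    The minimizer keeps the k largest-magnitude entries of x, so the
    defect is the sum of the remaining magnitudes.
    """
    mags = np.sort(np.abs(np.ravel(x)))  # ascending magnitudes
    return float(mags[: max(mags.size - k, 0)].sum())


x = np.array([3.0, -0.1, 0.2, -4.0, 0.05])
for k in range(x.size + 1):
    print(k, sparsity_defect_l1(x, k))
```

Under these assumptions, an "approximately sparse" image is one whose transform coefficients have a small defect for a moderate $k$.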
