Robust width: A lightweight and certifiable adversarial defense
Jonathan Peck, Bart Goossens
TL;DR
The paper tackles adversarial vulnerability in deep networks by introducing a lightweight, training-free defense built on the robust width property (RWP) from compressed sensing. It constructs a plug-and-play purification pipeline from a random sensing operator and a CS-based denoiser, which yields probabilistic robustness guarantees for approximately sparse data and enables certifiable robustness without adversarial training. On ImageNet, under $L^\infty$ perturbation budgets from $4/255$ to $32/255$, the approach outperforms state-of-the-art defenses in the black-box setting, especially at large budgets, and closely matches the state of the art in the white-box setting depending on the base classifier, while keeping standard accuracy largely intact. The method is backed by theoretical guarantees, practical certification bounds, and publicly available code, making it a scalable alternative for resource-constrained settings and for tasks where large annotated datasets are unavailable.
Abstract
Deep neural networks are vulnerable to so-called adversarial examples: inputs that are intentionally constructed to cause the model to make incorrect predictions or classifications. Adversarial examples are often visually indistinguishable from natural data samples, making them hard to detect. As such, they pose significant threats to the reliability of deep learning systems. In this work, we study an adversarial defense based on the robust width property (RWP), which was recently introduced for compressed sensing. We show that a specific input purification scheme based on the RWP gives theoretical robustness guarantees for images that are approximately sparse. The defense is easy to implement and can be applied to any existing model without additional training or finetuning. We empirically validate the defense on ImageNet against $L^\infty$ perturbations at perturbation budgets ranging from $4/255$ to $32/255$. In the black-box setting, our method significantly outperforms the state of the art, especially for large perturbations. In the white-box setting, depending on the choice of base classifier, we closely match the state of the art in robust ImageNet classification while avoiding the need for additional data, larger models or expensive adversarial training routines. Our code is available at https://github.com/peck94/robust-width-defense.
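
To make the idea of purification by compressed sensing concrete, the following is a minimal toy sketch and not the authors' implementation (their code is at the repository linked above). It assumes a 1-D signal that is sparse in the canonical basis, a random Gaussian sensing operator, and iterative soft-thresholding (ISTA) as the sparse-recovery denoiser. All function names, the regularization weight `lam`, the iteration count, and the measurement count `m` are illustrative assumptions; the sensing operator and denoiser actually used in the paper may differ.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_objective(A, y, z, lam):
    """0.5 * ||A z - y||_2^2 + lam * ||z||_1."""
    return 0.5 * np.sum((A @ z - y) ** 2) + lam * np.sum(np.abs(z))


def ista(A, y, lam=0.01, n_iter=500):
    """Iterative soft-thresholding for the LASSO problem, started from zero."""
    # Step size 1/L, where L = ||A||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = soft_threshold(z - step * (A.T @ (A @ z - y)), step * lam)
    return z


def purify(x, m, lam=0.01, n_iter=500, seed=0):
    """Sense x with a random Gaussian operator, then reconstruct a sparse estimate."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, x.size)) / np.sqrt(m)  # hypothetical sensing operator
    y = A @ x  # compressed measurements of the (possibly perturbed) input
    return ista(A, y, lam=lam, n_iter=n_iter)


# Toy usage: a 3-sparse vector plus a bounded L-infinity perturbation.
rng = np.random.default_rng(1)
x_clean = np.zeros(64)
x_clean[[5, 20, 41]] = [1.0, -1.0, 0.5]
eps = 4 / 255  # smallest budget mentioned in the abstract
x_adv = x_clean + rng.uniform(-eps, eps, size=x_clean.shape)
x_purified = purify(x_adv, m=48)
```

In a deployed pipeline, the purified input would be passed to an unmodified classifier, which is what makes this kind of defense training-free. For images, the sparsity assumption would typically be placed on a transform-domain representation rather than on the raw pixels; that step is omitted here to keep the sketch short.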
