Certifying Adapters: Enabling and Enhancing the Certification of Classifier Adversarial Robustness

Jieren Deng; Hanbin Hong; Aaron Palmer; Xin Zhou; Jinbo Bi; Kaleel Mahmood; Yuan Hong; Derek Aguiar


TL;DR

The paper tackles the high cost of obtaining certified robustness for large classifiers by enabling certification on pre-trained, potentially uncertified models. It introduces the Certifying Adapter Framework (CAF), which attaches trainable certifying adapters to a frozen pre-trained feature extractor and uses a smoothed adaptive classifier to obtain robustness guarantees via randomized smoothing. CAF demonstrates strong empirical gains on CIFAR-10 and competitive results on ImageNet, significantly improving certified accuracy across radii and enabling multi-scale defense through ensemble adapters. The work suggests that leveraging pre-trained representations with lightweight adapters can make certified adversarial robustness more scalable and practical in real-world settings.
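The pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The "pre-trained" feature extractor is a hypothetical stand-in (a fixed random projection followed by ReLU) for a real ViT or CNN backbone. The certifying adapter is assumed to be a residual linear layer; the paper's adapter designs for CAF-ViT and CAF-CNN are shown in Figure 1 and may differ. `smoothed_predict` is the Monte Carlo majority vote used in randomized smoothing. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_FEAT, N_CLASSES = 8, 16, 3  # toy sizes (hypothetical)

# Hypothetical stand-in for a frozen pre-trained backbone: fixed random projection + ReLU.
W_FROZEN = rng.normal(size=(D_FEAT, D_IN)) / np.sqrt(D_IN)

def feature_extractor(x):
    """Frozen feature extractor; its weights are never updated."""
    return np.maximum(0.0, x @ W_FROZEN.T)

# Trainable components (assumed forms): residual linear adapter and linear head.
adapter = np.zeros((D_FEAT, D_FEAT))  # zero init: the adapter starts as the identity map
head = 0.1 * rng.normal(size=(N_CLASSES, D_FEAT))

def base_classifier(x):
    """F: extractor -> adapter -> linear head. x has shape (batch, D_IN); returns logits."""
    z = feature_extractor(x)
    h = z + z @ adapter.T  # residual adapter on top of the frozen features
    return h @ head.T

def smoothed_predict(x, sigma, n_samples=1000, rng=rng):
    """G(x): the class F returns most often on x + eps, eps ~ N(0, sigma^2 I) (Monte Carlo)."""
    noisy = x[None, :] + sigma * rng.normal(size=(n_samples, x.shape[0]))
    votes = np.bincount(base_classifier(noisy).argmax(axis=1), minlength=N_CLASSES)
    return int(votes.argmax()), votes
```

For example, `label, votes = smoothed_predict(np.ones(D_IN), sigma=0.25)` returns the majority class and the per-class vote counts. Only `adapter` and `head` would be updated during training; `W_FROZEN` plays the role of the frozen pre-trained weights. In a full certification run, the class chosen this way is then certified with a separate set of noise samples.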

Abstract

Randomized smoothing has become a leading method for achieving certified robustness in deep classifiers against $\ell_p$-norm adversarial perturbations. Current approaches for achieving certified robustness, such as data augmentation with Gaussian noise and adversarial training, require expensive training procedures that tune large models for different Gaussian noise levels and thus cannot leverage high-performance pre-trained neural networks. In this work, we introduce a novel certifying adapters framework (CAF) that enables and enhances the certification of classifier adversarial robustness. Our approach makes few assumptions about the underlying training algorithm or feature extractor and is thus broadly applicable to different feature extractor architectures (e.g., convolutional neural networks or vision transformers) and smoothing algorithms. We show that CAF (a) enables certification in uncertified models pre-trained on clean datasets and (b) substantially improves the performance of certified classifiers via randomized smoothing and SmoothAdv at multiple radii on CIFAR-10 and ImageNet. We demonstrate that CAF achieves improved certified accuracies when compared to methods based on randomized or denoised smoothing, and that CAF is insensitive to certifying adapter hyperparameters. Finally, we show that an ensemble of adapters enables a single pre-trained feature extractor to defend against a range of noise perturbation scales.
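To make the contrast with full-model noise-augmented training concrete, the sketch below trains only an adapter and a linear head with Gaussian noise augmentation while the feature map stays frozen. Everything here is a hypothetical toy setup: a ReLU random-feature map `W_frozen`, a residual linear adapter, and plain gradient descent on softmax cross-entropy. Drawing a noise scale from `sigmas` at every step is an assumption about how a mixed-noise-scale regime could look, not the paper's training schedule.

```python
import numpy as np

def train_adapter(X, y, W_frozen, n_classes, sigmas=(0.25,), steps=300, lr=0.05, seed=0):
    """Train a residual linear adapter and a linear head on top of a frozen feature map.

    Hypothetical setup: features z = ReLU(x W_frozen^T) are never updated; each step
    draws a noise scale from `sigmas` and perturbs the inputs with N(0, sigma^2 I).
    Returns the trained adapter, head, and the per-step training loss.
    """
    rng = np.random.default_rng(seed)
    d_feat = W_frozen.shape[0]
    adapter = np.zeros((d_feat, d_feat))
    head = np.zeros((n_classes, d_feat))
    onehot = np.eye(n_classes)[y]
    losses = []
    for _ in range(steps):
        sigma = rng.choice(sigmas)
        x_noisy = X + sigma * rng.normal(size=X.shape)  # Gaussian noise augmentation
        z = np.maximum(0.0, x_noisy @ W_frozen.T)        # frozen backbone (no gradient)
        h = z + z @ adapter.T                            # residual adapter
        logits = h @ head.T
        logits -= logits.max(axis=1, keepdims=True)      # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        losses.append(-np.mean(np.log(probs[np.arange(len(y)), y] + 1e-12)))
        # Backpropagate the cross-entropy into the trainable parameters only.
        d_logits = (probs - onehot) / len(y)
        d_head = d_logits.T @ h
        d_h = d_logits @ head
        d_adapter = d_h.T @ z
        head -= lr * d_head
        adapter -= lr * d_adapter
    return adapter, head, losses
```

Because the backbone receives no gradient, the same frozen extractor could in principle be reused across noise levels by training one small adapter per noise scale, which is the cost argument made in the abstract.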


Paper Structure (13 sections, 1 theorem, 11 equations, 6 figures, 4 tables)

Introduction
Preliminaries
The Certifying Adapter Framework
Pre-trained Feature Extractor and Linear Predictor
Certifying Adapter
Smoothed Adaptive Classifier
Ensemble Adapters
Empirical Results
Classifier Adversarial Robustness
Adaption with Ensemble Adapters
CAF Sensitivity of Certifying Adapter Parameters
Discussion
Conclusions

Key Result

Theorem 1

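(Adapted from Cohen et al. [rs].) Given $F(\cdot)$, $G(\cdot)$, and $(x, y)$ as defined above, assume that $G(\cdot)$ correctly classifies $x$ as $y$. Then there exists a radius $r$ such that for any $x'$ satisfying $\|x' - x\|_2 \leq r$, it holds that $G(x') = G(x)$. Furthermore, $r$ can be computed as $r = \frac{\sigma}{2}\left(\Phi^{-1}(\underline{p_A}) - \Phi^{-1}(\overline{p_B})\right)$, where $\sigma$ is the standard deviation of the Gaussian smoothing noise, $\underline{p_A}$ is a lower bound on the probability that $F(x + \epsilon)$ returns the top class, $\overline{p_B}$ is an upper bound on the probability of the runner-up class, and $\Phi^{-1}$ represents the inverse of the standard Gaussian cumulative distribution function.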
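The radius in Theorem 1 can be evaluated from Monte Carlo vote counts. The sketch below follows the certification step of Cohen et al.: it bounds $p_A$ from below with a one-sided Clopper-Pearson interval and sets $\overline{p_B} = 1 - \underline{p_A}$, under which the radius reduces to $\sigma\,\Phi^{-1}(\underline{p_A})$. The function names and the default `alpha=0.001` are illustrative choices, and the vote counts are assumed to come from noise samples drawn independently of those used to select the top class.

```python
import math
from statistics import NormalDist

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)

def clopper_pearson_lower(k, n, alpha):
    """One-sided (1 - alpha) Clopper-Pearson lower confidence bound on a binomial proportion."""
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection: the upper tail is increasing in p
        mid = (lo + hi) / 2
        if binom_upper_tail(k, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo  # slightly conservative side of the bound

def certify(n_top, n, sigma, alpha=0.001):
    """Certified l2 radius from n_top votes out of n noisy samples, or None to abstain.

    With p_B = 1 - p_A, r = sigma/2 * (Phi^-1(p_A) - Phi^-1(p_B)) equals sigma * Phi^-1(p_A).
    """
    p_a_lower = clopper_pearson_lower(n_top, n, alpha)
    if p_a_lower < 0.5:
        return None  # abstain: the top class is not provably the majority
    return sigma * NormalDist().inv_cdf(p_a_lower)
```

For instance, `certify(990, 1000, sigma=0.5)` returns a positive radius, while `certify(400, 1000, sigma=0.5)` abstains. The pure-Python tail sum is adequate for a few thousand samples; for the larger sample counts typical in certification, the same bound is available as `scipy.stats.beta.ppf(alpha, n_top, n - n_top + 1)`.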

Figures (6)

Figure 1: The certifying adapter framework for a single input example. Certifying adapters are defined for both ViT (CAF-ViT) and CNN (CAF-CNN) architectures. Trainable model components are denoted with a dashed outline. $W \in \mathbb{R}^{d \times d}$ represents the weight matrix used in the encoder part of the Transformer Block.
Figure 2: Upper envelope of certified accuracy. CAF (Ours) is compared with RS [rs] and SmoothAdv [smoothadv] on CIFAR-10.
Figure 3: Ensemble of Certifying Adapters. A frozen pre-trained feature extractor is adapted to different noise scales (here, $0.25$ and $0.5$) with multiple certifying adapters through a hierarchical adaptation mechanism and a linear head. Trainable model components are denoted with a dashed outline.
Figure 4: Certified accuracy across different radii. We compare our method with RS [rs] and SmoothAdv [smoothadv], using three models trained at noise scales of 0.25, 0.50, and 1.00.
Figure 5: Certified accuracy across different radii for the single-adapter CAF and the ensemble CAF. Each CAF configuration was trained on CIFAR-10 (left) and ImageNet (right) using a ViT-B/16. Single adapters are trained and evaluated at the same noise scale. The ensemble adapters are trained using mixed noise scales and evaluated for certified accuracy at noise scales 0.25, 0.50, and 1.00.
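Figures 2, 4, and 5 plot certified accuracy against radius. A common definition in the randomized smoothing literature, assumed in the sketch below, counts an example as certified at radius $r$ when the smoothed classifier predicts the true label (without abstaining) and its certified radius is at least $r$. The "upper envelope" of Figure 2 is read here as the pointwise maximum over models, such as those trained at different noise scales. The tuple format for `results` is hypothetical.

```python
def certified_accuracy(results, radius):
    """Fraction of examples whose smoothed prediction is correct with certified radius >= radius.

    results: list of (predicted_label, true_label, certified_radius) tuples; an abstention
    is encoded as predicted_label=None and counts as incorrect.
    """
    hits = sum(
        1 for pred, true, r in results
        if pred is not None and pred == true and r >= radius
    )
    return hits / len(results)

def upper_envelope(results_per_model, radii):
    """Pointwise maximum certified accuracy over models (e.g. one model per noise level)."""
    return [max(certified_accuracy(res, r) for res in results_per_model) for r in radii]
```

With one results list per noise scale (0.25, 0.50, 1.00), `upper_envelope(results_per_model, radii)` yields a single curve that reports, at each radius, the best of the three models.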
