Certifying Adapters: Enabling and Enhancing the Certification of Classifier Adversarial Robustness
Jieren Deng, Hanbin Hong, Aaron Palmer, Xin Zhou, Jinbo Bi, Kaleel Mahmood, Yuan Hong, Derek Aguiar
TL;DR
The paper tackles the high cost of obtaining certified robustness for large classifiers by enabling certification on pre-trained, potentially uncertified models. It introduces the Certifying Adapter Framework (CAF), which attaches trainable certifying adapters to a frozen pre-trained feature extractor and uses a smoothed adaptive classifier to obtain robustness guarantees via randomized smoothing. CAF demonstrates strong empirical gains on CIFAR-10 and competitive results on ImageNet, significantly improving certified accuracy across radii and enabling multi-scale defense through ensemble adapters. The work suggests that leveraging pre-trained representations with lightweight adapters can make certified adversarial robustness more scalable and practical in real-world settings.
Abstract
Randomized smoothing has become a leading method for achieving certified robustness in deep classifiers against ℓ_p-norm adversarial perturbations. Current approaches for achieving certified robustness, such as data augmentation with Gaussian noise and adversarial training, require expensive training procedures that tune large models for different Gaussian noise levels and thus cannot leverage high-performance pre-trained neural networks. In this work, we introduce a novel certifying adapters framework (CAF) that enables and enhances the certification of classifier adversarial robustness. Our approach makes few assumptions about the underlying training algorithm or feature extractor and is thus broadly applicable to different feature extractor architectures (e.g., convolutional neural networks or vision transformers) and smoothing algorithms. We show that CAF (a) enables certification of uncertified models pre-trained on clean datasets and (b) substantially improves the performance of certified classifiers via randomized smoothing and SmoothAdv at multiple radii on CIFAR-10 and ImageNet. We demonstrate that CAF achieves improved certified accuracies when compared to methods based on randomized or denoised smoothing, and that CAF is insensitive to certifying adapter hyperparameters. Finally, we show that an ensemble of adapters enables a single pre-trained feature extractor to defend against a range of noise perturbation scales.
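
For readers unfamiliar with randomized smoothing, the following is a minimal sketch of how a smoothed classifier yields a certified ℓ_2 radius. It is not the CAF implementation: it illustrates only the standard Gaussian-smoothing certificate, R = σ Φ⁻¹(p), where p is a lower bound on the probability of the top class under noise. The names `smoothed_vote`, `certified_radius`, `certify`, and `toy_classifier` are hypothetical. Two simplifications are assumed: a Hoeffding lower bound stands in for the tighter Clopper-Pearson bound commonly used in practice, and the same noise samples are used to select and to certify the top class.

```python
import math
import random
from statistics import NormalDist


def smoothed_vote(base_classifier, x, sigma, n_samples, rng):
    """Count base-classifier predictions on Gaussian-perturbed copies of x."""
    counts = {}
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        label = base_classifier(noisy)
        counts[label] = counts.get(label, 0) + 1
    return counts


def certified_radius(p_lower, sigma):
    """Return sigma * Phi^{-1}(p_lower), or 0.0 (abstain) if p_lower <= 0.5."""
    if p_lower <= 0.5:
        return 0.0
    p_lower = min(p_lower, 1.0 - 1e-9)  # keep inv_cdf finite
    return sigma * NormalDist().inv_cdf(p_lower)


def certify(base_classifier, x, sigma, n_samples=1000, alpha=0.001, seed=0):
    """Return (top label, certified l2 radius) for input x.

    Assumption: a one-sided Hoeffding bound replaces Clopper-Pearson,
    so the radius is valid with probability >= 1 - alpha but conservative.
    """
    rng = random.Random(seed)
    counts = smoothed_vote(base_classifier, x, sigma, n_samples, rng)
    top_label, top_count = max(counts.items(), key=lambda kv: kv[1])
    p_hat = top_count / n_samples
    p_lower = p_hat - math.sqrt(math.log(1.0 / alpha) / (2.0 * n_samples))
    return top_label, certified_radius(p_lower, sigma)


def toy_classifier(x):
    """Hypothetical base classifier: sign of the first coordinate."""
    return 1 if x[0] > 0 else 0


label, radius = certify(toy_classifier, [2.0, 0.0], sigma=0.5)
print(label, radius)
```

In a CAF-style setup, `toy_classifier` would presumably be replaced by the frozen feature extractor combined with its certifying adapters, and `sigma` would match the noise level an adapter was trained for; those details are not specified in this section.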
