Multi-head Ensemble of Smoothed Classifiers for Certified Robustness

Kun Fang, Qinghua Tao, Yingwen Wu, Tao Li, Xiaolin Huang, Jie Yang

TL;DR

The paper addresses certifiable robustness under Gaussian perturbations by advancing Randomized Smoothing (RS). It introduces SmOothed Multi-head Ensemble (SOME), a single DNN augmented with multiple heads that are diversified via a cosine constraint and trained through a circular-teaching self-paced strategy to leverage ensemble benefits with reduced computation. Extensive experiments on CIFAR10 and ImageNet show that SOME achieves state-of-the-art or competitive certified robustness while significantly cutting certification and training costs compared with ensembles of separate DNNs. Ablation studies validate the contributions of the multi-head design, circular-teaching, and thresholding choices, and a comparison with the DRT method highlights SOME’s favorable efficiency and practicality for RS-based defenses.

Abstract

Randomized Smoothing (RS) is a promising technique for certified robustness, and ensembles of multiple Deep Neural Networks (DNNs) have recently shown state-of-the-art performance in RS owing to their variance-reduction effect over Gaussian noise. However, such an ensemble imposes a heavy computational burden in both training and certification, and it under-exploits the individual DNNs and their mutual effects, because communication between the classifiers is commonly ignored in optimization. In this work, we consider a novel ensemble-based training scheme for a single DNN with multiple augmented heads, named SmOothed Multi-head Ensemble (SOME). In SOME, in the same pursuit of variance reduction via ensembling, an ensemble of multiple heads subject to a cosine constraint inside a single DNN is employed, with much cheaper training and certification costs in RS. For this network structure, an associated training strategy is designed by introducing a circular communication flow among the augmented heads. That is, each head teaches its neighbor with a self-paced learning strategy using smoothed losses, which are specifically designed in relation to certified robustness. The multi-head structure and the circular-teaching scheme in SOME jointly contribute to the diversity among the heads and benefit their ensemble, leading to a certifiably robust RS-based defense that is competitive with or stronger than ensembling multiple DNNs (effectiveness) at much lower computational expense (efficiency), as verified by extensive experiments and discussions.
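To make the cosine constraint among heads more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes each head is summarized by a flat weight vector and that diversity is encouraged by penalizing the mean squared pairwise cosine similarity; the exact form of the constraint in SOME is not given in this summary, and the names `cosine` and `cosine_diversity_penalty` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_diversity_penalty(head_weights):
    """Mean squared cosine similarity over all unordered pairs of heads.

    Assumption (not from the paper): each head is represented by one flat
    weight vector, and squaring treats positive and negative alignment alike.
    The penalty is 0 for mutually orthogonal heads and 1 for parallel heads.
    """
    k = len(head_weights)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    total = sum(cosine(head_weights[i], head_weights[j]) ** 2 for i, j in pairs)
    return total / len(pairs)


# Toy weight vectors for three hypothetical heads.
orthogonal_heads = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
identical_heads = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]

penalty_orthogonal = cosine_diversity_penalty(orthogonal_heads)
penalty_identical = cosine_diversity_penalty(identical_heads)
```

In a training loop, a term of this kind would typically be added to the classification loss with a weighting coefficient, so that minimizing the total loss pushes the heads toward mutual orthogonality.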

Paper Structure (28 sections, 11 equations, 7 figures, 9 tables, 1 algorithm)

This paper contains 28 sections, 11 equations, 7 figures, 9 tables, 1 algorithm.

Figures (7)

  • Figure 1: An illustration of the proposed SOME with a 5-head DNN. In our efficient ensemble structure, a common backbone $g$ is shared by the 5 augmented heads $h^1, \cdots, h^5$, which are trained to be mutually orthogonal so that the classifiers are diversified. In the novel training with a circular communication flow between classifiers, each augmented head $h^{k+1}$ is optimized via circular teaching by its peer classifier $h^k$, using easy samples $D_e$ and hard samples $D_h$ defined in relation to the certified radii ($h^0\coloneqq h^5$). The figures illustrating easy and hard samples are taken from horvath2021boosting; see \ref{sec:ct} for details on how easy and hard samples ($D_e$ and $D_h$) are selected in this certified-robustness task.
  • Figure 2: Comparison of the log-probability gap distributions for a randomly chosen CIFAR10 test sample, with 10,000 noisy samples drawn. In each histogram, the area to the left of the dashed line indicates misclassification of noisy samples.
  • Figure 3: Ablation studies on where to augment heads.
  • Figure 4: The performance of SOME with varying numbers of heads. Experiments use ResNet-110 on CIFAR10 with noise level $\sigma=0.25$.
  • Figure 5: Ablation studies on the thresholding scheme of SOME. (a): Ablations on soft and hard thresholding in the self-paced learning of SOME. (b): Ablations on the larger weighting at the high-loss end. (c): Ablations on the circular-teaching scheme of SOME. All SOME models use ResNet-110 and are trained with the base method Gaussian (cohen2019certified) on CIFAR10 under noise levels $\sigma\in\{0.25,0.50,1.00\}$.
  • ...and 2 more figures
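
The circular communication flow in the Figure 1 caption, where head $h^{k+1}$ is taught by $h^k$ and $h^0\coloneqq h^5$, can be sketched as simple ring indexing. The easy/hard split below uses the standard hard-threshold rule from self-paced learning (a sample is easy when the teacher's loss is below a threshold); this is an assumption for illustration, since the smoothed losses and the actual selection rule of SOME are not reproduced in this summary. The names `teacher_of` and `split_easy_hard` are hypothetical.

```python
def teacher_of(student, num_heads):
    """Return the 1-based index of the head that teaches `student`.

    Head k teaches head k+1, and the last head teaches head 1,
    which mirrors the convention h^0 := h^K in the caption.
    """
    return num_heads if student == 1 else student - 1


def split_easy_hard(teacher_losses, threshold):
    """Split sample indices into easy and hard sets by the teacher's loss.

    Assumption: plain hard thresholding as in generic self-paced learning,
    not the smoothed-loss criterion used by SOME.
    """
    easy = [i for i, loss in enumerate(teacher_losses) if loss < threshold]
    hard = [i for i, loss in enumerate(teacher_losses) if loss >= threshold]
    return easy, hard


# Ring of 5 heads: list the teacher for students 1..5.
ring = [teacher_of(student, 5) for student in range(1, 6)]

# Toy per-sample losses from one teacher head on a mini-batch of 4 samples.
easy_idx, hard_idx = split_easy_hard([0.1, 2.0, 0.4, 1.5], threshold=1.0)
```

With 5 heads, `ring` pairs student 1 with teacher 5 and every other student with its predecessor, so each head both teaches and is taught exactly once per round.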