Multi-head Ensemble of Smoothed Classifiers for Certified Robustness

Kun Fang; Qinghua Tao; Yingwen Wu; Tao Li; Xiaolin Huang; Jie Yang

TL;DR

The paper addresses certifiable robustness under Gaussian perturbations by advancing Randomized Smoothing (RS). It introduces SmOothed Multi-head Ensemble (SOME), a single DNN augmented with multiple heads that are diversified via a cosine constraint and trained through a circular-teaching self-paced strategy to leverage ensemble benefits with reduced computation. Extensive experiments on CIFAR10 and ImageNet show that SOME achieves state-of-the-art or competitive certified robustness while significantly cutting certification and training costs compared with ensembles of separate DNNs. Ablation studies validate the contributions of the multi-head design, circular-teaching, and thresholding choices, and a comparison with the DRT method highlights SOME’s favorable efficiency and practicality for RS-based defenses.

Abstract

Randomized Smoothing (RS) is a promising technique for certified robustness, and ensembles of multiple Deep Neural Networks (DNNs) have recently achieved state-of-the-art performance in RS thanks to their variance-reduction effect over Gaussian noise. However, such an ensemble imposes a heavy computational burden in both training and certification, and yet under-exploits the individual DNNs and their mutual effects, since communication between the classifiers is commonly ignored in optimization. In this work, we consider a novel ensemble-based training scheme for a single DNN with multiple augmented heads, named SmOothed Multi-head Ensemble (SOME). In SOME, in the same pursuit of variance reduction via ensembling, an ensemble of multiple heads subject to a cosine constraint inside a single DNN is employed, at much lower training and certification cost in RS. For this network structure, an associated training strategy is designed by introducing a circular communication flow among the augmented heads: each head teaches its neighbor through a self-paced learning strategy using smoothed losses, which are specifically designed in relation to certified robustness. The multi-head structure and the circular-teaching scheme in SOME jointly contribute to the diversity among the heads and benefit their ensemble, leading to a certifiably robust RS-based defense that is competitive with or stronger than ensembling multiple DNNs (effectiveness) at much lower computational expense (efficiency), as verified by extensive experiments and discussions.
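For readers new to RS, the certified radius that the abstract refers to can be illustrated with the standard bound from Cohen et al. (2019): if the smoothed classifier returns the top class with probability at least $p_A > 1/2$ under Gaussian noise of level $\sigma$, the prediction is certified within an $\ell_2$ radius of $\sigma\,\Phi^{-1}(p_A)$. The sketch below is a minimal illustration of that formula only; the function name `certified_radius` and the abstain-as-zero convention are assumptions made for this example and are not taken from the SOME code.

```python
from statistics import NormalDist


def certified_radius(p_a_lower: float, sigma: float) -> float:
    """L2 certified radius sigma * Phi^{-1}(p_A) (Cohen et al., 2019).

    p_a_lower: lower confidence bound on the top-class probability under noise.
    sigma:     standard deviation of the Gaussian smoothing noise.
    Returns 0.0 (treated here as "abstain") when the bound does not exceed 1/2.
    """
    if not 0.0 < p_a_lower < 1.0:
        raise ValueError("p_a_lower must lie strictly between 0 and 1")
    if p_a_lower <= 0.5:
        return 0.0
    # Phi^{-1} is the inverse CDF of the standard normal distribution.
    return sigma * NormalDist().inv_cdf(p_a_lower)
```

A tighter lower bound on $p_A$ gives a larger radius, which is why reducing the variance of the predictions over Gaussian noise, as ensembles do, can translate into stronger certificates.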

Paper Structure (28 sections, 11 equations, 7 figures, 9 tables, 1 algorithm)

Introduction
Related work
Randomized smoothing
Co-teaching and self-paced learning
Methodology
Augmented multi-head network
Optimization framework
Numerical experiments
Stronger certified robustness from SOME
More efficient computation from SOME
Ablation studies on the essentials of SOME
On the cosine constraint in SOME
On the multi-head structure and the circular-teaching
On the different head locations
On the number of heads in SOME
...and 13 more sections

Figures (7)

Figure 1: An illustration of the proposed SOME with a 5-head DNN. In our efficient ensemble structure, a common backbone $g$ is shared by the 5 augmented heads $h^1, \cdots, h^5$, which are trained to be mutually orthogonal so as to yield diversified classifiers. In the novel training with a circular communication flow between classifiers, each augmented head $h^{k+1}$ is optimized via circular-teaching by its peer classifier $h^k$ with easy samples $D_e$ and hard samples $D_h$ defined in relation to the certified radii ($h^0\coloneqq h^5$). The inset figures illustrating easy and hard samples are from horvath2021boosting; see \ref{sec:ct} for details on how easy and hard samples ($D_e$ and $D_h$) are selected in this certified robustness task.
Figure 2: Comparisons of log-probability gap distributions for a randomly chosen CIFAR10 test sample, based on 10,000 noise samplings. In each histogram, the area to the left of the dashed line indicates misclassifications on noisy samples.
Figure 3: Ablation studies on where to augment heads.
Figure 4: The performance of SOME under varied numbers of heads. Experiments are executed based on ResNet-110 on CIFAR10 with a noise level $\sigma=0.25$.
Figure 5: Ablation studies on the thresholding scheme of SOME. (a): Ablations on soft and hard thresholding in the self-paced learning of SOME. (b): Ablations on the larger weighting on the high-loss end. (c): Ablations on the circular-teaching scheme of SOME. All SOME models use ResNet-110 and are trained with the base method Gaussian cohen2019certified on CIFAR10 under noise levels $\sigma\in\{0.25,0.50,1.00\}$.
...and 2 more figures
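The two structural ideas in the Figure 1 caption, a ring in which head $h^k$ teaches head $h^{k+1}$ (with $h^0\coloneqq h^5$) and a penalty that pushes heads toward mutual orthogonality, can be sketched as follows. This is a hedged illustration, not the authors' implementation: the helper names `teacher_index` and `cosine_penalty`, the use of flattened weight vectors per head, and the choice of summing squared pairwise cosine similarities are all assumptions, since the exact form of the cosine constraint is not given in this section.

```python
import math


def teacher_index(k: int, num_heads: int) -> int:
    """1-based index of the head that teaches head k in the ring.

    Head k is taught by head k-1; head 1 is taught by the last head,
    matching the convention h^0 := h^5 for a 5-head network.
    """
    if not 1 <= k <= num_heads:
        raise ValueError("k must be in 1..num_heads")
    return num_heads if k == 1 else k - 1


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as sequences."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_penalty(head_weights):
    """Sum of squared pairwise cosine similarities (an assumed penalty form).

    head_weights: list of flattened weight vectors, one per head.
    The value is 0 when all heads are mutually orthogonal and grows as
    heads align with each other.
    """
    total = 0.0
    for i in range(len(head_weights)):
        for j in range(i + 1, len(head_weights)):
            total += cosine(head_weights[i], head_weights[j]) ** 2
    return total
```

For a 5-head network, `teacher_index(1, 5)` gives 5 and `teacher_index(3, 5)` gives 2, so every head has exactly one teacher and one student in the ring.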
