Classification-Denoising Networks

Louis Thiry; Florentin Guth

TL;DR

This model shows an increased robustness to adversarial perturbations compared to a standard discriminative classifier, and allows for a novel interpretation of adversarial gradients as a difference of denoisers.

Abstract

Image classification and denoising suffer from complementary issues: lack of robustness for classification, and partial neglect of conditioning information for denoising. We argue that both can be alleviated by unifying the two tasks through a model of the joint probability of (noisy) images and class labels. Classification is performed with a forward pass followed by conditioning. Using the Tweedie-Miyasawa formula, we evaluate the denoising function with the score, which can be computed by marginalization and back-propagation. The training objective is then a combination of cross-entropy loss and denoising score matching loss integrated over noise levels. Numerical experiments on CIFAR-10 and ImageNet show competitive classification and denoising performance compared to reference deep convolutional classifiers/denoisers, and significantly improved efficiency compared to previous joint approaches. Our model shows an increased robustness to adversarial perturbations compared to a standard discriminative classifier, and allows for a novel interpretation of adversarial gradients as a difference of denoisers.
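The mechanics described in the abstract can be illustrated with a minimal sketch. It is not the authors' GradResNet: it assumes a toy Gaussian-mixture joint model, and it replaces back-propagation with central finite differences so that it runs with NumPy alone. All names (`joint_log_prob`, `numerical_grad`, `means`, `s2`, `sigma`) are hypothetical. The sketch classifies by conditioning the joint log-probability, denoises with the Tweedie-Miyasawa formula $\hat{x} = y + \sigma^2 \nabla_y \log p$, and checks that the gradient of the class log-posterior equals a difference of denoisers divided by $\sigma^2$.

```python
import numpy as np

def joint_log_prob(y, means, log_prior, var):
    """log p(y, c) for every class c under a toy Gaussian-mixture model (assumption)."""
    d = y.shape[0]
    sq_dist = np.sum((y[None, :] - means) ** 2, axis=1)
    return log_prior - sq_dist / (2 * var) - 0.5 * d * np.log(2 * np.pi * var)

def logsumexp(a):
    """Numerically stable log(sum(exp(a))), used to marginalize over classes."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def numerical_grad(f, y, h=1e-5):
    """Central finite differences; a stand-in for back-propagation in this sketch."""
    grad = np.zeros_like(y)
    for i in range(y.size):
        e = np.zeros_like(y)
        e[i] = h
        grad[i] = (f(y + e) - f(y - e)) / (2 * h)
    return grad

# Toy setup: clean x | c ~ N(means[c], s2 * I), noisy y = x + sigma * z.
rng = np.random.default_rng(0)
d, num_classes = 4, 3
s2, sigma = 0.5, 0.8
var = s2 + sigma ** 2                      # variance of y given c
means = rng.normal(size=(num_classes, d))
log_prior = np.log(np.array([0.2, 0.3, 0.5]))
y = rng.normal(size=d)

def log_joint(v):
    return joint_log_prob(v, means, log_prior, var)

def log_marginal(v):
    return logsumexp(log_joint(v))         # log p(y): marginalize over classes

# Classification: forward pass, then conditioning (log p(c | y)).
log_posterior = log_joint(y) - log_marginal(y)
c = int(np.argmax(log_posterior))

# Denoising: Tweedie-Miyasawa formula applied to the two scores.
denoised_uncond = y + sigma ** 2 * numerical_grad(log_marginal, y)
denoised_cond = y + sigma ** 2 * numerical_grad(lambda v: log_joint(v)[c], y)

# Gradient of the class log-posterior with respect to the input.
grad_log_posterior = numerical_grad(lambda v: log_joint(v)[c] - log_marginal(v), y)

# Closed-form posterior mean E[x | y, c] for this toy model, for comparison.
expected_cond = means[c] + (s2 / var) * (y - means[c])
```

In this toy setting `denoised_cond` agrees with `expected_cond`, and `denoised_cond - denoised_uncond` agrees with `sigma ** 2 * grad_log_posterior` up to finite-difference error. In a neural network the same quantities would be obtained by automatic differentiation of the network's joint log-probability output.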

Paper Structure (32 sections, 25 equations, 5 figures, 3 tables)

This paper contains 32 sections, 25 equations, 5 figures, 3 tables.

Introduction: classification and denoising
Joint energy-score models
Advantages of joint over conditional modeling
Joint modeling with diffusion models
Diffusion models and denoising.
Adversarial gradients.
Likelihood evaluation.
Architecture and training
Parameterization of joint log-probability
GradResNet architecture
Training
Training objectives.
Training distributions.
Numerical results
Classification and denoising via joint modeling
...and 17 more sections

Figures (5)

Figure 1: Illustration of the generalization bounds $b + \frac{v}{n}$ in two idealized settings with stylized values for $b$ and $v$. Left: bias-dominated setting with $b^{\mathrm{gen}} = 5$, $b^{\mathrm{dis}} = 1$, $v^{\mathrm{gen}} = 20$, $v^{\mathrm{dis}} = 100$. Right: variance-dominated setting with $b^{\mathrm{gen}} = b^{\mathrm{dis}} = 1$, $v^{\mathrm{gen}} = 100$, $v^{\mathrm{dis}}=10000$.
Figure 2: Left: ResNet BasicBlock with bias parameters, batch-normalization (BN) layers and ReLUs. Middle: GradResNet BasicBlock with bias-free convolutional layers, GELUs, and a single group-normalization (GN) layer. Right: Illustration of the proposed side connections.
Figure 3: Denoising experiment. Top, left-to-right: Original CIFAR-10 test image, noisy image ($\sigma=50$), denoised images with unconditional and conditional denoisers, and difference between them (magnified $500\times$). Bottom, left-to-right: Eigenvectors corresponding to the three largest ($2.71$, $2.16$, $2.03$) and two lowest ($2.8 \times 10^{-5}$, $-1.9 \times 10^{-5}$) magnitude eigenvalues of the unconditional denoiser Jacobian. More examples are shown in the appendix.
Figure 4: Adversarial attacks on CIFAR-10 test set. The baseline is a ResNet18 trained for classification only, ours is a GradResNet trained for classification and denoising, and JEM is the method proposed by Grathwohl et al. (2019). Left: $\ell^{\infty}$ PGD attack. Right: $\ell^{2}$ PGD attack.
Figure 5: Additional denoising experiments with $\sigma=50$. See Figure 3 for details.
