Dynamic Test-Time Augmentation via Differentiable Functions

Shohei Enomoto, Monikka Roslianna Busto, Takeharu Eda

TL;DR

DynTTA is a novel image enhancement method based on differentiable data augmentation techniques. It generates a blended image from many augmented images to improve recognition accuracy and robustness under distribution shifts.

Abstract

Distribution shifts, which often occur in the real world, degrade the accuracy of deep learning systems, and thus improving robustness to distribution shifts is essential for practical applications. To improve robustness, we study an image enhancement method that generates recognition-friendly images without retraining the recognition model. We propose a novel image enhancement method, DynTTA, which is based on differentiable data augmentation techniques and generates a blended image from many augmented images to improve the recognition accuracy under distribution shifts. In addition to standard data augmentations, DynTTA also incorporates deep neural network-based image transformation, further improving the robustness. Because DynTTA is composed of differentiable functions, it can be directly trained with the classification loss of the recognition model. In experiments with widely used image recognition datasets using various classification models, DynTTA improves the robustness with almost no reduction in classification accuracy for clean images, thus outperforming the existing methods. Furthermore, the results show that robustness is significantly improved by estimating the training-time augmentations for distribution-shifted datasets using DynTTA and retraining the recognition model with the estimated augmentations. DynTTA is a promising approach for applications that require both clean accuracy and robustness. Our code is available at https://github.com/s-enmt/DynTTA.

Paper Structure (31 sections, 2 equations, 4 figures, 12 tables, 1 algorithm)

Figures (4)

  • Figure 1: DynTTA is applied before inference by the classification network to generate recognition-friendly images. First, DynTTA outputs the magnitude parameters and blend weights. Next, predefined data augmentations are performed using the magnitude parameters. Finally, the output image is generated by linearly combining the augmented images with the blend weights. The generated images are input to the classification network. In this paper, image transformation models that are applied before the classification network are referred to as enhancement models.
  • Figure 2: Measuring diverse transformations by mean and standard error of the MSE. ResNet50 was used as the classification model in the blind setting on the CUB dataset.
  • Figure 3: Impact of each individual augmentation on classification accuracy. The y-axis is the difference in accuracy between excluding one augmentation and using all data augmentations. Positive values mean that the augmentation contributes to improved accuracy. A-Contrast, LPFs, and HPFs denote auto-contrast, low-pass filters, and high-pass filters, respectively.
  • Figure 4: DynTTA output images. From top to bottom: Speckle Noise, Gaussian Blur, Spatter, and Saturate. The images in each row are, from left to right: input image, augmented images, output image, and difference between input and output images. For ease of viewing, auto-contrast is used for the high-pass filters image. The numbers in the input and output images indicate the loss values.
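
The three steps in the Figure 1 caption (predict magnitudes and blend weights, apply the predefined augmentations, linearly combine the results) can be illustrated with a small dependency-free sketch. This is not the authors' implementation. The augmentation functions, the flat grayscale "image", and the hand-picked magnitudes and weights are all assumptions made for illustration. In DynTTA these values are predicted by the enhancement model, and the operations are differentiable so that the classification loss can train it.

```python
# Toy sketch of the blend step described in Figure 1; not the authors' code.
# An "image" here is a flat list of grayscale pixel values in [0, 1].


def clip01(value):
    """Clamp a pixel value to the valid range [0, 1]."""
    return min(1.0, max(0.0, value))


def identity(pixels, magnitude):
    """Return the input unchanged; the magnitude is ignored."""
    return list(pixels)


def brightness(pixels, magnitude):
    """Shift every pixel by `magnitude` (hypothetical augmentation)."""
    return [clip01(p + magnitude) for p in pixels]


def contrast(pixels, magnitude):
    """Scale deviations from the mean by (1 + magnitude) (hypothetical)."""
    mean = sum(pixels) / len(pixels)
    return [clip01(mean + (1.0 + magnitude) * (p - mean)) for p in pixels]


# Stand-in for the predefined set of data augmentations.
AUGMENTATIONS = [identity, brightness, contrast]


def blend(pixels, magnitudes, weights):
    """Apply each augmentation with its magnitude, then linearly combine
    the augmented images using the blend weights."""
    augmented = [aug(pixels, m) for aug, m in zip(AUGMENTATIONS, magnitudes)]
    return [
        sum(w * img[i] for w, img in zip(weights, augmented))
        for i in range(len(pixels))
    ]


# Hand-picked values standing in for the enhancement model's predictions.
image = [0.2, 0.4, 0.6, 0.8]
magnitudes = [0.0, 0.1, 0.5]
weights = [0.5, 0.25, 0.25]
output = blend(image, magnitudes, weights)
```

With all blend weight placed on the identity entry, `blend` returns the input unchanged. Shifting weight toward other entries moves the output toward the corresponding augmented images.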