Test-Time Augmentation Meets Variational Bayes

Masanari Kimura; Howard Bondell

TL;DR

Weighted Test-Time Augmentation can be formalized in a variational Bayesian framework in which each data augmentation receives a weight reflecting its contribution. Optimizing these weights to maximize the marginal log-likelihood is shown to suppress unwanted data augmentation candidates at test time.

Abstract

Data augmentation is known to contribute significantly to the robustness of machine learning models. In most instances, data augmentation is applied during the training phase. Test-Time Augmentation (TTA) is a technique that instead applies data augmentations during the testing phase to achieve robust predictions. More precisely, TTA averages the predictions over multiple augmented versions of an instance to produce a final prediction. Although the effectiveness of TTA has been reported empirically, the predictive performance it achieves can be expected to depend on the set of data augmentation methods used during testing. In particular, the individual data augmentation methods are expected to contribute to performance to different degrees, and some of them could have a negative impact on prediction performance. In this study, we consider a weighted version of TTA based on the contribution of each data augmentation. Some variants of TTA can be regarded as addressing the problem of determining the appropriate weighting. We demonstrate that determining the coefficients of this weighted TTA can be formalized in a variational Bayesian framework. We also show that optimizing the weights to maximize the marginal log-likelihood suppresses unwanted data augmentation candidates at the test phase.
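To make the idea of weighted TTA concrete, the following is a minimal sketch rather than the authors' implementation. The function names `weighted_tta` and `fit_weights_em` and all toy numbers are hypothetical. The weight fit uses a plain maximum-likelihood EM update for mixture weights as a simple stand-in; it is not the variational Bayesian procedure developed in the paper, although it illustrates the same qualitative effect of driving the weight of an unhelpful augmentation toward zero.

```python
import numpy as np


def weighted_tta(probs, weights=None):
    """Combine per-augmentation predictions for one instance.

    probs:   (K, C) array, class probabilities from K augmented views.
    weights: (K,) nonnegative weights; None gives the uniform average
             used by standard TTA.
    """
    probs = np.asarray(probs, dtype=float)
    k = probs.shape[0]
    if weights is None:
        weights = np.full(k, 1.0 / k)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize onto the simplex
    return weights @ probs


def fit_weights_em(lik, n_iter=200):
    """Maximum-likelihood mixture weights via EM (stand-in, not VB).

    lik: (N, K) array with strictly positive entries, where lik[i, k]
         is the probability that augmentation k assigns to the true
         label of held-out instance i.
    Maximizes sum_i log(sum_k w_k * lik[i, k]) over the simplex.
    """
    lik = np.asarray(lik, dtype=float)
    k = lik.shape[1]
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        resp = w * lik                            # E-step: unnormalized
        resp /= resp.sum(axis=1, keepdims=True)   # responsibilities
        w = resp.mean(axis=0)                     # M-step: new weights
    return w


# Toy data (hypothetical): the third augmentation is consistently worse.
toy_lik = np.array([
    [0.90, 0.70, 0.10],
    [0.60, 0.80, 0.20],
    [0.80, 0.90, 0.05],
    [0.70, 0.60, 0.30],
])
toy_weights = fit_weights_em(toy_lik)

# Predictions of the three augmentations for one new instance (2 classes).
toy_probs = np.array([[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]])
uniform_pred = weighted_tta(toy_probs)
weighted_pred = weighted_tta(toy_probs, toy_weights)
```

In this toy setting the uniform average is pulled toward the third augmentation's prediction, whereas the fitted weights largely ignore it. A variational Bayesian treatment would additionally place a prior over the weights and infer a posterior instead of a point estimate.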

Paper Structure (14 sections, 39 equations, 7 figures, 4 tables)

This paper contains 14 sections, 39 equations, 7 figures, 4 tables.

Introduction
Background and Preliminary
Data Augmentation
Test-Time Augmentation
Test-Time Augmentation under Noisy Environments
Why is Determination of TTA Weight Coefficients Difficult?
Test-Time Augmentation as Bayesian Mixture Model
Continuous Case
Categorical Case
Automatic Differentiation Variational Inference for VB-TTA
Numerical Experiments
Illustrative Examples
Experimental Results on Real Datasets
Conclusion and Discussion

Figures (7)

Figure 1: Test-Time Augmentation as Bayesian mixture model. Assuming that the transformed instances acquired by each data augmentation follow some probability distribution, the TTA procedure can be regarded as sampling from their mixture models.
Figure 2: Some example instances in the CIFAR10-N dataset [wei2021learning]. Each instance in this dataset has three human annotations, which are often inconsistent.
Figure 3: Plots of the distributions of points induced by mixup and cutmix (Gaussian distribution case). The black dots represent the input $\bm{x}$, and the figure shows the distributions induced by $\psi_M(\bm{x})$ and $\psi_C(\bm{x})$ when those $\bm{x}$ are fixed.
Figure 4: Plots of the distributions of points induced by mixup and cutmix (Gamma distribution case). The black dots represent the input $\bm{x}$, and the figure shows the distributions induced by $\psi_M(\bm{x})$ and $\psi_C(\bm{x})$ when those $\bm{x}$ are fixed.
Figure 5: Optimization of VB-TTA. The first row shows the history of the optimization of the weight coefficients. The second row shows the evolution of the weight coefficients assigned to each data augmentation during the optimization process.
...and 2 more figures
