Understanding Test-Time Augmentation
Masanari Kimura
TL;DR
This work analyzes Test-Time Augmentation (TTA) from a theoretical standpoint, modeling TTA as averaging predictions over test-time transformations of the input. It formulates TTA via an augmented input space, derives upper bounds showing $\mathcal{R}^{\ell,\mathcal{G}}(h) \leq \bar{\mathcal{R}}^\ell(h)$, and introduces a weighted TTA with a closed-form solution for the optimal weights, $w_i = \frac{\sum_j \Gamma^{-1}_{ij}}{\sum_{k,l} \Gamma^{-1}_{kl}}$, linking performance to an ambiguity term. Under standard regularity conditions, the paper proves statistical consistency of ERM with data augmentation and discusses conditions for strict improvement when augmentation-induced errors are uncorrelated. The key insight is that the benefits of TTA arise from augmented views that are accurate yet diverse; the gain shrinks as the augmentations become highly correlated and therefore redundant. These results guide when and how to apply TTA in practice and point to avenues for further generalization-bound analyses.
Abstract
Test-Time Augmentation (TTA) is a very powerful heuristic that takes advantage of data augmentation during testing to produce averaged output. Despite the experimental effectiveness of TTA, there is insufficient discussion of its theoretical aspects. In this paper, we aim to give theoretical guarantees for TTA and clarify its behavior.
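The closed-form weights quoted in the TL;DR can be illustrated with a minimal sketch. It assumes that $\Gamma$ is the matrix of second moments of the per-augmentation errors, $\Gamma_{ij} = \mathbb{E}[\epsilon_i \epsilon_j]$, estimated from signed errors on held-out data; the summary above does not define $\Gamma$, so this reading is an assumption. All function names are hypothetical, and predictions are taken to be scalars for simplicity.

```python
def estimate_gamma(errors):
    """Estimate Gamma_ij = E[eps_i * eps_j] (assumed definition).

    errors[n][i] is the signed error of augmentation i on held-out sample n.
    """
    n, m = len(errors), len(errors[0])
    return [[sum(row[i] * row[j] for row in errors) / n for j in range(m)]
            for i in range(m)]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(b)
    aug = [list(map(float, a[i])) + [float(b[i])] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("Gamma is singular: augmentations are redundant")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def optimal_tta_weights(gamma):
    """w_i = sum_j Gamma^{-1}_ij / sum_{k,l} Gamma^{-1}_kl."""
    # The row sums of Gamma^{-1} are the solution u of Gamma u = 1,
    # so the inverse never has to be formed explicitly.
    u = solve(gamma, [1.0] * len(gamma))
    total = sum(u)
    return [x / total for x in u]


def weighted_tta(predictions, weights):
    """Weighted average of scalar predictions, one per augmented view."""
    return sum(w * p for w, p in zip(weights, predictions))
```

Under this reading, uncorrelated errors with equal variance give uniform weights, which recovers plain averaging, while an augmentation with larger error variance receives a smaller weight. When two augmentations produce nearly identical errors, $\Gamma$ becomes close to singular and the weights become unstable, which matches the redundancy caveat in the TL;DR.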
