Intelligent Multi-View Test Time Augmentation
Efe Ozturk, Mohit Prabhushankar, Ghassan AlRegib
TL;DR
This paper addresses the sensitivity of image classifiers to viewpoint variation by introducing an uncertainty-guided Test Time Augmentation (TTA) framework. The method has two stages. Stage 1 uses predictive uncertainty to identify the best augmentation view for each class: a matrix $\mathbf{S}$ tallies the best augmentation per class, and a class-wise augmentation preference vector $\mathbf{v}$ is learned from it. Stage 2 gates the application of TTA with an uncertainty threshold $\tau$; when TTA is invoked, the default-view prediction $P_{default}$ and the augmented-view prediction $P_{aug}$ are averaged to give the final prediction $P_{final}=\frac{P_{default}+P_{aug}}{2}$. Empirical results on the CURE-OR, FacePix, and Brain Tumor MRI datasets with ResNet50, VGG-16, and ViT models show a mean accuracy gain of $1.73\%$ over single-view baselines and substantial improvements over random augmentation, with notable gains such as $4.4\%$–$7.26\%$ on select configurations. The work demonstrates that uncertainty-guided, class-aware TTA can improve robustness and efficiency in real-world image classification and motivates further exploration of intelligent augmentation strategies.
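To make the roles of $\mathbf{S}$ and $\mathbf{v}$ concrete, here is a minimal sketch of how the Stage 1 tally might be computed. It is not the authors' implementation: it assumes Shannon entropy of the softmax output as the uncertainty measure, assumes the "best" view for a sample is the one with the lowest entropy, and uses a hypothetical array layout and function names (`entropy`, `preferred_views`).

```python
import numpy as np


def entropy(probs, eps=1e-12):
    """Shannon entropy along the last axis (assumed uncertainty measure)."""
    probs = np.asarray(probs, dtype=float)
    return -np.sum(probs * np.log(probs + eps), axis=-1)


def preferred_views(view_probs, labels, num_classes):
    """Build the tally matrix S and the per-class preference vector v.

    view_probs: shape (num_samples, num_views, num_classes), softmax outputs
                for every candidate augmentation view (hypothetical layout).
    labels:     shape (num_samples,), integer class labels of a labeled set.
    """
    view_probs = np.asarray(view_probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    num_views = view_probs.shape[1]

    # Assumption: the best view for a sample is the least uncertain one.
    best_view = np.argmin(entropy(view_probs), axis=1)

    # S[c, k] counts how often view k was best for samples of class c.
    S = np.zeros((num_classes, num_views), dtype=int)
    np.add.at(S, (labels, best_view), 1)

    # v[c] is the view that won most often for class c.
    v = np.argmax(S, axis=1)
    return S, v
```

A class with no samples gets an all-zero row in `S`, so `v` falls back to view 0 for it; a real pipeline would probably want to handle that case explicitly.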
Abstract
In this study, we introduce an intelligent Test Time Augmentation (TTA) algorithm designed to enhance the robustness and accuracy of image classification models against viewpoint variations. Unlike traditional TTA methods, which apply augmentations indiscriminately, our approach selects augmentations based on predictive uncertainty metrics. The selection is a two-stage process: the first stage identifies the optimal augmentation for each class by evaluating uncertainty levels, while the second stage applies an uncertainty threshold to determine when TTA would be advantageous. As a result, augmentations contribute to classification more effectively than when they are applied uniformly across the dataset. Experiments across several datasets and neural network architectures support our approach, yielding an average accuracy improvement of 1.73% over methods that use single-view images. This research underscores the potential of adaptive, uncertainty-aware TTA for improving the robustness of image classification in the presence of viewpoint variations, paving the way for further exploration of intelligent augmentation strategies.
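The second-stage gating can be illustrated with a short sketch for a single test image. Several details here are assumptions rather than the paper's stated procedure: entropy is used as the uncertainty measure, TTA is triggered when uncertainty exceeds the threshold, the augmentation is chosen by indexing the per-class preferences with the predicted class, and the default prediction is returned unchanged when TTA is skipped. The names `gated_tta` and `predict_view` are hypothetical.

```python
import numpy as np


def entropy(p, eps=1e-12):
    """Shannon entropy of one probability vector (assumed uncertainty measure)."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p + eps)))


def gated_tta(p_default, predict_view, v, tau):
    """Return (p_final, used_tta) for one test image.

    p_default:    softmax output on the default view, shape (num_classes,).
    predict_view: hypothetical callable mapping a view index to the softmax
                  output of the model on that augmented view.
    v:            per-class preferred view indices from the first stage.
    tau:          uncertainty threshold; TTA runs only above it (assumed).
    """
    p_default = np.asarray(p_default, dtype=float)

    # Low uncertainty: keep the single-view prediction, no extra forward pass.
    if entropy(p_default) <= tau:
        return p_default, False

    # Assumption: the true class is unknown at test time, so the preferred
    # view is looked up using the class predicted from the default view.
    view = v[int(np.argmax(p_default))]
    p_aug = np.asarray(predict_view(view), dtype=float)

    # Average the default and augmented predictions.
    return (p_default + p_aug) / 2.0, True
```

Because the augmented forward pass runs only for uncertain inputs, confident predictions cost a single model evaluation, which is where the efficiency of a gated scheme like this would come from.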
