BayTTA: Uncertainty-aware medical image classification with optimized test-time augmentation using Bayesian model averaging
Zeinab Sherkatghanad, Moloud Abdar, Mohammadreza Bakhtyari, Pawel Plawiak, Vladimir Makarenkov
TL;DR
BayTTA presents a Bayesian Model Averaging–based framework to optimize test-time augmentation for uncertainty-aware classification in medical imaging and gene editing. By treating TTA-derived predictions as predictor variables and weighting candidate model combinations with posterior probabilities from BIC-based logistic regression, BayTTA yields a robust, calibrated final prediction $y_{BMA}$ and an uncertainty measure $\sigma_{BayTTA}$. Across three medical-image datasets and two CRISPR datasets, BayTTA consistently improves accuracy and reduces prediction variance compared with standard TTA and baseline models, while remaining compatible with multiple pretrained CNN backbones and state-of-the-art classifiers. The authors also analyze the impact of augmentation types and the number of test-time samples, demonstrating practical gains balanced against computational cost. Overall, BayTTA offers a principled, uncertainty-aware enhancement to TTA with broad applicability in critical domains such as medical diagnostics and gene-editing off-target prediction.
Abstract
Test-time augmentation (TTA) is a well-known technique employed during the testing phase of computer vision tasks. It involves aggregating predictions over multiple augmented versions of the input data. After performing TTA, a common and straightforward approach is to combine the predictions using a simple average. This paper introduces a novel framework for optimizing TTA, called BayTTA (Bayesian-based TTA), which is based on Bayesian Model Averaging (BMA). First, we generate a list of predictions associated with different variations of the input data created through TTA. Then, we use BMA to combine these predictions, weighted by their respective posterior probabilities. Such an approach accounts for model uncertainty and thus enhances the predictive performance of the underlying machine learning or deep learning model. We evaluate the performance of BayTTA on various public datasets, including three medical image datasets comprising skin cancer, breast cancer, and chest X-ray images, and two well-known gene editing datasets, CRISPOR and GUIDE-seq. Our experimental results indicate that BayTTA can be effectively integrated into state-of-the-art deep learning models used in medical image analysis, as well as into popular pre-trained CNN models such as VGG-16, MobileNetV2, DenseNet201, ResNet152V2, and InceptionResNetV2, improving their accuracy and robustness. The source code of the proposed BayTTA method is freely available at: https://github.com/Z-Sherkat/BayTTA.
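The core idea described above can be illustrated with a minimal sketch: collect per-augmentation predicted probabilities, convert candidate-model BIC scores into approximate posterior weights (since $p(M_k \mid D) \propto \exp(-\mathrm{BIC}_k / 2)$ under the standard BIC approximation), and form a weighted average prediction together with a weighted standard deviation as an uncertainty estimate. This is an illustrative reconstruction, not the authors' implementation; the function names (`bic_weights`, `bma_predict`) and the example BIC values are hypothetical.

```python
import numpy as np

def bic_weights(bics):
    """Turn BIC scores into approximate posterior model weights.

    Lower BIC -> higher weight, via exp(-BIC/2) normalized to sum to 1.
    """
    b = np.asarray(bics, dtype=float)
    delta = b - b.min()            # shift by the best BIC for numerical stability
    w = np.exp(-0.5 * delta)
    return w / w.sum()

def bma_predict(tta_probs, bics):
    """Combine TTA predictions with BIC-based posterior weights.

    tta_probs: array of shape (n_augmentations, n_samples) holding the
               predicted positive-class probability for each augmented view.
    Returns the BMA prediction (analogous to y_BMA in the paper) and a
    weighted standard deviation as an uncertainty measure.
    """
    w = bic_weights(bics)
    probs = np.asarray(tta_probs, dtype=float)
    mean = np.average(probs, axis=0, weights=w)
    var = np.average((probs - mean) ** 2, axis=0, weights=w)
    return mean, np.sqrt(var)

# Hypothetical example: 3 augmented views of 4 test samples.
tta_probs = [[0.9, 0.2, 0.7, 0.6],
             [0.8, 0.3, 0.6, 0.5],
             [0.7, 0.4, 0.9, 0.4]]
bics = [10.0, 12.0, 11.0]          # assumed BIC per candidate model
y_bma, sigma = bma_predict(tta_probs, bics)
```

Compared with plain averaging, the BIC-derived weights let better-fitting candidate models dominate the final prediction, while the weighted spread `sigma` flags inputs on which the augmented views disagree.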
