
BayTTA: Uncertainty-aware medical image classification with optimized test-time augmentation using Bayesian model averaging

Zeinab Sherkatghanad, Moloud Abdar, Mohammadreza Bakhtyari, Pawel Plawiak, Vladimir Makarenkov

TL;DR

BayTTA presents a Bayesian Model Averaging–based framework to optimize test-time augmentation for uncertainty-aware classification in medical imaging and gene editing. By treating TTA-derived predictions as predictor variables and weighting candidate model combinations with posterior probabilities from BIC-based logistic regression, BayTTA yields a robust, calibrated final prediction $y_{BMA}$ and an uncertainty measure $\sigma_{BayTTA}$. Across three medical-image datasets and two CRISPR datasets, BayTTA consistently improves accuracy and reduces prediction variance compared with standard TTA and baseline models, while remaining compatible with multiple pretrained CNN backbones and state-of-the-art classifiers. The authors also analyze the impact of augmentation types and the number of test-time samples, demonstrating practical gains balanced against computational cost. Overall, BayTTA offers a principled, uncertainty-aware enhancement to TTA with broad applicability in critical domains such as medical diagnostics and gene-editing off-target prediction.

Abstract

Test-time augmentation (TTA) is a well-known technique employed during the testing phase of computer vision tasks. It involves aggregating multiple augmented versions of input data. After performing TTA, predictions are commonly combined using a simple average. This paper introduces a novel framework for optimizing TTA, called BayTTA (Bayesian-based TTA), which is based on Bayesian Model Averaging (BMA). First, we generate a prediction list associated with different variations of the input data created through TTA. Then, we use BMA to combine predictions weighted by their respective posterior probabilities. Such an approach allows one to take into account model uncertainty, and thus to enhance the predictive performance of the related machine learning or deep learning model. We evaluate the performance of BayTTA on various public datasets, including three medical image datasets comprising skin cancer, breast cancer, and chest X-ray images, and two well-known gene editing datasets, CRISPOR and GUIDE-seq. Our experimental results indicate that BayTTA can be effectively integrated into state-of-the-art deep learning models used in medical image analysis, as well as into popular pre-trained CNN models such as VGG-16, MobileNetV2, DenseNet201, ResNet152V2, and InceptionResNetV2, improving their accuracy and robustness. The source code of the proposed BayTTA method is freely available at: https://github.com/Z-Sherkat/BayTTA.
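The procedure described above — treating TTA predictions as predictor variables, fitting BIC-scored logistic regressions over candidate subsets, and averaging with posterior model weights — can be sketched roughly as follows. This is a minimal illustration of BIC-based Bayesian Model Averaging in general, not the authors' implementation; the function name `bic_weighted_tta` and the binary-label setup are our assumptions, and scikit-learn's `LogisticRegression` stands in for the paper's classifier.

```python
# Hypothetical sketch of BIC-weighted BMA over TTA predictions.
# Not the authors' code: names and setup are illustrative assumptions.
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

def bic_weighted_tta(preds, y, eps=1e-9):
    """Combine TTA predictions via Bayesian Model Averaging.

    preds : (n_samples, n_augmentations) array of per-augmentation
            predicted probabilities for the positive class.
    y     : (n_samples,) binary labels used to score candidate models.
    Returns the BMA-combined prediction and a per-sample std-dev
    (spread of candidate-model predictions) as an uncertainty measure.
    """
    n, k = preds.shape
    bics, model_preds = [], []
    # Each non-empty subset of augmentation columns is a candidate model.
    for r in range(1, k + 1):
        for cols in combinations(range(k), r):
            X = preds[:, cols]
            clf = LogisticRegression().fit(X, y)
            p = clf.predict_proba(X)[:, 1]
            # Log-likelihood of the fitted logistic regression.
            ll = np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
            # BIC = params * ln(n) - 2 * lnL  (coefficients + intercept).
            bics.append((len(cols) + 1) * np.log(n) - 2 * ll)
            model_preds.append(p)
    bics = np.array(bics)
    # Posterior model probabilities ~ exp(-BIC/2), normalized.
    w = np.exp(-0.5 * (bics - bics.min()))
    w /= w.sum()
    P = np.stack(model_preds)            # (n_models, n_samples)
    y_bma = w @ P                        # posterior-weighted prediction
    var = w @ (P - y_bma) ** 2           # between-model predictive variance
    return y_bma, np.sqrt(var)
```

In this sketch the uncertainty is the posterior-weighted spread of candidate-model predictions; a model whose BIC is much worse than the best receives near-zero weight, so the average is dominated by the best-supported combinations of augmentations.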


Paper Structure

This paper contains 18 sections, 8 equations, 8 figures, 4 tables, 1 algorithm.

Figures (8)

  • Figure 1: Examples of original benign and malignant skin cancer, breast cancer, and chest X-ray images and their augmented versions considered in our study.
  • Figure 2: A schematic view of a conventional test-time augmentation (TTA) process.
  • Figure 3: An overview of the proposed BayTTA method. During the testing phase: (a) TTA generates predictions from a set of fixed augmented images, and (b) BMA is then applied to combine and aggregate these predictions by treating each unique combination as a distinct candidate model.
  • Figure 4: Examples of visualizing CRISPR-Cas9 sgRNA-DNA sequence pairs encoded onto 8×23 matrices, then transformed into black and white images from the (a) CRISPOR and (b) GUIDE-seq gene editing datasets, respectively (Charlier et al., 2021). These images can be processed by neural networks to predict off-targets generated by CRISPR-Cas9 technology.
  • Figure 5: Comparison of the TTA and BayTTA method performance on the skin cancer dataset in terms of accuracy and standard deviation, while considering pre-trained baseline models with rotate, zoom, and shift augmentations.
  • ...and 3 more figures