TTA-OOD: Test-time Augmentation for Improving Out-of-Distribution Detection in Gastrointestinal Vision

Sandesh Pokhrel; Sanjay Bhandari; Eduard Vazquez; Tryphon Lambrou; Prashnna Gyawali; Binod Bhattarai

TL;DR

This work tackles abnormality detection in gastrointestinal (GI) endoscopy by treating anomalies as Out-of-Distribution (OOD) data and introducing test-time augmentation (TTA) to improve ID–OOD separation without retraining the model. Each test sample $x$ is replaced by an augmented sample $x' = T(x)$ before the OOD score is computed; the augmentation induces a semantic drift that separates healthy (ID) images from abnormal (OOD) images more clearly. Evaluations on the Kvasir-v2 dataset with ResNet-18 and ViT-Small backbones show that TTA consistently improves OOD detection across multiple scores (e.g., MSP, ODIN, Energy, Entropy, MaxLogit, Mahalanobis, ViM), with notable reductions in FPR and improved AUC. Ablation studies reveal that composite augmentations often yield stronger drift than individual ones, while some augmentations, such as Equalize and Invert, can hurt performance. Overall, the approach is model- and score-agnostic and requires no abnormal training data, which makes it a practical way to improve the reliability of GI diagnosis in real-world deployment.

Abstract

Deep learning has significantly advanced the field of gastrointestinal vision, enhancing disease diagnosis capabilities. One major challenge in automating diagnosis within gastrointestinal settings is the detection of abnormal cases in endoscopic images. Because data are sparse, distinguishing normal from abnormal cases remains difficult, particularly for rare and unseen conditions. To address this issue, we frame abnormality detection as an out-of-distribution (OOD) detection problem. In this setup, a model trained on In-Distribution (ID) data, which represents a healthy GI tract, can accurately identify healthy cases, while abnormalities are detected as OOD, regardless of their class. We introduce a test-time augmentation step into the OOD detection pipeline, which enhances the distinction between ID and OOD examples and thereby improves the effectiveness of existing OOD methods with the same model. The augmentation shifts the input in pixel space, and this shift translates into a more distinct semantic representation for OOD examples than for ID examples. We evaluated our method against existing state-of-the-art OOD scores, showing improvements with test-time augmentation over the baseline approach.
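The pipeline described above (augment the test image, run the unchanged model, then apply an existing OOD score) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_model`, `hflip`, `invert`, `compose`, and `tta_ood_score` are hypothetical names introduced here, the image is a plain nested list rather than an endoscopy frame, and the score formulas are the standard logit-based definitions of MSP, MaxLogit, and Energy (higher means "more ID").

```python
import math
from typing import Callable, List, Sequence

Image = List[List[float]]  # toy grayscale image, pixel values in [0, 1]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits: Sequence[float]) -> float:
    """Maximum softmax probability."""
    return max(softmax(logits))


def max_logit_score(logits: Sequence[float]) -> float:
    """Largest raw logit."""
    return max(logits)


def energy_score(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Negative free energy: T * logsumexp(z / T), computed stably."""
    m = max(logits)
    return m + temperature * math.log(
        sum(math.exp((z - m) / temperature) for z in logits)
    )


def hflip(image: Image) -> Image:
    """Horizontal flip (illustrative augmentation)."""
    return [row[::-1] for row in image]


def invert(image: Image) -> Image:
    """Intensity inversion (illustrative augmentation)."""
    return [[1.0 - p for p in row] for row in image]


def compose(*transforms: Callable[[Image], Image]) -> Callable[[Image], Image]:
    """Chain augmentations left to right to form a composite augmentation."""
    def apply(image: Image) -> Image:
        for t in transforms:
            image = t(image)
        return image
    return apply


def toy_model(image: Image) -> List[float]:
    """Hypothetical stand-in for a trained classifier: two logits derived
    from the mean intensity of the left and right image halves."""
    half = len(image[0]) // 2
    left = [p for row in image for p in row[:half]]
    right = [p for row in image for p in row[half:]]
    return [4.0 * sum(left) / len(left), 4.0 * sum(right) / len(right)]


def tta_ood_score(model, image, transform, score_fn) -> float:
    """Score the augmented sample x' = T(x) with an unchanged model and score."""
    return score_fn(model(transform(image)))


image = [[0.9, 0.1], [0.8, 0.2]]
baseline = max_logit_score(toy_model(image))
with_tta = tta_ood_score(toy_model, image, compose(hflip, invert), max_logit_score)
```

Because the model and the score function are passed in as arguments, the same wrapper applies to any backbone and any logit-based score, which mirrors the model- and score-agnostic claim. Feature-based scores such as Mahalanobis or ViM would need access to intermediate features and are not covered by this sketch.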

Paper Structure (6 sections, 2 equations, 2 figures, 5 tables)

Introduction
Method
Experiments
Datasets and Implementation Details
Results
Conclusion

Figures (2)

Figure 1: a) OOD-based classification pipeline, which separates ID (normal anatomy) and OOD (disease) data based on feature, logit, or gradient information. b) TTA-based OOD detection pipeline, which operates on images with individual or composite augmentations.
Figure 2: a) Qualitative comparison for the MaxLogit method on Kvasir-v2 with ResNet-18: OOD examples that the baseline method over-confidently predicts as healthy ID data (red) and that are correctly identified as abnormal (green) under the respective test-time augmentation technique. b) Improvement in FPR95$\downarrow$ for the MaxLogit method on separate OOD classes of Kvasir-v2 with the ResNet-18 model.
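
For readers unfamiliar with the FPR95 metric reported in Figure 2b, the sketch below shows one common way to compute it. It assumes the convention in which ID is the positive class and higher scores mean "more ID": the threshold is set so that 95% of ID samples are retained, and FPR95 is the fraction of OOD samples that still score at or above that threshold. The function name `fpr_at_tpr` and the sample scores are hypothetical; the paper's exact evaluation code may differ.

```python
import math
from typing import Sequence


def fpr_at_tpr(id_scores: Sequence[float],
               ood_scores: Sequence[float],
               tpr: float = 0.95) -> float:
    """Fraction of OOD samples accepted as ID when the threshold keeps
    at least `tpr` of the ID samples (higher score = more ID)."""
    ranked = sorted(id_scores, reverse=True)
    # Number of ID samples that must be kept; the small epsilon guards
    # against floating-point round-up in tpr * len(ranked).
    keep = max(1, math.ceil(tpr * len(ranked) - 1e-9))
    threshold = ranked[keep - 1]
    accepted_ood = sum(1 for s in ood_scores if s >= threshold)
    return accepted_ood / len(ood_scores)


# Made-up scores for illustration only.
id_scores = [float(s) for s in range(1, 21)]   # 20 ID samples
ood_scores = [0.5, 1.5, 2.5, 10.0]             # 4 OOD samples
fpr95 = fpr_at_tpr(id_scores, ood_scores)
```

A lower value is better: it means fewer abnormal images slip through as healthy at the operating point where 95% of healthy images are correctly accepted.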
