Test-time generative augmentation for medical image segmentation

Xiao Ma, Yuhui Tao, Zetian Zhang, Yuhan Zhang, Xi Wang, Sheng Zhang, Zexuan Ji, Yizhe Zhang, Qiang Chen, Guang Yang

TL;DR

This work tackles the uncertainty and robustness issues that medical image segmentation models face at inference time due to occlusions, boundary ambiguity, and cross-device variation. It proposes TTGA, a test-time generative augmentation framework that uses a domain-adapted diffusion model and masked null-text inversion to create region-specific augmentations conditioned on semantic context and image identity. It integrates dual denoising paths with region masks and multi-condition guidance to balance content preservation with meaningful variability, and is validated across three tasks and multiple datasets, showing consistent segmentation accuracy gains and improved pixel-wise uncertainty estimation. The approach offers a practical, scalable tool for more reliable clinical image analysis under domain shift and data variability.

Abstract

Medical image segmentation is critical for clinical diagnosis, treatment planning, and monitoring, yet segmentation models often struggle with uncertainties stemming from occlusions, ambiguous boundaries, and variations in imaging devices. Traditional test-time augmentation (TTA) techniques typically rely on predefined geometric and photometric transformations, limiting their adaptability and effectiveness in complex medical scenarios. In this study, we introduce Test-Time Generative Augmentation (TTGA), a novel augmentation strategy specifically tailored for medical image segmentation at inference time. Unlike conventional augmentation strategies, which suffer from excessive randomness or limited flexibility, TTGA leverages a domain-fine-tuned generative model to produce contextually relevant and diverse augmentations tailored to the characteristics of each test image. Built upon diffusion model inversion, a masked null-text inversion method is proposed to enable region-specific augmentations during sampling. Furthermore, a dual denoising pathway is designed to balance precise identity preservation with controlled variability. We demonstrate the efficacy of TTGA through extensive experiments across three distinct segmentation tasks spanning nine datasets. The results consistently show that TTGA not only improves segmentation accuracy (with DSC gains ranging from 0.1% to 2.3% over the baseline) but also enables pixel-wise error estimation (with DSC gains ranging from 1.1% to 29.0% over the baseline). The source code and demonstration are available at: https://github.com/maxiao0234/TTGA.
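
To make the inference-time procedure concrete, the following is a minimal sketch of how a TTGA-style loop could consume generative augmentations: invert the test image to a noise latent once, sample several augmented variants, segment each, and aggregate the predictions. The helpers `invert_to_noise` and `generate_augmentation` are hypothetical stand-ins for the paper's diffusion inversion and dual-path sampling, and the mean/variance aggregation is an assumption rather than the paper's exact error-estimation rule.

```python
import torch

def ttga_predict(image, seg_model, invert_to_noise, generate_augmentation,
                 n_aug=8):
    """Minimal sketch of a TTGA-style inference loop.

    `invert_to_noise` and `generate_augmentation` are hypothetical
    callables standing in for the paper's diffusion inversion and
    dual-path augmented sampling; `seg_model` returns per-pixel logits.
    """
    # Invert the test image to a noise latent once; every augmentation
    # is then sampled from this shared, identity-preserving starting point.
    latent = invert_to_noise(image)

    preds = []
    with torch.no_grad():
        preds.append(torch.sigmoid(seg_model(image)))  # original image
        for _ in range(n_aug):
            aug = generate_augmentation(latent)  # one region-specific variant
            preds.append(torch.sigmoid(seg_model(aug)))

    preds = torch.stack(preds)   # stack along a new augmentation axis
    seg = preds.mean(dim=0)      # aggregated segmentation probability
    err = preds.var(dim=0)       # pixel-wise error/uncertainty estimate
    return seg, err
```

Averaging over augmentations smooths out prediction noise in ambiguous regions, while the per-pixel spread of the predictions is a natural candidate for the pixel-wise error maps the paper evaluates.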

Paper Structure

This paper contains 36 sections, 21 equations, 6 figures, 2 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Visualization of TTGA augmentation results on three exemplary images. The original images present challenges for segmentation due to tissue overlap, blurred boundaries, and diverse acquisition conditions. TTGA-augmented images introduce variations in local structure, sharpness, and imaging style, which enhance segmentation accuracy and robustness, while also supporting uncertainty estimation and model reliability. The color bars indicate the scales for segmentation probability and error estimation, respectively.
  • Figure 2: The proposed pipeline, comprising three key workflows, is presented. The test image is processed through a sequence of steps to generate a noise image at a designated step count. Using this noise image, a one-step denoising process is employed to refine a trainable null-text embedding, enabling the stable generation of results that closely resemble the initial image. In the augmentation generation phase, this null-text embedding, guided by semantic and regional information, is leveraged to produce a series of augmented images (a minimal illustrative sketch of these two steps is given after the figure list).
  • Figure 3: Results of TTGA on a fundus image under different guidance scales. (a) The original, unaugmented image. (b) The corresponding ground truth segmentation of the optic disc and cup. (c) Visualization of augmented images generated using various combinations of identity guidance scales and semantic guidance scales. (d) Segmentation results produced by the model on the augmented images.
  • Figure 4: Qualitative comparison of segmentation results and error estimation. This figure provides a visual comparison of TTGA (Ours) against baseline models and other test-time methods across three tasks: (a) Fundus, (b) Polyp, and (c) Skin. The "Error" (target highlighted in orange) represents the ground truth for uncertainty, which is generated by visualizing the pixel-wise difference between the binarized baseline segmentation (using a 0.5 threshold on the 0-1 normalized output) and the "Ground Truth" (also highlighted in orange). The objective for the "Segmentation" column is to match the "Ground Truth," while the objective for the "Error Estimation" column is to match the "Error." The color bars indicate the scales for segmentation probability and error estimation, respectively.
  • Figure 5: Visualization of average views for two representative samples. For each identity guidance scale, augmented images exhibit local variations in detail. The average view of multiple augmentations closely resembles the original image, indicating that the augmentations are centered around the original image with minimal bias.
  • ...and 1 more figure
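
As a companion to the Figure 2 caption above, here is a minimal sketch of the two steps it describes: refining a trainable null-text embedding via one-step denoising, and a dual-path guided denoising step that applies semantic guidance only inside a region mask. The `unet(z, t, cond)` call signature, the simplified reconstruction objective, and the exact multi-condition combination rule are all assumptions for illustration, not the paper's precise formulation.

```python
import torch
import torch.nn.functional as F

def refine_null_text(unet, z_t, t, null_emb, eps_target, steps=10, lr=1e-2):
    """Optimize the null-text embedding so that denoising from the inverted
    noise reproduces the original image's trajectory. Simplified objective:
    match the noise prediction recorded during inversion (the paper's
    one-step-denoising objective may differ)."""
    null_emb = null_emb.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([null_emb], lr=lr)
    for _ in range(steps):
        loss = F.mse_loss(unet(z_t, t, null_emb), eps_target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return null_emb.detach()

def dual_path_step(unet, z_t, t, uncond_emb, null_emb, sem_emb, mask,
                   s_id=1.0, s_sem=3.0):
    """One denoising step with identity and semantic guidance.

    `mask` is a {0, 1} region mask broadcastable to the latent; `s_id`
    and `s_sem` play the role of the identity and semantic guidance
    scales explored in Figure 3 (the combination rule is an assumption).
    """
    eps_u = unet(z_t, t, uncond_emb)  # plain unconditional path
    eps_i = unet(z_t, t, null_emb)    # identity-preserving path
    eps_s = unet(z_t, t, sem_emb)     # semantic-variation path

    # Identity guidance acts everywhere; semantic guidance is confined to
    # the masked region, so edits stay local while content is preserved.
    return eps_u + s_id * (eps_i - eps_u) + mask * s_sem * (eps_s - eps_u)
```

Outside the mask the prediction reduces to identity-guided denoising, which is consistent with the observation in Figure 5 that the average of many augmentations stays close to the original image.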