S$^3$-TTA: Scale-Style Selection for Test-Time Augmentation in Biomedical Image Segmentation
Kangxian Xie, Siyu Huang, Sebastian Andres Cajas Ordonez, Hanspeter Pfister, Donglai Wei
TL;DR
S$^3$-TTA addresses domain generalization in biomedical image segmentation by selecting a single test-time augmentation per image using a transformation consistency metric. It combines scale and style augmentations with AdaIN-based style transfer in an end-to-end augmentation-segmentation pipeline, and uses a rotational self-consistency score, measured as mean absolute error (MAE), to choose the most reliable augmented view for segmentation. On cell nuclei and chest X-ray lung segmentation benchmarks, the reported gains over prior methods are about 3.4% and 1.3%, respectively. As a plug-in TTA strategy, it improves robustness to domain shift while remaining compatible with existing segmentation models and tasks.
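For readers unfamiliar with AdaIN (adaptive instance normalization), the following is a minimal NumPy sketch of the standard operation: it re-normalizes each channel of a content feature map so that its mean and standard deviation match those of a style feature map. The function name `adain`, the `(C, H, W)` layout, and the `eps` value are assumptions made for illustration; how S$^3$-TTA integrates AdaIN into its pipeline is not specified in this summary.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Standard AdaIN on feature maps of shape (C, H, W).

    Illustrative sketch only, not the authors' implementation.
    """
    # Per-channel statistics over the spatial dimensions.
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True)
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    # Whiten the content statistics, then apply the style statistics.
    return s_std * (content - c_mean) / (c_std + eps) + s_mean

# Toy usage with random "feature maps" (hypothetical data).
rng = np.random.default_rng(0)
content = rng.normal(size=(3, 8, 8))
style = 2.0 * rng.normal(size=(3, 8, 8)) + 5.0
out = adain(content, style)
```

After the call, each channel of `out` has approximately the mean and standard deviation of the corresponding channel of `style`, while keeping the spatial structure of `content`.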
Abstract
Deep-learning models have been successful in biomedical image segmentation. To help them generalize in real-world deployment, test-time augmentation (TTA) methods are often used to transform the test image into different versions that are hopefully closer to the training domain. Unfortunately, due to the vast diversity of instance scales and image styles, many augmented test images produce undesirable results, lowering overall performance. This work proposes a new TTA framework, S$^3$-TTA, which selects a suitable image scale and style for each test image based on a transformation consistency metric. In addition, S$^3$-TTA constructs an end-to-end augmentation-segmentation joint-training pipeline to ensure task-oriented augmentation. On public benchmarks for cell and lung segmentation, S$^3$-TTA improves over the prior art by 3.4% and 1.3%, respectively, simply by augmenting the input data at test time.
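The per-image selection step can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes the consistency metric is the mean absolute error between a model's prediction on an image and its prediction on a 90-degree rotation of that image, rotated back. The names `rotation_consistency_mae`, `select_view`, and `toy_segment` are hypothetical, and the toy segmenter is a stand-in for a real network.

```python
import numpy as np

def rotation_consistency_mae(segment, image):
    """MAE between the prediction on `image` and the back-rotated
    prediction on its 90-degree rotation (assumed metric)."""
    pred = segment(image)
    pred_rot = segment(np.rot90(image, k=1))
    pred_back = np.rot90(pred_rot, k=-1)  # undo the rotation
    return float(np.mean(np.abs(pred - pred_back)))

def select_view(segment, candidates):
    """Return the index of the augmented view with the lowest
    consistency error, plus all scores."""
    scores = [rotation_consistency_mae(segment, v) for v in candidates]
    return int(np.argmin(scores)), scores

def toy_segment(img):
    # Stand-in "model": thresholds each pixel against its row mean,
    # so it is deliberately not rotation-equivariant.
    return (img > img.mean(axis=1, keepdims=True)).astype(float)

# Two hypothetical augmented views of one test image.
flat = np.zeros((4, 4))
stripes = np.tile(np.array([[0.0], [1.0]]), (2, 4))  # horizontal stripes
best, scores = select_view(toy_segment, [flat, stripes])
print(best, scores)  # 0 [0.0, 0.5]
```

In this toy case the flat view is predicted identically before and after rotation, so it is selected; the striped view exposes the model's orientation sensitivity and scores worse. In practice the candidates would be the scale and style variants of a single test image.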
