Progressive Test Time Energy Adaptation for Medical Image Segmentation
Xiaoran Zhang, Byung-Woo Hong, Hyoungseob Park, Daniel H. Pak, Anne-Marie Rickmann, Lawrence H. Staib, James S. Duncan, Alex Wong
TL;DR
We address robust medical image segmentation under covariate shift with a progressive test-time energy adaptation framework. A region-based shape energy model is trained on source data to assign patch-level energies to segmentation maps. At test time, this energy model is frozen and acts as a discriminator that guides iterative refinement of a pretrained segmentation network. Negative examples for training the energy model are synthesized via adversarial perturbations, and adaptation minimizes the energy of predictions toward a low-energy reference. Across eight datasets and multiple backbones, this yields higher Dice scores and lower boundary errors. The approach is model-agnostic, efficiently localizes updates to anatomically plausible regions, and provides out-of-distribution detection at test time, which supports its use in real-time clinical deployment.
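To make the training signal described above concrete, here is a minimal sketch under heavy simplifying assumptions. Masks are 1-D binary lists, the energy model is a single weight on a hand-picked "roughness" feature, and random label flips stand in for the adversarial perturbations used to synthesize negatives. All helper names (`roughness`, `perturb`, `train_energy_weight`, `mean_energy`, `is_out_of_distribution`) are hypothetical. The sketch shows a margin loss pushing source patches to lower energy than their perturbed counterparts, and a threshold on mean patch energy acting as an out-of-distribution flag. It is not the authors' energy network.

```python
import random
from typing import List

Mask = List[int]  # toy 1-D binary mask (0 = background, 1 = foreground)


def patches(mask: Mask, size: int) -> List[Mask]:
    """Split a mask into non-overlapping patches of `size` pixels."""
    return [mask[i:i + size] for i in range(0, len(mask) - size + 1, size)]


def roughness(patch: Mask) -> float:
    """Hypothetical shape feature: fraction of neighbouring pixel pairs with different labels."""
    changes = sum(1 for a, b in zip(patch, patch[1:]) if a != b)
    return changes / (len(patch) - 1)


def perturb(mask: Mask, rate: float, rng: random.Random) -> Mask:
    """Stand-in for adversarial negatives (an assumption): flip each label with probability `rate`."""
    return [1 - v if rng.random() < rate else v for v in mask]


def train_energy_weight(source_masks: List[Mask], size: int = 4, margin: float = 0.5,
                        lr: float = 0.5, epochs: int = 50, rate: float = 0.3,
                        seed: int = 0) -> float:
    """Fit E(patch) = weight * roughness(patch) with the hinge loss
    max(0, margin + E(positive) - E(negative)), so that source patches
    receive lower energy than their perturbed counterparts."""
    rng = random.Random(seed)
    weight = 0.0
    for _ in range(epochs):
        for mask in source_masks:
            negative = perturb(mask, rate, rng)
            for pos, neg in zip(patches(mask, size), patches(negative, size)):
                diff = roughness(pos) - roughness(neg)  # E(pos) - E(neg) = weight * diff
                if margin + weight * diff > 0:           # hinge is active
                    weight -= lr * diff                   # gradient step on the hinge
    return weight


def mean_energy(mask: Mask, weight: float, size: int = 4) -> float:
    """Image-level score: average of the patch-level energies."""
    ps = patches(mask, size)
    return sum(weight * roughness(p) for p in ps) / len(ps)


def is_out_of_distribution(mask: Mask, weight: float, threshold: float, size: int = 4) -> bool:
    """Flag a prediction whose mean patch energy exceeds the source-calibrated threshold."""
    return mean_energy(mask, weight, size) > threshold


source = [[0] * 6 + [1] * 6 + [0] * 4, [0] * 4 + [1] * 8 + [0] * 4]
w = train_energy_weight(source)
threshold = max(mean_energy(m, w) for m in source)  # calibrated on source masks only
print(is_out_of_distribution(source[0], w, threshold))   # False
print(is_out_of_distribution([0, 1] * 8, w, threshold))  # True
```

In a real system a neural network over 2-D patches would replace the single weight, and the threshold would more likely be a quantile of source energies; taking the maximum here is only for illustration.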
Abstract
We propose a model-agnostic, progressive test-time energy adaptation approach for medical image segmentation. Maintaining model performance across diverse medical datasets is challenging, as distribution shifts arise from inconsistent imaging protocols and patient variations. Unlike domain adaptation methods that require multiple passes through target data, which is impractical in clinical settings, our approach adapts pretrained models progressively as they process test data. Our method leverages a shape energy model trained on source data, which assigns patch-level energy scores to segmentation maps: low energy represents in-distribution (accurate) shapes, while high energy signals out-of-distribution (erroneous) predictions. By minimizing this energy score at test time, we refine the segmentation model to align with the target distribution. To validate its effectiveness and adaptability, we evaluate our framework on eight public MRI (bSSFP, T1- and T2-weighted) and X-ray datasets spanning cardiac, spinal cord, and lung segmentation. Our method consistently outperforms baselines both quantitatively and qualitatively.
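The test-time loop described above, minimizing a frozen energy while processing each target image once, can be sketched as follows. This is a toy under stated assumptions, not the paper's implementation. The "segmentation network" is a per-pixel logistic model with two parameters `(w, b)`, and the frozen energy is a hypothetical stand-in that penalizes the gap between each patch's mean foreground probability and a per-patch prior taken from source data. The names `energy`, `energy_grad`, and `adapt_progressively` are illustrative only. The point is that gradients flow into the segmentation parameters while the energy model stays fixed, and the updated parameters are carried from one test image to the next.

```python
import math
from typing import List, Sequence, Tuple

Params = Tuple[float, float]  # (w, b) of a toy per-pixel segmentation model


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(params: Params, image: Sequence[float]) -> List[float]:
    """Toy segmentation network: soft foreground probability per pixel."""
    w, b = params
    return [sigmoid(w * x + b) for x in image]


def energy(params: Params, image: Sequence[float], prior: Sequence[float], size: int) -> float:
    """Hypothetical frozen shape energy: squared gap between each patch's mean
    foreground probability and a per-patch prior estimated on source data."""
    p = predict(params, image)
    total = 0.0
    for k, mu in enumerate(prior):
        frac = sum(p[k * size:(k + 1) * size]) / size
        total += (frac - mu) ** 2
    return total


def energy_grad(params: Params, image: Sequence[float], prior: Sequence[float],
                size: int) -> Params:
    """Analytic gradient of `energy` w.r.t. (w, b); the energy itself is not updated."""
    p = predict(params, image)
    gw = gb = 0.0
    for k, mu in enumerate(prior):
        lo, hi = k * size, (k + 1) * size
        d_frac = 2.0 * (sum(p[lo:hi]) / size - mu)  # dE / d(patch mean)
        for i in range(lo, hi):
            d_logit = p[i] * (1.0 - p[i]) / size    # d(patch mean) / d(logit_i)
            gw += d_frac * d_logit * image[i]       # d(logit_i)/dw = x_i
            gb += d_frac * d_logit                  # d(logit_i)/db = 1
    return gw, gb


def adapt_progressively(params: Params, stream, prior, size: int,
                        lr: float = 1.0, steps: int = 3):
    """Single pass over the test stream: take a few energy-lowering gradient
    steps per incoming image, then carry the parameters to the next image."""
    history = []
    for image in stream:
        before = energy(params, image, prior, size)
        for _ in range(steps):
            gw, gb = energy_grad(params, image, prior, size)
            params = (params[0] - lr * gw, params[1] - lr * gb)
        history.append((before, energy(params, image, prior, size)))
    return params, history


source_params = (4.0, -2.0)      # "pretrained" for source intensities fg ~ 1, bg ~ 0
prior = [0.9, 0.1]               # per-patch foreground fraction seen on source data
shifted = [0.6] * 4 + [0.3] * 4  # low-contrast target image (covariate shift)
adapted, history = adapt_progressively(source_params, [shifted] * 4, prior, size=4)
for before, after in history:
    print(f"energy {before:.4f} -> {after:.4f}")
```

Carrying `params` forward instead of resetting them for every image is what makes the loop progressive: each target image is visited once, which matches the single-pass setting the abstract contrasts with multi-pass domain adaptation.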
