Progressive Test Time Energy Adaptation for Medical Image Segmentation

Xiaoran Zhang, Byung-Woo Hong, Hyoungseob Park, Daniel H. Pak, Anne-Marie Rickmann, Lawrence H. Staib, James S. Duncan, Alex Wong

TL;DR

We address robust medical image segmentation under covariate shift by introducing a progressive test-time energy adaptation framework. A region-based shape energy model is trained on source data to assign patch-level energies, enabling a frozen energy discriminator to guide iterative refinement of a pretrained segmentation network during test time. Negative examples are synthesized via adversarial perturbations to train the energy model, and adaptation minimizes energy toward a low-energy reference, yielding improved Dice scores and reduced boundary errors across eight datasets and multiple backbones. The approach is model-agnostic, efficiently localizes updates to plausible anatomical regions, and provides out-of-distribution detection during test time, with strong practical implications for real-time clinical deployment.

Abstract

We propose a model-agnostic, progressive test-time energy adaptation approach for medical image segmentation. Maintaining model performance across diverse medical datasets is challenging, as distribution shifts arise from inconsistent imaging protocols and patient variations. Unlike domain adaptation methods that require multiple passes through target data, which is impractical in clinical settings, our approach adapts pretrained models progressively as they process test data. Our method leverages a shape energy model trained on source data, which assigns an energy score at the patch level to segmentation maps: low energy represents in-distribution (accurate) shapes, while high energy signals out-of-distribution (erroneous) predictions. By minimizing this energy score at test time, we refine the segmentation model to align with the target distribution. To validate its effectiveness and adaptability, we evaluate our framework on eight public MRI (bSSFP, T1- and T2-weighted) and X-ray datasets spanning cardiac, spinal cord, and lung segmentation. We consistently outperform baselines both quantitatively and qualitatively.
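
As a rough illustration of the test-time refinement described above, a single adaptation step could be written as follows. Here $f_\theta$ denotes the pretrained segmentation network, $g_\phi$ the frozen shape energy model, and $x$ a test image. The loss $\ell$, the step size $\eta$, and the all-zero target $\mathbf{0}$ standing in for the low-energy reference are assumptions made for this sketch, not details stated in the abstract.

$$
\theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} \, \ell\big(g_\phi(f_{\theta_t}(x)),\ \mathbf{0}\big)
$$

Only $\theta$ changes in this step; $\phi$ stays fixed, so the energy model acts purely as a critic of the predicted shape.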

Paper Structure

This paper contains 27 sections, 10 equations, 7 figures, 10 tables, 1 algorithm.

Figures (7)

  • Figure 1: Illustration of a single-step update in our proposed progressive test-time adaptation framework. The energy model identifies erroneous regions in the initial segmentation prediction by assigning high energy values to these areas. Subsequently, we update the segmentation network to minimize the assigned energy, refining the prediction and improving segmentation accuracy.
  • Figure 2: Overview. We assume a segmentation model $f_\theta(\cdot)$ is pretrained on a source dataset. (a) The energy model $g_\phi(\cdot)$ is trained to estimate patchwise energy values, using binary reference energy labels based on the mismatch between perturbed predictions $\hat{Y}_s$ and the ground-truth shape $Y_s$ on the source dataset. (b) During adaptation, the trained energy model $g_\phi(\cdot)$ is applied to predictions on the test-time distribution, and the BatchNorm layers of $f_\theta(\cdot)$ are updated iteratively so that the estimated energy matches a uniform low-energy target.
  • Figure 3: Qualitative comparison. Rows 1–2 show cardiac segmentation, rows 3–4 spinal cord, and rows 5–6 lung segmentation. Our method refines incomplete initial predictions, producing more plausible shapes after adaptation.
  • Figure 4: Qualitative evaluation of energy-guided adaptation across iterations. The estimated energy (lower values preferred) in the second row effectively highlights undesired shape regions in the first row. Our approach progressively adapts the source model, refining and completing the initial segmentation at test time.
  • Figure 5: Qualitative comparison of curated perturbations (top row) and real out-of-distribution (OOD) segmentation errors (bottom row) at test time. The visual similarity between curated and real OOD examples validates the effectiveness of curated perturbations in modeling realistic segmentation errors.
  • ...and 2 more figures
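
The binary reference energy labels mentioned in the Figure 2 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `patch_reference_energy`, the non-overlapping square patches, and the rule "a patch is high-energy if its mismatch fraction exceeds `tol`" are all hypothetical choices made here for clarity.

```python
import numpy as np


def patch_reference_energy(pred, target, patch=4, tol=0.0):
    """Return a binary patch-level label map (hypothetical helper).

    A patch is labeled 1 (high energy) when the fraction of pixels where
    `pred` and `target` disagree is greater than `tol`, and 0 otherwise.
    The patch size and the threshold rule are assumptions for illustration.
    """
    pred = np.asarray(pred)
    target = np.asarray(target)
    if pred.shape != target.shape or pred.ndim != 2:
        raise ValueError("pred and target must be 2D arrays of equal shape")
    h, w = pred.shape
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")

    # Pixelwise disagreement between the perturbed prediction and the label.
    mismatch = (pred != target).astype(np.float64)

    # Split into non-overlapping patches and average the mismatch per patch.
    frac = mismatch.reshape(h // patch, patch, w // patch, patch).mean(axis=(1, 3))
    return (frac > tol).astype(np.int64)


# Usage: an 8x8 ground-truth square and a prediction with a spurious blob.
target = np.zeros((8, 8), dtype=int)
target[2:6, 2:6] = 1
perturbed = target.copy()
perturbed[6:8, 6:8] = 1  # error confined to the bottom-right 4x4 patch
labels = patch_reference_energy(perturbed, target, patch=4)
print(labels)
```

In this toy case only the bottom-right patch is flagged, which mirrors the idea that the energy map localizes erroneous regions rather than scoring the whole segmentation at once.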