Medical Image Segmentation with InTEnt: Integrated Entropy Weighting for Single Image Test-Time Adaptation

Haoyu Dong, Nicholas Konz, Hanxue Gu, Maciej A. Mazurowski

TL;DR

This work addresses single-image test-time adaptation (SITTA) for medical image segmentation under domain shift. It introduces InTEnt, a framework that ensembles predictions from multiple adapted models, each formed by varying the batch normalization (BN) statistics between the source and test domains, and weights them by foreground-background entropy balance (optionally with entropy-sharpness weighting). By avoiding online parameter updates and instead integrating over these BN-statistic-based models, InTEnt achieves the best average Dice score (71.6% DSC) across 24 domain shifts on three medical imaging datasets, outperforming existing SITTA methods and highlighting the critical role of BN statistics selection. The approach offers a practical, fast, and robust option for real-world medical imaging, where single-image adaptation is often necessary and labels are scarce.

Abstract

Test-time adaptation (TTA) refers to adapting a trained model to a new domain during testing. Existing TTA techniques rely on having multiple test images from the same domain, yet this may be impractical in real-world applications such as medical imaging, where data acquisition is expensive and imaging conditions vary frequently. Here, we address the task of adapting a medical image segmentation model with only a single unlabeled test image. Most TTA approaches, which directly minimize the entropy of predictions, fail to improve performance significantly in this setting. We also observe that the choice of batch normalization (BN) layer statistics is a highly important yet unstable factor, because only a single test-domain example is available. To overcome this, we propose to instead integrate over predictions made with various estimates of the target domain statistics, lying between the training and test statistics, weighted based on their entropy statistics. Our method, validated on 24 source/target domain splits across 3 medical image datasets, surpasses the leading method by 2.9% Dice coefficient on average.
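
The integration idea in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation. It assumes linear interpolation between source and test BN statistics, uses a hypothetical one-layer `toy_segmenter` in place of a real network, and uses a softmax over negative mean entropy as a simplified placeholder for the paper's entropy-based weighting. All function names and the `temperature` parameter are illustrative.

```python
import math


def interpolate_stats(source, test, k, num_models):
    """Assumed linear blend: k=0 gives source stats, k=num_models-1 gives test stats."""
    alpha = k / (num_models - 1) if num_models > 1 else 0.0
    return (1 - alpha) * source + alpha * test


def toy_segmenter(pixels, mean, var, eps=1e-5):
    """Hypothetical stand-in for a network: normalize with the given stats, then sigmoid."""
    std = math.sqrt(var + eps)
    return [1.0 / (1.0 + math.exp(-(x - mean) / std)) for x in pixels]


def mean_entropy(prob_map, eps=1e-8):
    """Average binary entropy of a foreground-probability map."""
    total = 0.0
    for p in prob_map:
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(p * math.log(p) + (1 - p) * math.log(1 - p))
    return total / len(prob_map)


def entropy_weighted_ensemble(prob_maps, temperature=1.0):
    """Placeholder weighting: softmax over negative mean entropy, then a weighted average."""
    scores = [-mean_entropy(pm) / temperature for pm in prob_maps]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    weights = [e / sum(exps) for e in exps]
    fused = [
        sum(w * pm[i] for w, pm in zip(weights, prob_maps))
        for i in range(len(prob_maps[0]))
    ]
    return fused, weights


# Single "test image" (flattened) and assumed source-domain statistics.
pixels = [0.2, 0.4, 3.0, 3.5]
src_mean, src_var = 0.0, 1.0
test_mean = sum(pixels) / len(pixels)
test_var = sum((x - test_mean) ** 2 for x in pixels) / len(pixels)

num_models = 5
prob_maps = []
for k in range(num_models):
    mean_k = interpolate_stats(src_mean, test_mean, k, num_models)
    var_k = interpolate_stats(src_var, test_var, k, num_models)
    prob_maps.append(toy_segmenter(pixels, mean_k, var_k))

fused, weights = entropy_weighted_ensemble(prob_maps)
```

In this sketch no model parameters are updated; only the normalization statistics change between ensemble members, and more confident (lower-entropy) predictions receive larger weights.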

Paper Structure (21 sections, 12 equations, 4 figures, 10 tables, 1 algorithm)

Figures (4)

  • Figure 1: Summary of our method for single-image test-time adaptation of a segmentation model (Algorithm 1). Note that the segmentation probability map predictions $\hat{P}_k$ and $\hat{P}$ are rounded to binary masks for visualization.
  • Figure 2: Overview of the datasets used in this paper. Above each example image, we list its domain and the total number of images from this domain.
  • Figure 3: The effect on model predictions of using batch normalization layer statistics from different domains.
  • Figure 4: Dependence of our method's performance on the number of ensembled adapted models (Eq. 5, main paper) for the SC dataset, over different domain shifts.