AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking

Yuheng Li, Tianyu Luan, Yizhou Wu, Shaoyan Pan, Yenho Chen, Xiaofeng Yang

TL;DR

AnatoMask tackles the data-efficiency bottleneck in 3D medical image segmentation by introducing reconstruction-guided self-masking within a self-distillation framework. The method uses a teacher–student setup with an EMA-updated teacher to identify anatomically significant regions via reconstruction losses and to progressively increase masking difficulty through an easy-to-hard masking schedule, all within a hierarchical encoder–decoder backbone. Empirical results on TotalSegmentator and transfer datasets (FLARE22, AMOS22, AutoPETII) show improved pretraining efficiency and superior segmentation performance over prior SSL methods, with scalability observed on larger backbones. This approach promises data-efficient, anatomy-aware pretraining for medical imaging across modalities, and the authors provide code to facilitate adoption.

Abstract

Because labeled data are scarce, self-supervised learning (SSL) has gained much attention in 3D medical image segmentation as a way to extract semantic representations from unlabeled data. Among SSL strategies, masked image modeling (MIM) has proven effective by reconstructing randomly masked images to learn detailed representations. However, conventional MIM methods require extensive training data to perform well, which remains a challenge for medical imaging. Because random masking samples all regions of a medical image uniformly, it may overlook crucial anatomical regions and thus degrade pretraining efficiency. We propose AnatoMask, a novel MIM method that leverages reconstruction loss to dynamically identify and mask out anatomically significant regions to improve pretraining efficacy. AnatoMask takes a self-distillation approach in which the model learns both how to find more significant regions to mask and how to reconstruct these masked regions. To avoid suboptimal learning, AnatoMask adjusts the pretraining difficulty progressively using a masking dynamics function. We evaluate our method on four public datasets spanning multiple imaging modalities (CT, MRI, and PET). AnatoMask demonstrates superior performance and scalability compared to existing SSL methods. The code is available at https://github.com/ricklisz/AnatoMask.
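
The two training-time ingredients named above, an EMA-updated teacher and a masking dynamics function that moves from easy to hard, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the momentum value, the linear ramp, and the `r_start`/`r_end` endpoints are all assumptions made for illustration, and weights are represented as plain lists of floats to keep the example self-contained.

```python
def ema_update(teacher, student, momentum=0.999):
    """Return teacher weights as an exponential moving average of the student.

    `teacher` and `student` are flat lists of floats standing in for model
    parameters; the momentum value is an illustrative assumption.
    """
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def masking_dynamics(epoch, total_epochs, r_start=0.0, r_end=0.5):
    """Hypothetical easy-to-hard schedule for the loss-guided masking ratio.

    A linear ramp from r_start (mostly random masking, easier) to r_end
    (more loss-guided masking, harder). The linear form and both endpoints
    are assumptions, not values taken from the paper.
    """
    if total_epochs <= 1:
        return r_end
    frac = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return r_start + (r_end - r_start) * frac


if __name__ == "__main__":
    teacher = [1.0, 2.0]
    student = [0.0, 4.0]
    print(ema_update(teacher, student, momentum=0.9))
    print([round(masking_dynamics(e, 5), 3) for e in range(5)])
```

In a real pretraining loop the teacher would be updated after each student optimizer step, and the schedule value would be passed to the mask-generation step for the current epoch.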

Paper Structure

This paper contains 12 sections, 3 equations, 7 figures, 6 tables, 1 algorithm.

Figures (7)

  • Figure 1: Comparison of random masking (SparK) vs. AnatoMask. We visualize: a) the input with anatomical ground truth (GT); b) reconstruction losses obtained by averaging over 2 random masks; c) a random mask; d) our AnatoMask generated from b). In e), we also compare the training efficiency of AnatoMask and SparK. Transparent areas indicate unmasked regions.
  • Figure 2: Overview of the proposed AnatoMask. During SSL pretraining at epoch $t$, the teacher network receives randomly masked inputs and computes the patch-level reconstruction loss $L_{rec}^t$. The top $r_t$ regions with the highest reconstruction losses are selected to form a binary mask $M_{top}^t$. Then, the remaining $(1-r_t)\gamma$ areas are randomly filled with binary values to form our final mask $M_{\textit{final}}^t$. The student network is trained to reconstruct the input masked by $M_{\textit{final}}^t$.
  • Figure 3: Visual comparison between segmentation ground truths and reconstruction losses. For each row, we show a) an image with organ ground truths, b) reconstruction losses obtained by averaging over 2 masks, c) a random mask (60%), and d) our AnatoMask obtained from b). Red indicates higher loss values and blue indicates lower ones. Red areas tend to overlap with organ regions. Transparent areas indicate unmasked regions.
  • Figure 4: Visualization of multi-organ CT segmentation results on TotalSegmentator. Yellow arrows indicate improvements in segmentation.
  • Figure 5: Visualization of multi-organ CT segmentation results on FLARE22. Yellow arrows indicate improvements in segmentation.
  • ...and 2 more figures
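
The mask construction described in the Figure 2 caption can be sketched in a few lines. This is a hedged illustration under one reading of the caption, not the authors' code: it assumes a total budget of $\gamma N$ masked patches out of $N$, of which a fraction $r_t$ is taken from the patches with the highest teacher reconstruction loss ($M_{top}^t$) and the remaining $(1-r_t)\gamma N$ are drawn at random from the other patches to give $M_{\textit{final}}^t$. The function name `build_final_mask` and the flat list of per-patch losses are hypothetical.

```python
import random


def build_final_mask(patch_losses, r_t, gamma, rng=None):
    """Build a binary mask (1 = masked) from per-patch reconstruction losses.

    Assumed reading of the caption: gamma is the overall masking ratio and
    r_t is the share of that budget given to the highest-loss patches.
    """
    rng = rng or random.Random()
    n = len(patch_losses)
    n_masked = round(gamma * n)      # total number of patches to mask
    n_top = round(r_t * n_masked)    # loss-guided part of the budget

    # Patch indices sorted from highest to lowest teacher loss.
    order = sorted(range(n), key=lambda i: patch_losses[i], reverse=True)
    top = order[:n_top]              # M_top: hardest patches
    rest = order[n_top:]

    # Fill the rest of the budget uniformly at random from the other patches.
    random_fill = rng.sample(rest, n_masked - n_top)

    mask = [0] * n
    for i in top + random_fill:
        mask[i] = 1
    return mask


if __name__ == "__main__":
    losses = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.0, 0.05, 0.15, 0.25]
    print(build_final_mask(losses, r_t=0.5, gamma=0.6, rng=random.Random(0)))
```

With `r_t = 0` this reduces to plain random masking at ratio `gamma`, and with `r_t = 1` every masked patch is chosen by loss, which is why a schedule that raises `r_t` over training moves the pretext task from easy to hard.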