AMAES: Augmented Masked Autoencoder Pretraining on Public Brain MRI Data for 3D-Native Segmentation

Asbjørn Munk, Jakob Ambsdorf, Sebastian Llambias, Mads Nielsen

TL;DR

The paper tackles the lack of large public unlabeled data for 3D brain MRI segmentation by introducing BRAINS-45K, the largest public brain MRI collection to date for pretraining. It proposes AMAES, a memory-efficient masked autoencoder framework with augmentation reversal for 3D segmentation that uses a lightweight decoder and omits skip connections during pretraining. Models are pretrained on BRAINS-45K and finetuned on three downstream tasks. Results show that AMAES improves Dice scores on BraTS21, ISLES22, and WMH in most evaluated settings, and often surpasses SwinUNETR baselines, including in out-of-domain settings, while reducing memory and runtime. The work provides a practical, scalable pathway toward large-scale, self-supervised 3D medical segmentation and releases code and BRAINS-45K for reproducibility, enabling broader methodological research and clinical translation.

Abstract

This study investigates the impact of self-supervised pretraining of 3D semantic segmentation models on a large-scale, domain-specific dataset. We introduce BRAINS-45K, a dataset of 44,756 brain MRI volumes from public sources, the largest public dataset available, and revisit a number of design choices for pretraining modern segmentation architectures by simplifying and optimizing state-of-the-art methods and combining them with a novel augmentation strategy. The resulting AMAES framework is based on masked image modeling and intensity-based augmentation reversal, and balances memory usage, runtime, and finetuning performance. Using the popular U-Net and the recent MedNeXt architecture as backbones, we evaluate the effect of pretraining on three challenging downstream tasks, covering single-sequence, low-resource settings, and out-of-domain generalization. The results highlight that pretraining on the proposed dataset with AMAES significantly improves segmentation performance in the majority of evaluated cases, and that it is beneficial to pretrain the model with augmentations, despite pretraining on a large-scale dataset. Code and model checkpoints for reproducing results, as well as the BRAINS-45K dataset, are available at https://github.com/asbjrnmunk/amaes.

Paper Structure

This paper contains 16 sections, 3 figures, 6 tables.

Figures (3)

  • Figure 1: AMAES provides efficient 3D pretraining for segmentation networks, requiring fewer resources than SwinUNETR while improving downstream performance. Downstream performance is measured on the BraTS21 dataset; see the Results section. The MedNeXt model is MedNeXt L (55 million parameters), the U-Net is U-Net XL (90 million parameters), and SwinUNETR has 60 million parameters. Memory usage is recorded with a batch size of two for all models. All results were obtained using Nvidia H100 GPUs with mixed 16-bit precision and uncompiled models.
  • Figure 2: Graphical overview of the AMAES framework. During pretraining, spatial and intensity-based augmentations are applied to an image patch. The patch is masked and passed through the model, which consists of a backbone encoder and a lightweight decoder, to reconstruct the image. The reconstruction target is the unmasked image, with only spatial transformations applied. During finetuning, only spatial augmentations are applied to the input. The backbone encoder weights are transferred, while a new U-Net decoder is initialized. Skip-connections are only used during finetuning.
  • Figure 3: Rotation and contrastive losses when training SwinUNETR (Tang et al., 2022) on BRAINS-45K.
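
The pretraining data flow described in the Figure 2 caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `make_pretraining_pair`, the choice of a random flip as the spatial augmentation, gamma plus Gaussian noise as the intensity augmentation, block-wise masking, and all parameter values are assumptions made for illustration. The point it shows is only that the model input receives spatial and intensity augmentation plus masking, while the reconstruction target receives the spatial augmentation alone.

```python
import numpy as np


def make_pretraining_pair(patch, mask_ratio=0.6, block=4, rng=None):
    """Build (model_input, target, mask) for one 3D patch with values in [0, 1].

    Hypothetical helper; mask_ratio, block and the augmentations are
    illustrative assumptions, not settings taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    assert all(s % block == 0 for s in patch.shape), "patch must tile into blocks"

    # Spatial augmentation (assumed: random flip along one axis).
    # Applied first, so input and target share the same geometry.
    axis = int(rng.integers(0, 3))
    spatial = np.flip(patch, axis=axis).copy()

    # Intensity augmentation (assumed: random gamma and additive noise).
    # Applied to the model input only; the model has to "reverse" it.
    gamma = rng.uniform(0.7, 1.5)
    augmented = np.clip(spatial, 0.0, 1.0) ** gamma
    augmented = augmented + rng.normal(0.0, 0.05, size=augmented.shape)

    # Block-wise random mask: True marks voxels hidden from the model.
    coarse = rng.random(tuple(s // block for s in patch.shape)) < mask_ratio
    mask = coarse.repeat(block, 0).repeat(block, 1).repeat(block, 2)

    model_input = np.where(mask, 0.0, augmented)
    target = spatial  # unmasked, spatially transformed, no intensity changes
    return model_input, target, mask


if __name__ == "__main__":
    patch = np.random.default_rng(0).random((16, 16, 16))
    x, y, m = make_pretraining_pair(patch, rng=np.random.default_rng(1))
    print(x.shape, y.shape, float(m.mean()))
```

During finetuning, per the caption, only the spatial branch would be used and the input would not be masked.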