Domain Aware Multi-Task Pretraining of 3D Swin Transformer for T1-weighted Brain MRI

Jonghun Kim, Mansu Kim, Hyunjin Park

TL;DR

This work proposes novel domain-aware pretext tasks for multi-task pretraining of a 3D Swin Transformer on brain magnetic resonance imaging (MRI). The pretrained model outperforms existing supervised and self-supervised methods on three downstream tasks: Alzheimer's disease classification, Parkinson's disease classification, and age prediction.

Abstract

The scarcity of annotated medical images is a major bottleneck in developing learning models for medical image analysis. Hence, recent studies have focused on pretrained models that require fewer annotations and can be fine-tuned for various downstream tasks. However, existing approaches are mainly 3D adaptations of 2D methods and are ill-suited to 3D medical imaging data. Motivated by this gap, we propose novel domain-aware multi-task learning tasks to pretrain a 3D Swin Transformer for brain magnetic resonance imaging (MRI). Our method incorporates domain knowledge of brain MRI, namely brain anatomy and morphology, alongside standard pretext tasks adapted for 3D imaging in a contrastive learning setting. We pretrain our model on 13,687 brain MRI samples drawn from several large-scale databases. Our method outperforms existing supervised and self-supervised methods on three downstream tasks: Alzheimer's disease classification, Parkinson's disease classification, and age prediction. An ablation study demonstrates the effectiveness of the proposed pretext tasks.

Paper Structure (22 sections, 9 equations, 11 figures, 5 tables)


Figures (11)

  • Figure 1: Overview of our proposed multi-task pretraining framework. The original MR image is divided into global and local views. The views are augmented by masking and rotation and then fed into the Swin Transformer. The encoder learns features through seven pretext tasks.
  • Figure 2: Detailed illustration of each pretext task in our proposed approach. (a) Brain Anatomy: predicting the parcellation of the input brain image. (b) Brain Morphology: predicting morphology, such as thickness or curvature, of the input brain image. (c) Radiomics Texture Prediction: predicting radiomics texture in the white matter, gray matter, and CSF regions. (d) Patch Location: identifying the position of the patch in the local view. (e) Image Rotation: rotating the original image and predicting the applied rotation. (f) Masked Image Modeling: regions of the original image are masked out and the image is reconstructed to its original form. (g) Contrastive Learning: different augmentations of the same patch are pulled closer as positive pairs, and inputs from different images are pushed apart as negative pairs.
  • Figure 3: Illustration of brain parcellation and morphology in a sagittal view of MRI. The left plot shows 120 regions of brain parcellation using the Desikan Atlas. The right plot shows the thickness and curvature of brain morphology.
  • Figure 4: Comparison of downstream task performance with varying pretext tasks used for pretraining. The average accuracy (top) and AUC (bottom) over five-fold cross-validation are reported in each box plot.
  • Figure 5: AUC of the Swin Transformer trained from scratch and of the pretrained Swin Transformer as a function of the percentage of labeled data used for AD classification.
  • ...and 6 more figures
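
Three of the pretext tasks named in Figure 2 (image rotation, masked image modeling, and contrastive learning) can be illustrated on a toy 3D volume. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names, the 90-degree rotation classes, the cube size, the masking ratio, and the InfoNCE form of the contrastive loss are all hypothetical choices made for illustration, since the captions above do not specify them.

```python
import numpy as np


def rotate_volume(vol, k, axes=(0, 1)):
    """Rotate a 3D volume by k * 90 degrees in the plane spanned by `axes`."""
    return np.rot90(vol, k=k, axes=axes)


def make_rotation_sample(vol, rng):
    """Rotation pretext task: return a rotated volume and its class label.

    Assumption: four rotation classes (0, 90, 180, 270 degrees) in one plane.
    """
    k = int(rng.integers(0, 4))
    return rotate_volume(vol, k), k


def mask_random_cubes(vol, cube=4, ratio=0.3, rng=None):
    """Masked image modeling input: zero out random non-overlapping cubes.

    Returns the masked volume and a boolean mask (True where voxels were removed).
    Assumption: every side of `vol` is divisible by `cube`.
    """
    if rng is None:
        rng = np.random.default_rng()
    assert all(s % cube == 0 for s in vol.shape), "shape must be divisible by cube"
    grid = tuple(s // cube for s in vol.shape)
    drop = rng.random(grid) < ratio
    # Expand the coarse grid decision to voxel resolution.
    mask = drop.repeat(cube, axis=0).repeat(cube, axis=1).repeat(cube, axis=2)
    return np.where(mask, 0.0, vol), mask


def info_nce(z1, z2, temperature=0.1):
    """Contrastive loss between two batches of embeddings (InfoNCE, assumed form).

    Row i of z1 and row i of z2 form the positive pair; every other row of z2
    acts as a negative for row i of z1.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    volume = rng.random((8, 8, 8))  # stand-in for a local view of an MR image

    rotated, label = make_rotation_sample(volume, rng)
    masked, mask = mask_random_cubes(volume, cube=4, ratio=0.5, rng=rng)
    embeddings = rng.normal(size=(4, 16))  # stand-in for encoder outputs

    print("rotation label:", label)
    print("masked fraction:", mask.mean())
    print("contrastive loss:", info_nce(embeddings, embeddings))
```

In a real pipeline the encoder would be the 3D Swin Transformer, and each task would have its own prediction head and loss term. The sketch covers only how the inputs and targets for these three tasks could be constructed.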