Learnable Weight Initialization for Volumetric Medical Image Segmentation

Shahina Kunhimon, Abdelrahman Shaker, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

TL;DR

The paper tackles data scarcity and initialization-induced variance in hybrid volumetric medical image segmentation. It proposes a learnable, data-dependent weight initialization obtained through a two-stage process. Step 1 is self-supervised pretraining with a Transformation Module that performs depth-wise rearrangement, sub-volume partitioning, shuffling, and masking to induce structural and contextual priors. Step 2 is supervised segmentation training initialized from the Step 1 weights. Experiments on Synapse multi-organ CT and MSD Lung show consistent, statistically significant Dice improvements when the data-dependent initialization is integrated into state-of-the-art networks. The approach yields competitive results with less data and avoids external datasets, offering a practical, architecture-agnostic enhancement for volumetric segmentation.
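The partitioning and shuffling stages of the Transformation Module can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `partition_and_shuffle`, the `(D, H, W)` axis layout, the choice of four sub-volumes, and the toy random volume are all assumptions made for illustration, and depth-wise rearrangement is read here simply as splitting along the depth axis. The returned permutation is the kind of label an order-prediction objective could use.

```python
import numpy as np


def partition_and_shuffle(volume, num_subvolumes, rng):
    """Split a (D, H, W) volume into equal sub-volumes along depth and shuffle them.

    Hypothetical helper: returns the shuffled volume and the permutation used,
    which could serve as the target of an order-prediction task.
    """
    depth = volume.shape[0]
    if depth % num_subvolumes != 0:
        raise ValueError("depth must be divisible by num_subvolumes")

    # Partition: (D, H, W) -> (num_subvolumes, D // num_subvolumes, H, W)
    subvolumes = volume.reshape(
        num_subvolumes, depth // num_subvolumes, *volume.shape[1:]
    )

    # Shuffle: slot i of the output holds original sub-volume order[i]
    order = rng.permutation(num_subvolumes)
    shuffled = subvolumes[order].reshape(volume.shape)
    return shuffled, order


rng = np.random.default_rng(0)
volume = rng.random((96, 96, 96)).astype(np.float32)  # toy stand-in for a CT crop
shuffled, order = partition_and_shuffle(volume, num_subvolumes=4, rng=rng)
```

In this sketch the shuffled volume keeps the original shape, so it could be fed to the same encoder as an unshuffled input; how the actual method feeds sub-volumes to the network is described in the paper itself.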

Abstract

Hybrid volumetric medical image segmentation models, combining the advantages of local convolution and global attention, have recently received considerable attention. While mainly focusing on architectural modifications, most existing hybrid approaches still use conventional data-independent weight initialization schemes, which restrict their performance by ignoring the inherent volumetric nature of the medical data. To address this issue, we propose a learnable weight initialization approach that utilizes the available medical training data to effectively learn the contextual and structural cues via the proposed self-supervised objectives. Our approach is easy to integrate into any hybrid model and requires no external training data. Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach, leading to state-of-the-art segmentation performance. Our proposed data-dependent initialization approach performs favorably compared to the Swin-UNETR model pretrained using large-scale datasets on the multi-organ segmentation task. Our source code and models are available at: https://github.com/ShahinaKK/LWI-VMS.

Paper Structure

This paper contains 17 sections, 11 equations, 7 figures, and 9 tables.

Figures (7)

  • Figure 1: Left: UNETR is sensitive to the choice of data-independent weight initialization scheme. Its performance drops significantly when initialized with the Kaiming normal method, whereas the truncated normal method gives better results than the default UNETR initialization. Right: Qualitative comparison on the Synapse dataset between the default and our proposed initialization (Init) method within the same UNETR framework. We enlarge the segmented area (green dashed boxes in column 1). Our method reduces the false positives for organs compared to standard UNETR (red dashed box in column 2). Organs are shown in the legend below the examples. Best viewed zoomed in.
  • Figure 2: Overview of our proposed approach, which learns the weight initialization using self-supervised tasks defined by the volumetric nature of the medical data. In the early stage of training (Step-1), we define the order prediction task within the encoder latent space, while the decoder simultaneously reconstructs the missing organs from the masked and shuffled input. The masked and shuffled input is produced by our transformation module in 4 stages: depth-wise rearranging, partitioning into equal-size sub-volumes, random shuffling of the sub-volumes for the order prediction objective, and finally masking the shuffled volume for the reconstruction objective. This allows the model to learn structural and contextual consistency in the data, which provides an effective initialization for the segmentation task (Step-2). Our approach does not rely on any extra data and therefore remains as computationally effective as the baseline while enhancing the segmentation performance.
  • Figure 3: Qualitative results on the Synapse dataset for SOTA segmentation networks: The proposed data-dependent initialization scheme, when integrated with different segmentation networks, improves the overall segmentation performance by accurately segmenting the organs and delineating organ boundaries. Organs are shown in the legend below the example images. Abbreviations are as follows: Spl: spleen, RKid: right kidney, LKid: left kidney, Gal: gallbladder, Eso: esophagus, Liv: liver, Sto: stomach, Aor: aorta, IVC: inferior vena cava, PSV: portal and splenic veins, Pan: pancreas, RAG: right adrenal gland, and LAG: left adrenal gland. Best viewed zoomed in.
  • Figure 4: Effect of masking ratio (a) and mask patch size (b): Moderate masking, with a masking ratio of around 40% and a mask patch size of $16 \times 16 \times 16$, during the initialization step (Step-1) yields the best results for UNETR on the Synapse dataset. Effect of increasing the training epochs for initialization (c): Training the initialization step of our proposed approach for more epochs improves the overall segmentation performance.
  • Figure 5: Qualitative results (Lung) on UNETR: Columns 2-4 show the enlarged views of the segmented areas marked with a green box in column 1. Integrating our proposed learnable initialization approach helps the model learn structural and contextual cues from the training data, which reduces misclassification (false negatives in row 1 and false positives in row 2).
  • ...and 2 more figures
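
The masking setting reported in the Figure 4 caption (masking ratio around 40%, mask patch size $16 \times 16 \times 16$) can be illustrated with a short NumPy sketch. This is an assumption-laden illustration rather than the authors' code: the helper name `mask_patches`, the use of zero as the fill value, the non-overlapping patch grid, and the toy $96 \times 96 \times 96$ volume are all hypothetical choices.

```python
import numpy as np


def mask_patches(volume, patch_size=16, mask_ratio=0.4, rng=None):
    """Zero out a random subset of non-overlapping cubic patches.

    Hypothetical helper: returns the masked volume and the boolean voxel mask
    (True where voxels were hidden).
    """
    rng = np.random.default_rng() if rng is None else rng
    if any(s % patch_size for s in volume.shape):
        raise ValueError("every volume dimension must be divisible by patch_size")

    # Number of patches along each axis and in total
    grid = tuple(s // patch_size for s in volume.shape)
    num_patches = int(np.prod(grid))
    num_masked = int(round(mask_ratio * num_patches))

    # Pick the patches to hide, without replacement
    patch_mask = np.zeros(num_patches, dtype=bool)
    patch_mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    patch_mask = patch_mask.reshape(grid)

    # Expand the patch-level mask to voxel resolution
    voxel_mask = patch_mask
    for axis in range(3):
        voxel_mask = voxel_mask.repeat(patch_size, axis=axis)

    masked = volume.copy()
    masked[voxel_mask] = 0.0  # fill value of zero is an assumption
    return masked, voxel_mask


rng = np.random.default_rng(0)
# Strictly positive toy volume, so zeros in the output come only from masking
volume = rng.random((96, 96, 96)).astype(np.float32) + 1.0
masked, voxel_mask = mask_patches(volume, rng=rng)
```

A reconstruction objective would then compare the decoder output against the unmasked `volume`; whether the loss is restricted to the hidden voxels is a design choice this sketch does not settle.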